WO2021114977A1 - Method and device for positioning fundamental cause of abnormal event - Google Patents

Info

Publication number
WO2021114977A1
Authority
WO
WIPO (PCT)
Prior art keywords
abnormal
current
historical
fingerprint information
root cause
Prior art date
Application number
PCT/CN2020/127110
Other languages
French (fr)
Chinese (zh)
Inventor
卢冠男
朱红燕
莫林林
孙芮
薛文满
王雅琪
李冕正
张若君
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021114977A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the technical field of anomaly handling in financial technology (Fintech), and in particular to a method and device for locating the root cause of an abnormal event.
  • Computers can now handle most business directly.
  • The entire life cycle of a product, from design and release through operation and maintenance, change and upgrade, to going offline, can be handled by computers, but various anomalies also occur during operation: external partners, hosts, networks, business logic, and other processing nodes may all become abnormal. The product therefore has to be maintained throughout its life cycle, which includes investigating the causes of anomalies. Because the cause of an anomaly does not necessarily lie at the processing node where the anomaly currently appears and may instead lie at another processing node, staff need to investigate the underlying cause of the anomaly, that is, the root cause.
  • The current investigation method infers the root cause of an anomaly through a single dimension, such as alarms, logs, application version releases, special SQL operations, promotions, or process changes.
  • When the root cause is sought after fixing on one dimension, however, the actual root cause in some cases does not lie in that dimension.
  • For example, the success rate of a product's business decreases, and a system that the product's transactions pass through has a version release record. Operation and maintenance personnel therefore judge, in the application version release dimension, that the release caused the decrease in success rate, whereas the actual root cause is non-compliant data transmitted from an external interface, which belongs to another dimension. Moreover, the amount of information in each dimension is very large, so even when the existing method investigates only one dimension, it still requires a great deal of work. For example, root cause location in intelligent operation and maintenance mostly starts from a single dimension and infers the anomaly from there. If the entry dimension is the alarm dimension, invalid alarm information must first be removed (invalid alarm information, such as routine device alarms and boundary-value alarms, does not help locate the root cause). Because a system in intelligent operation and maintenance may include multiple subsystems that each generate the same alarm information, identical alarms must then be converged to obtain the distinct alarms, and the root cause is located from those distinct alarms.
  • The embodiments of the present invention provide a method and device for locating the root cause of an abnormal event which, under multi-dimensional analysis and judgment, can reduce the workload of locating an abnormal root cause, shorten the locating cycle, and improve the efficiency of root cause location.
  • An embodiment of the present invention provides a method for locating the root cause of an abnormal event, and the method includes:
  • determining the current value of each preset dimension corresponding to a current abnormal event; determining the current fingerprint information of the current abnormal event according to the current value of each preset dimension, where each preset dimension corresponds to one fingerprint; calculating the similarity between the current fingerprint information and each piece of historical fingerprint information, where each piece of historical fingerprint information is obtained from a corresponding historical abnormal event and each historical abnormal event corresponds to an abnormal root cause; and determining the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
  • In this way, the fingerprint information of the current abnormal event can be collected in multiple dimensions. By matching the similarity between the multi-dimensional current fingerprint information and the multi-dimensional historical fingerprint information, historical abnormal events similar to the current abnormal event are obtained, and the abnormal root cause of the current abnormal event is determined from the abnormal root causes of those historical abnormal events. Therefore, under multi-dimensional analysis and judgment, the workload of locating an abnormal root cause can be reduced, the locating cycle shortened, and the efficiency of root cause location improved.
  • Calculating the similarity between the current fingerprint information and each piece of historical fingerprint information includes: determining the current vector of the current fingerprint information according to the current value and the weight of each fingerprint in the current fingerprint information; and calculating the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vector corresponding to each piece of historical fingerprint information.
  • In this way, the current vector is built from the current value of every fingerprint in the current fingerprint information, and a weight is set for each fingerprint. The current vector therefore not only contains the multi-dimensional fingerprint information of the current abnormal event but also reflects the relative importance of each fingerprint, which makes the similarity calculated from the current vector and each historical vector more accurate and further improves the accuracy of root cause location.
  • Calculating the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vector corresponding to each piece of historical fingerprint information includes applying a formula over the two vectors (the formula is not reproduced in this text), where A is the current vector and B is the historical vector.
  • In this way, the calculated similarity is more accurate, so the abnormal root cause of the similar historical abnormal event that is identified is closer to the abnormal root cause of the current abnormal event, which improves the accuracy of locating the abnormal root cause of the current abnormal event.
  • Determining the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event includes: screening out the marked abnormal root causes from among the abnormal root causes; taking the similarity of the historical fingerprint information corresponding to each marked abnormal root cause as the first reference and the number of occurrences of the marked abnormal root cause as the second reference, determining the recommended marked abnormal root cause from the marked abnormal root causes; and determining the abnormal root cause of the current abnormal event according to the recommended marked abnormal root cause.
  • The method further includes:
  • updating the current fingerprint information according to the current value of each preset dimension corresponding to the current abnormal event and the abnormal root cause of the current abnormal event, and storing the current fingerprint information as historical fingerprint information.
  • In this way, the amount of information in the historical database grows, so that the current values and the abnormal root cause of the current abnormal event help locate the abnormal root causes of subsequent similar abnormal events accurately and quickly.
  • The method further includes:
  • taking the historical abnormal event as an event node and recording the event identifier in the event node; taking each fingerprint in the historical fingerprint information as a fingerprint node, where the fingerprint node records the historical value of the historical abnormal event in the preset dimension corresponding to the fingerprint, together with the event identifier;
  • associating and storing the event node and the phenomenon-type fingerprint nodes among the fingerprint nodes through a first edge, where the first edge indicates that a phenomenon relationship in a preset dimension exists between the fingerprint node and the event node;
  • associating and storing the event node and the root-cause-type fingerprint nodes among the fingerprint nodes through a second edge, where the second edge indicates that a root cause relationship exists between the fingerprint node and the event node; and
  • associating and storing the event node and the root cause node through the second edge.
  • In this way, each fingerprint in the historical fingerprint information is used as a fingerprint node and the abnormal root cause as a root cause node, each node stores the event identifier and its corresponding information, and the nodes are associated and stored through the first edge or the second edge. This makes the database of historical abnormal events easy to update and modify, and allows the historical fingerprint information of historical abnormal events to be displayed more intuitively and found more easily.
  • an embodiment of the present invention provides a device for locating the root cause of an abnormal event, the device including:
  • the determining unit is used to determine the current value of each preset dimension corresponding to the current abnormal event; determine the current fingerprint information of the current abnormal event according to the current value of each preset dimension; wherein, each preset dimension corresponds to a fingerprint ;
  • the calculation unit is configured to calculate the similarity between the current fingerprint information and each historical fingerprint information; each historical fingerprint information is obtained according to a corresponding historical abnormal event, and the historical abnormal event corresponds to an abnormal root cause;
  • the determining unit is further configured to determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
  • the calculation unit is specifically used for:
  • An embodiment of the present application further provides a computing device, including: a memory configured to store program instructions; and a processor configured to call the program instructions stored in the memory and, according to the obtained program, execute the method described in the first aspect.
  • Embodiments of the present application also provide a computer-readable non-volatile storage medium including computer-readable instructions. When a computer reads and executes the computer-readable instructions, the computer executes the method described in the first aspect.
  • FIG. 1 is a schematic structural diagram of a system for locating the root cause of an abnormal event according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a method for locating the root cause of an abnormal event according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a method for storing historical abnormal events according to an embodiment of the present invention.
  • FIG. 4a is a schematic structural diagram of a storage method for a current abnormal event according to an embodiment of the present invention.
  • FIG. 4b is a schematic structural diagram of an abnormal event storage method provided by an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a method for locating the root cause of an abnormal event according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of a device for locating the root cause of an abnormal event provided by an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of another device for locating the root cause of an abnormal event provided by an embodiment of the present invention.
  • The historical fingerprint information formed from historical abnormal events in the embodiments of the present application not only supports multi-dimensional processing of historical abnormal events, but also allows the root cause to be located quickly by fingerprint comparison.
  • The historical abnormal events can be collected first, and the dimensional information of each historical abnormal event determined.
  • The dimensional information can include the configuration of the device or environment when the anomaly occurred (such as product type and product application scenario), the abnormal indicators when the anomaly occurred (such as transaction volume and transaction delay), and the root cause source information that can be derived from the abnormal event (such as the alarm dimension, interface dimension, log dimension, and application version release dimension).
  • From this dimensional information, the preset dimensions of each historical abnormal event are obtained; the preset dimensions of different historical abnormal events need not be exactly the same. The abnormal root cause of each historical abnormal event is determined at the same time, so that the historical fingerprint information and the abnormal root cause of each historical abnormal event are obtained.
  • On this basis, the system architecture for locating the root cause of an abnormal event shown in Figure 1 is formed.
  • In this architecture, the monitoring module 101 can monitor the item values of one or more products in multiple scenarios. When an item value is abnormal, for example when the value of the transaction volume item exceeds its preset range, a current abnormal event is generated, and the current value of each preset dimension of the current abnormal event is sent to the analysis module 102.
  • The analysis module 102 extracts the fingerprints of the current abnormal event from the current abnormal event information and generates the current fingerprint information, such as the product category, product application scenario, abnormal dimension, and abnormal item value. It then analyzes the abnormal root cause of the current abnormal event using the current fingerprint information and each piece of historical fingerprint information in the historical abnormal event database 103.
  • the embodiment of the present application provides a method for locating the root cause of an abnormal event, as shown in FIG. 2, including:
  • Step 201 Determine the current value of each preset dimension corresponding to the current abnormal event
  • The current abnormal event is an abnormal event that occurs at the current moment and whose root cause still needs to be determined.
  • One possible implementation determines the preset dimensions of the current abnormal event according to the product and product application scenario corresponding to the event, obtains the current value of each preset dimension, and sends the values to the analysis module 102.
  • Another possible implementation determines a comprehensive set of preset dimensions from the preset dimensions of all historical abnormal events, obtains the current value of each of these preset dimensions for the current abnormal event, and sends the values to the analysis module 102.
  • For example, the relevant item values of the product AA loan in the loan borrowing scenario include a current transaction volume of 300,000, a current average delay of 0.5 h, a system success rate of 90%, and a current success rate of 90%.
  • The preset dimensions are the log dimension, alarm dimension, application version release dimension, current transaction volume, current success rate, the product and product scenario to which the released application version belongs, and other information; the current value of each preset dimension can therefore be obtained.
  • Step 202 Determine the current fingerprint information of the current abnormal event according to the current value of each preset dimension; wherein, each preset dimension corresponds to a fingerprint;
  • The current fingerprint information is the fingerprint information of the current abnormal event. It may be a set of fingerprints describing the current abnormal event, and the set may include the product information fingerprint, scenario information fingerprint, and abnormal indicator fingerprint of the current abnormal event.
  • For example, a fingerprint may take the form 'root_imsInterface': ['rootSystemEnName', 'rootMetricId'], that is, (fingerprint dimension) 'interface': ['abnormal subsystem name', 'interfaceId'] (value attributes).
  • For example, for the product AA loan in the scenario CC loan borrowing, the current transaction volume must not be less than 400,000, the current average delay must not exceed 0.7 h, the system success rate must not be less than 99%, and the current success rate must not be less than 99%. Abnormal item values are nevertheless detected across the preset dimensions of this product and scenario (which can include the alarm dimension, interface dimension, log dimension, application version release dimension, special SQL operation dimension, promotion dimension, process change dimension, and so on): the current transaction volume in the log is 300,000, the current success rate is 90% and an alarm is generated, and the system success rate after the application version release is 90%.
  • The abnormal preset dimensions of the current abnormal event are therefore the log dimension, the alarm dimension, and the application version release dimension.
  • The current fingerprint information is determined from these abnormal preset dimensions together with the current transaction volume, the current success rate, and the product and product scenario to which the released application version belongs. It can be: product ID: AA loan; scenario ID: CC loan borrowing; log ID + current transaction volume; alarm ID + current success rate; system success rate; application version release: exists.
  • The abnormal item value can also be, for example, the peak of a sudden increase or decrease; this is not specifically limited.
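The AA loan example above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `Limit` class, the `LIMITS` table, the function names, and the dictionary keys are all hypothetical, and only the numeric limits and observed values come from the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Limit:
    """Acceptable range for one monitored item (hypothetical helper)."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def is_abnormal(self, value: float) -> bool:
        below = self.minimum is not None and value < self.minimum
        above = self.maximum is not None and value > self.maximum
        return below or above


# Limits quoted in the example for product "AA loan", scenario "CC loan borrowing".
LIMITS: Dict[str, Limit] = {
    "current_transaction_volume": Limit(minimum=400_000),
    "current_average_delay_h": Limit(maximum=0.7),
    "system_success_rate": Limit(minimum=0.99),
    "current_success_rate": Limit(minimum=0.99),
}


def abnormal_items(values: Dict[str, float]) -> Dict[str, float]:
    """Keep only the items whose current value violates its limit."""
    return {name: value for name, value in values.items()
            if name in LIMITS and LIMITS[name].is_abnormal(value)}


def build_fingerprints(product: str, scenario: str,
                       values: Dict[str, float]) -> Dict[str, object]:
    """Combine product, scenario, and abnormal item values into one fingerprint set."""
    fingerprints: Dict[str, object] = {"product": product, "scenario": scenario}
    fingerprints.update(abnormal_items(values))
    return fingerprints


# Observed values from the example.
current_values = {
    "current_transaction_volume": 300_000,
    "current_average_delay_h": 0.5,
    "system_success_rate": 0.90,
    "current_success_rate": 0.90,
}
fingerprint_info = build_fingerprints("AA loan", "CC loan borrowing", current_values)
```

With these values the average delay of 0.5 h stays within its limit, so only the transaction volume and the two success rates end up in the fingerprint set alongside the product and scenario.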
  • Step 203 Perform similarity calculation between the current fingerprint information and each historical fingerprint information; each historical fingerprint information is obtained according to a corresponding historical abnormal event, and the historical abnormal event corresponds to an abnormal root cause;
  • The similarity can be determined from the number of fingerprints that are the same in the current and historical fingerprint information, or the similarity of each individual fingerprint can be calculated and the overall fingerprint similarity determined from those per-fingerprint similarities.
  • The embodiment of this application specifically provides the following way of calculating the similarity between the current fingerprint information and each piece of historical fingerprint information: the current vector of the current fingerprint information is determined according to the current value and the weight of each fingerprint in the current fingerprint information; the similarity is then calculated from the current vector and the historical vector corresponding to each piece of historical fingerprint information by a formula (not reproduced in this text), where A is the current vector and B is the historical vector.
  • Step 204 Determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets the set threshold as the abnormal root cause of the current abnormal event.
  • A similarity threshold can be set. If the similarity of a historical abnormal event is greater than the set threshold, that event can be treated as a similar historical abnormal event, its abnormal root cause extracted, and the abnormal root cause of the current abnormal event determined according to the abnormal root cause of the historical abnormal event.
  • For example, suppose the current abnormal event is A, the historical abnormal events are B1, B2, and B3, and their similarities to A are 80%, 42%, and 99% respectively. If the similarity threshold is set to 50%, the historical abnormal events B1 and B3 are similar abnormal events of the current abnormal event A.
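The threshold step in this example amounts to a simple filter. The sketch below uses the similarities quoted above; the function name `similar_events` is hypothetical.

```python
def similar_events(similarities, threshold):
    """Return the historical events whose similarity is greater than the threshold."""
    return [event for event, score in similarities.items() if score > threshold]


# Similarities of B1, B2, B3 to the current abnormal event A, as in the example.
similarities = {"B1": 0.80, "B2": 0.42, "B3": 0.99}
print(similar_events(similarities, threshold=0.50))  # ['B1', 'B3']
```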
  • After the abnormal root causes of the similar historical abnormal events are obtained, the marked abnormal root causes can be screened out from among them. Taking the similarity of the historical fingerprint information corresponding to each marked abnormal root cause as the first reference and the number of occurrences of the marked abnormal root cause as the second reference, the recommended marked abnormal root cause is determined from the marked abnormal root causes, and the abnormal root cause of the current abnormal event is determined according to the recommended marked abnormal root cause.
  • A marked abnormal root cause can be an abnormal root cause of a historical abnormal event that has been manually marked. A manually marked abnormal root cause is an important root cause of the historical abnormal event and is recorded with a detailed description of the cause.
  • For example, historical abnormal event B1, with a similarity of 100%, contains the marked abnormal root causes a, b, and c; historical abnormal event B2, with a similarity of 89%, contains the marked abnormal root causes a and e; and historical abnormal event B3, with a similarity of 72%, contains the marked abnormal root cause f. According to the first and second references, abnormal root cause a, which has the highest similarity and occurs most often, is recommended first, followed by abnormal root cause e and finally abnormal root cause f.
  • The recommendation order can also be random, or can be determined from the weight of each abnormal root cause and other factors; this is not specifically limited.
  • Alternatively, only the abnormal root causes of the historical abnormal event with the highest similarity are recommended. In one example, only the marked abnormal root causes a, b, and c contained in B1 are recommended; the three can be recommended in random order, or the order can be determined according to weight values. The specific method for recommending the abnormal root causes of similar historical abnormal events is not limited.
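One possible reading of the two-reference ranking is sketched below: each marked root cause is scored by the highest similarity of any historical event containing it (first reference) and then by how many similar events contain it (second reference). The function name is hypothetical. Where root causes b and c fall, and the alphabetical tie-break, are assumptions of this sketch; the example in the text only names a, e, and f.

```python
from collections import defaultdict


def rank_root_causes(events):
    """Rank marked root causes.

    events: list of (similarity, [marked root causes]) for similar historical events.
    Sort key: best similarity (descending), occurrence count (descending),
    then name, which is an assumed tie-break to keep the order deterministic.
    """
    best_similarity = defaultdict(float)
    occurrences = defaultdict(int)
    for similarity, causes in events:
        for cause in set(causes):
            best_similarity[cause] = max(best_similarity[cause], similarity)
            occurrences[cause] += 1
    return sorted(best_similarity,
                  key=lambda c: (-best_similarity[c], -occurrences[c], c))


# B1 (100%): a, b, c; B2 (89%): a, e; B3 (72%): f, as in the example.
events = [(1.00, ["a", "b", "c"]), (0.89, ["a", "e"]), (0.72, ["f"])]
print(rank_root_causes(events))  # ['a', 'b', 'c', 'e', 'f']
```

Root cause a comes first because it shares the top similarity and appears in two events; e precedes f because of its higher similarity, which matches the order given in the example.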
  • The current fingerprint information may be updated and stored as historical fingerprint information according to the current value of each preset dimension corresponding to the current abnormal event and the abnormal root cause of the current abnormal event. That is, after the abnormal root cause of the current abnormal event is determined, the current fingerprint information is updated into historical fingerprint information that contains the abnormal root cause fingerprint information, and the current abnormal event is updated into a historical abnormal event that contains the abnormal root cause.
  • The historical abnormal event and its corresponding historical fingerprint information are then stored in the historical abnormal event database.
  • In this way, the fingerprint information of the current abnormal event can be collected in multiple dimensions. By matching the similarity between the multi-dimensional current fingerprint information and the multi-dimensional historical fingerprint information, historical abnormal events similar to the current abnormal event are obtained, and the abnormal root cause of the current abnormal event is determined from the abnormal root causes of those historical abnormal events. Therefore, under multi-dimensional analysis and judgment, the workload of locating an abnormal root cause can be reduced and the locating cycle shortened.
  • the embodiment of the present application also provides a method for storing historical abnormal events through a knowledge graph.
  • In this method, each historical abnormal event is used as an event node, and the event identifier is recorded in the event node. Each fingerprint in the historical fingerprint information is used as a fingerprint node, which records the historical value of the historical abnormal event in the preset dimension corresponding to the fingerprint, together with the event identifier. The abnormal root cause in the historical fingerprint information is used as a root cause node, which records the abnormal root cause corresponding to the historical abnormal event, together with the event identifier. The event node and the fingerprint node are associated and stored through a first edge, and the event node and the root cause node are associated and stored through a second edge. The first edge indicates that the fingerprint node is a preset dimension of the event node; the second edge indicates that the root cause node is a root cause of the event node.
  • Specifically, the structure includes an event node, which contains the event information of the historical abnormal event, its historical fingerprint information, and the identifier of the historical abnormal event.
  • Through the first edge, the event node is associated with the phenomenon-type fingerprint nodes among the fingerprint nodes. A phenomenon-type fingerprint node stores the abnormal indicator (historical value) of the historical abnormal event in the dimension corresponding to the fingerprint, together with related information, such as the current average delay and the product and scenario information in which that delay arose. It also stores the historical abnormal event identifier, which can be composed of, for example, the product plus the time.
  • Through the second edge (has_anomaly_factor), the event node is associated with the root cause node and with the root-cause-type fingerprint nodes among the fingerprint nodes. The root cause node stores the abnormal root cause of the historical abnormal event, such as a root_pr application version release operation or a parameter change in PMBAN (a customized subsystem name). A root-cause-type fingerprint node is an information source from which the abnormal root cause of the historical abnormal event may be derived.
  • Each root cause node and each root-cause-type fingerprint node includes the event identifier of the historical abnormal event. The first edge can store indicator-related information, such as the start and end time of the indicator anomaly and the amount by which the indicator changed. The second edge can store the event identifier and the root cause type, and index information can be added to an edge to facilitate later searches. The parts above and below the dotted line in Figure 3 each show one historical abnormal event; the lower one carries a marked abnormal root cause, which stores the engineers' and technicians' analysis or description of the root cause.
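The node and edge layout described above can be illustrated with a small in-memory graph. This is only a sketch of the data model, not a neo4j integration: the `Node` and `Graph` classes, the first-edge label `has_phenomenon`, the event identifier, and the stored values are all hypothetical. Only the second-edge label `has_anomaly_factor` is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    kind: str        # "event", "phenomenon_fingerprint", "root_cause_fingerprint" or "root_cause"
    event_id: str    # every node records the identifier of its historical abnormal event
    name: str
    value: Any = None


@dataclass
class Graph:
    # Each edge is (source, edge_type, target, properties).
    edges: List[Tuple[Node, str, Node, Dict[str, Any]]] = field(default_factory=list)

    def link(self, source: Node, edge_type: str, target: Node, **properties: Any) -> None:
        self.edges.append((source, edge_type, target, properties))

    def neighbours(self, source: Node, edge_type: str) -> List[Node]:
        return [t for s, e, t, _ in self.edges if s == source and e == edge_type]


FIRST_EDGE = "has_phenomenon"        # hypothetical label for the first edge
SECOND_EDGE = "has_anomaly_factor"   # label quoted in the text for the second edge

graph = Graph()
event = Node("event", "AA-loan@t0", "historical abnormal event")  # illustrative identifier
delay = Node("phenomenon_fingerprint", event.event_id, "current_average_delay_h", 0.9)
release = Node("root_cause_fingerprint", event.event_id, "application_version_release", "exists")
cause = Node("root_cause", event.event_id, "root_pr application version release operation")

# First edge: indicator-related properties (start, end, amount of change); values are made up.
graph.link(event, FIRST_EDGE, delay, start="t0", end="t1", change=0.4)
# Second edge: event identifier and root cause type.
graph.link(event, SECOND_EDGE, release, event_id=event.event_id, root_cause_type="release")
graph.link(event, SECOND_EDGE, cause, event_id=event.event_id, root_cause_type="release")
```

Following the second edge from an event node then yields both the root-cause-type fingerprint nodes and the root cause node, which is the lookup needed once a similar historical event has been matched.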
  • an embodiment of the present application provides a flow of a method for locating the root cause of an abnormal event, as shown in FIG. 5, including:
  • Step 501 Detect the value of the abnormal item
  • the item value is monitored, and the abnormal item value is detected.
  • Step 502 Trigger the formation of the current abnormal event
  • the occurrence of the abnormal item value triggers the formation of the current abnormal event.
  • Step 503 Generate nodes and associate edges
  • The product information, scenario information, and abnormal item information in the preset dimensions of the current abnormal event are each stored in neo4j nodes, and the nodes are connected through belongto edges to indicate attribution. The neo4j graph can be displayed on the computer as shown in Figure 4a: the current average delay, current success rate, system success rate, and current transaction volume belong to scenarios, and different scenarios belong to the same sub-product.
  • Step 504: Match similar historical abnormal events.
  • The similarity between the current abnormal event and each historical abnormal event in the historical abnormal event database is obtained by the following similarity calculation formula, and each historical abnormal event whose similarity is greater than the set threshold is a similar historical abnormal event.
  • Calculating the similarity between the current fingerprint information and each historical fingerprint information may include: determining the current vector of the current fingerprint information according to the current value and the weight of each fingerprint in the current fingerprint information; and calculating, from the current vector and each historical vector corresponding to each historical fingerprint information, the similarity between the current fingerprint information and each historical fingerprint information through a formula [formula image not reproduced], where A is the current vector and B is the historical vector.
  • The fingerprints (dimension variables) in the current abnormal event and in the historical abnormal event can each be vectorized using one-hot encoding and multiplied by the weights to obtain the current vector A and the historical vector B.
  • The similarity between the current fingerprint information and each historical fingerprint information can also be calculated by text matching. For example, if the current fingerprint information of current abnormal event A contains 6 fingerprints and 5 of them match the historical fingerprints of historical abnormal event B (5/6), the similarity is 83.33%.
  • Step 505: Obtain the abnormal root causes of similar historical abnormal events.
  • The abnormal root causes of the similar historical abnormal events are obtained from those events. On a computer display, this can appear as extending from the fingerprint nodes (current average time, current success rate) in Figure 4a to the historical abnormal events shown in Figure 4b, thereby obtaining the abnormal root causes of those historical abnormal events.
  • The abnormal root causes obtained are recommended in descending order of similarity and then in descending order of number of occurrences.
  • Step 506: Investigate the abnormal root cause of the current abnormal event.
  • The abnormal root causes can be investigated in the recommended order until the abnormal root cause of the current abnormal event is found.
  • Step 507: Mark the abnormal root cause of the current abnormal event.
  • Engineers and technicians can analyze the abnormal root cause found for the current abnormal event, mark the important root cause of the abnormal phenomenon, and record the attributes, analysis result, improvement result, and so on of the important root cause.
  • Step 508: Update and store the abnormal event and its abnormal root cause.
  • The description of the current abnormal event, the abnormal root cause determined by the engineers and technicians, and the marked abnormal root cause are stored in the historical abnormal event database to facilitate later root cause location for the same or similar abnormal events.
  • FIG. 6 is a schematic diagram of the device for locating the root cause of an abnormal event provided by an embodiment of the application, as shown in FIG. 6, including:
  • the determining unit 601 is configured to determine the current value of each preset dimension corresponding to the current abnormal event; determine the current fingerprint information of the current abnormal event according to the current value of each preset dimension; wherein, each preset dimension corresponds to one fingerprint;
  • the calculation unit 602 is configured to calculate the similarity between the current fingerprint information and each historical fingerprint information; each historical fingerprint information is obtained according to a corresponding historical abnormal event, and the historical abnormal event corresponds to an abnormal root cause;
  • the determining unit 601 is further configured to determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold value as the abnormal root cause of the current abnormal event.
  • the calculation unit 602 is specifically configured to: determine the current vector of the current fingerprint information according to the current value of each fingerprint and the weight of each fingerprint in the current fingerprint information; and calculate the similarity between the current fingerprint information and each historical fingerprint information according to the current vector and each historical vector corresponding to each historical fingerprint information.
  • calculating the similarity between the current fingerprint information and each historical fingerprint information according to the current vector and each historical vector corresponding to each historical fingerprint information includes applying a formula [formula image not reproduced], where:
  • A is the current vector
  • B is the historical vector
  • the determining unit 601 is specifically configured to screen the marked abnormal root causes among the abnormal root causes; taking the similarity of the historical fingerprint information corresponding to each marked abnormal root cause as the first criterion and the number of occurrences of the marked abnormal root cause as the second criterion, determine the recommended marked abnormal root cause from the marked abnormal root causes; and determine the abnormal root cause of the current abnormal event according to the recommended marked abnormal root cause.
  • FIG. 7 is a schematic diagram of another abnormal event root cause locating device provided by an embodiment of the application, as shown in FIG. 7, including:
  • the determining unit 701 is configured to determine the current value of each preset dimension corresponding to the current abnormal event; determine the current fingerprint information of the current abnormal event according to the current value of each preset dimension; wherein, each preset dimension corresponds to one fingerprint;
  • the calculation unit 702 is configured to calculate the similarity between the current fingerprint information and each historical fingerprint information; each historical fingerprint information is obtained according to a corresponding historical abnormal event, and the historical abnormal event corresponds to an abnormal root cause;
  • the determining unit 701 is further configured to determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
  • the calculation unit 702 is specifically configured to: determine the current vector of the current fingerprint information according to the current value of each fingerprint and the weight of each fingerprint in the current fingerprint information; and calculate the similarity between the current fingerprint information and each historical fingerprint information according to the current vector and each historical vector corresponding to each historical fingerprint information.
  • calculating the similarity between the current fingerprint information and each historical fingerprint information according to the current vector and each historical vector corresponding to each historical fingerprint information includes applying a formula [formula image not reproduced], where:
  • A is the current vector
  • B is the historical vector
  • the determining unit 701 is specifically configured to screen the marked abnormal root causes among the abnormal root causes; taking the similarity of the historical fingerprint information corresponding to each marked abnormal root cause as the first criterion and the number of occurrences of the marked abnormal root cause as the second criterion, determine the recommended marked abnormal root cause from the marked abnormal root causes; and determine the abnormal root cause of the current abnormal event according to the recommended marked abnormal root cause.
  • the updating unit 703 is configured to update the current fingerprint information according to the current value of each preset dimension corresponding to the current abnormal event and store the current fingerprint information as historical fingerprint information.
  • the storage unit 704 stores the historical fingerprint information and the abnormal root cause corresponding to the historical fingerprint information in the following manner: the historical abnormal event is taken as an event node, and the event identifier is recorded in the event node; each fingerprint in the historical fingerprint information is taken as a fingerprint node, and the fingerprint node records the event identifier and the historical value of the historical abnormal event in the preset dimension corresponding to the fingerprint; the abnormal root cause corresponding to the historical fingerprint information is taken as a root cause node, and the root cause node records the event identifier and the abnormal root cause corresponding to the historical abnormal event; the event node and the phenomenon type fingerprint nodes among the fingerprint nodes are associated and stored through the first edge; the first edge is used to indicate that a phenomenon relationship in a preset dimension exists between the fingerprint node and the event node; the event node and the root cause type fingerprint nodes among the fingerprint nodes are associated and stored through the second edge; the second edge is used to indicate that there is a root cause relationship between the secondary fingerprint
  • this application can be provided as methods, systems, or computer program products. Therefore, this application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • General Factory Administration (AREA)
  • Alarm Systems (AREA)

Abstract

The embodiment of the present invention provides a method and a device for positioning a fundamental cause of an abnormal event. The method comprises: determining a current value of each preset dimension corresponding to a current abnormal event; determining current fingerprint information of the current abnormal event according to the current value of each preset dimension, wherein each preset dimension corresponds to one fingerprint; performing similarity calculation between the current fingerprint information and each piece of historical fingerprint information, wherein the historical fingerprint information is obtained according to a corresponding historical abnormal event and the historical abnormal event corresponds to an abnormal fundamental cause; and determining the abnormal fundamental cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal fundamental cause of the current abnormal event. Compared with the prior-art method of investigating the abnormal fundamental cause of an event in one dimension, the present application can reduce the workload of positioning the abnormal fundamental cause and shorten the period of positioning the abnormal fundamental cause under the condition of multi-dimensional analysis and judgment.

Description

Method and device for locating root cause of abnormal event
Cross-references to related applications
This application claims priority to Chinese patent application No. 201911276509.8, filed with the Chinese Patent Office on December 12, 2019 and entitled "A method and device for locating the root cause of abnormal events", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the technical field of abnormality handling in financial technology (Fintech), and in particular to a method and device for locating the root cause of an abnormal event.
Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on technology. With today's rapid development of networks, most financial services can be processed directly by computers. This greatly saves human resources and allows financial services to be processed quickly and accurately, improving the accuracy and timeliness of financial service processing.
At present, computers can directly handle most business. For example, the business across a product's entire life cycle, from design to release, operation and maintenance, change and upgrade, and finally retirement, can be handled by computers. However, various abnormalities also occur during operation; for example, processing nodes such as external partners, hosts, networks, and business logic may become abnormal. The operation of the product therefore needs to be maintained throughout its life cycle, which includes investigating the causes of abnormalities. An abnormality does not necessarily manifest at the processing node where it occurs and may manifest at other processing nodes, so staff need to investigate the fundamental cause of the abnormality, that is, the root cause. The current approach to root cause investigation is to infer the root cause from a single dimension among alarms, logs, application version releases, special SQL operations, promotions, process changes, and so on. However, when the root cause is sought after fixing on one dimension, the root cause in some cases does not lie in that dimension. For example, the success rate of a product's business drops, and a system that the product's transactions pass through has a version release record, so the operation and maintenance personnel conclude that, within the application version release dimension, this version caused the drop in success rate; the actual root cause, however, is non-compliant data sent through an external interface, which belongs to another dimension. In addition, the amount of information within each dimension is very large, so even though existing root cause location methods investigate only one dimension, they still require a great deal of work. For example, root cause location in intelligent operation and maintenance mostly starts from a single dimension and infers the abnormality from it. If the chosen dimension is the alarm dimension, the invalid alarm information in that dimension (such as routine device alarms and edge value alarms, which do not help locate the root cause) must be removed. Because a system under intelligent operation and maintenance may include multiple subsystems that generate identical alarm information, the identical alarm information must be further converged to obtain distinct alarm information, and the root cause of the abnormality is then located from the distinct alarm information. Owing to the complexity of the system, however, the volume of the resulting alarm information is still very large. As a result, root cause analysis of abnormal events in the prior art is not only too one-sided but also labor-intensive and time-consuming.
Therefore, there is an urgent need for a method and device for locating the root cause of abnormal events that, on the basis of multi-dimensional analysis and judgment, can reduce the workload of locating abnormal root causes, shorten the time needed to locate them, and improve the efficiency of locating the root cause of abnormal events.
Summary of the invention
The embodiments of the present invention provide a method and device for locating the root cause of an abnormal event that, on the basis of multi-dimensional analysis and judgment, can reduce the workload of locating abnormal root causes, shorten the time needed to locate them, and improve the efficiency of locating the root cause of abnormal events.
In a first aspect, an embodiment of the present invention provides a method for locating the root cause of an abnormal event, the method including:
Determining the current value of each preset dimension corresponding to the current abnormal event; determining the current fingerprint information of the current abnormal event according to the current value of each preset dimension, wherein each preset dimension corresponds to one fingerprint; calculating the similarity between the current fingerprint information and each historical fingerprint information, wherein each historical fingerprint information is obtained from a corresponding historical abnormal event and the historical abnormal event corresponds to an abnormal root cause; and determining the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
With the above method, the current values of the preset dimensions of the current abnormal event are determined, and the current fingerprint information is determined from those values, so that fingerprint information about the current abnormal event is collected in multiple dimensions. Calculating the similarity between the current fingerprint information and each historical fingerprint information yields similar historical fingerprint information and thus the abnormal root cause corresponding to the historical abnormal event, from which the abnormal root cause of the current abnormal event is then obtained. Compared with the prior-art approach of investigating the abnormal root cause of an event through a single dimension, this application uses the similarity between multi-dimensional current fingerprint information and multi-dimensional historical fingerprint information to find historical abnormal events similar to the current abnormal event, and judges the abnormal root cause of the current abnormal event from the abnormal root causes of those historical abnormal events. It can therefore, on the basis of multi-dimensional analysis and judgment, reduce the workload of locating abnormal root causes, shorten the time needed to locate them, and improve the efficiency of locating the root cause of abnormal events.
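The flow of the first aspect can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `HistoricalEvent`, `match_ratio`, and `locate_root_causes` are hypothetical, the dimension values are invented sample data, the similarity used here is a simple share of matching fingerprints rather than the vector formula of the later designs, and "meets a set threshold" is read as greater than or equal to the threshold.

```python
from dataclasses import dataclass


@dataclass
class HistoricalEvent:
    event_id: str
    fingerprints: dict  # preset dimension -> historical value (one fingerprint per dimension)
    root_cause: str


def match_ratio(current, historical):
    """Share of current fingerprints whose value equals the historical value."""
    if not current:
        return 0.0
    hits = sum(1 for dim, value in current.items() if historical.get(dim) == value)
    return hits / len(current)


def locate_root_causes(current, history, threshold):
    """Return (similarity, root_cause) pairs that meet the threshold, best first."""
    matches = [
        (match_ratio(current, event.fingerprints), event.root_cause)
        for event in history
    ]
    return sorted((m for m in matches if m[0] >= threshold), reverse=True)


# Hypothetical current fingerprint information: six preset dimensions.
current = {"alarm": "db_timeout", "log": "conn_refused", "release": "v2.1",
           "sql": "none", "promotion": "none", "process_change": "none"}
history = [
    HistoricalEvent("E1", {**current, "release": "v2.0"}, "database parameter change"),
    HistoricalEvent("E2", {"alarm": "disk_full"}, "log volume growth"),
]
results = locate_root_causes(current, history, threshold=0.8)
```

In this sample, event E1 shares five of the six fingerprints with the current event, so it is the only candidate that meets the 0.8 threshold.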
In a possible design, calculating the similarity between the current fingerprint information and each historical fingerprint information includes: determining the current vector of the current fingerprint information according to the current value of each fingerprint and the weight of each fingerprint in the current fingerprint information; and calculating the similarity between the current fingerprint information and each historical fingerprint information according to the current vector and each historical vector corresponding to each historical fingerprint information.
With the above method, obtaining the current value of each fingerprint in the current fingerprint information ensures that the current vector contains the fingerprint corresponding to each current value in the current abnormal event, and setting a weight for each fingerprint means that the current vector not only contains fingerprint information from multiple dimensions of the current abnormal event but also reasonably reflects the importance of each fingerprint to the current abnormal event. The similarity calculated from the current vector and each historical vector is therefore more accurate, which further improves the accuracy of locating the abnormal root cause.
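One way to picture the weighted vector is sketched below, assuming one-hot encoding over (dimension, value) pairs with each active position set to the weight of its dimension. The function names, the weights, and the sample values are hypothetical and chosen only for illustration.

```python
def build_vocabulary(fingerprint_sets):
    """Assign one vector position to every distinct (dimension, value) pair."""
    pairs = sorted({(dim, value) for fp in fingerprint_sets for dim, value in fp.items()})
    return {pair: index for index, pair in enumerate(pairs)}


def to_weighted_vector(fingerprints, vocabulary, weights, default_weight=1.0):
    """One-hot encode the fingerprints, then scale each active position by its weight."""
    vector = [0.0] * len(vocabulary)
    for dim, value in fingerprints.items():
        index = vocabulary.get((dim, value))
        if index is not None:
            vector[index] = weights.get(dim, default_weight)
    return vector


# Hypothetical fingerprints and weights.
current_fp = {"alarm": "db_timeout", "release": "v2.1"}
historical_fp = {"alarm": "db_timeout", "release": "v2.0"}
weights = {"alarm": 2.0, "release": 1.0}

vocabulary = build_vocabulary([current_fp, historical_fp])
vector_a = to_weighted_vector(current_fp, vocabulary, weights)      # current vector A
vector_b = to_weighted_vector(historical_fp, vocabulary, weights)   # historical vector B
```

Building the vocabulary from both events keeps A and B the same length, so they can be compared position by position.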
In a possible design, calculating the similarity between the current fingerprint information and each historical fingerprint information according to the current vector and each historical vector corresponding to each historical fingerprint information includes:
[Formula (1): Figure PCTCN2020127110-appb-000001]
where A is the current vector and B is the historical vector.
With the above method, substituting the current vector and the historical vector into formula (1) makes the calculated similarity more accurate, so that the abnormal root cause of the similar historical abnormal event that is identified is closer to the abnormal root cause of the current abnormal event, which improves the accuracy of locating the abnormal root cause of the current abnormal event.
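Formula (1) appears only as an image in this text, so its exact form is not known here. As an assumption, a common choice for comparing two weighted one-hot vectors A and B is cosine similarity, sketched below; it should not be read as the formula of the application itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (an assumed stand-in for formula (1))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty fingerprint vector matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical weighted vectors: A is the current vector, B is the historical vector.
similarity = cosine_similarity([2.0, 0.0, 1.0], [2.0, 1.0, 0.0])
```

For these sample vectors the dot product is 4 and both norms are the square root of 5, so the similarity is about 0.8.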
In a possible design, determining the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event includes: screening the marked abnormal root causes among the abnormal root causes; taking the similarity of the historical fingerprint information corresponding to each marked abnormal root cause as the first criterion and the number of occurrences of the marked abnormal root cause as the second criterion, determining the recommended marked abnormal root cause from the marked abnormal root causes; and determining the abnormal root cause of the current abnormal event according to the recommended marked abnormal root cause.
With the above method, screening the marked abnormal root causes among the abnormal root causes yields the important abnormal root causes of similar historical abnormal events together with further descriptive information about them, which improves the accuracy of locating the root cause of the current abnormal event; setting the first criterion and the second criterion helps engineers and technicians locate the abnormal root cause of the current abnormal event accurately and quickly.
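The two-criterion recommendation can be sketched as a sort, assuming each candidate is a tuple of (root cause, similarity, marked flag). The function name `recommend_root_causes` and the sample candidates are hypothetical; using the best similarity per root cause and breaking remaining ties alphabetically are choices made here for determinism, not details taken from the application.

```python
from collections import Counter


def recommend_root_causes(candidates):
    """Rank marked root causes by best similarity, then by number of occurrences."""
    marked = [(cause, sim) for cause, sim, is_marked in candidates if is_marked]
    counts = Counter(cause for cause, _ in marked)
    best = {}
    for cause, sim in marked:
        best[cause] = max(sim, best.get(cause, 0.0))
    # First criterion: higher similarity. Second criterion: more occurrences.
    return sorted(best, key=lambda cause: (-best[cause], -counts[cause], cause))


# Hypothetical candidates gathered from similar historical abnormal events.
candidates = [
    ("version release", 0.90, True),
    ("parameter change", 0.90, True),
    ("parameter change", 0.85, True),
    ("network jitter", 0.95, False),  # unmarked, so it is screened out
]
ranking = recommend_root_causes(candidates)
```

Here both marked causes reach a similarity of 0.90, so the one that occurred twice is recommended first.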
In a possible design, the method further includes:
Updating the current fingerprint information according to the current value of each preset dimension corresponding to the current abnormal event, and storing the current fingerprint information as historical fingerprint information.
With the above method, updating and storing the current values and the abnormal root cause of the current abnormal event in the historical database increases the amount of information in the historical database, so that when similar abnormal events occur later, their abnormal root causes can be located accurately and quickly using the current values and abnormal root cause of this abnormal event.
In a possible design, the method further includes:
Storing the historical fingerprint information and the abnormal root cause corresponding to the historical fingerprint information as follows:
Taking the historical abnormal event as an event node, wherein an event identifier is recorded in the event node;
Taking each fingerprint corresponding to the historical fingerprint information as a fingerprint node, wherein the fingerprint node records the event identifier and the historical value of the historical abnormal event in the preset dimension corresponding to the fingerprint;
Associating and storing the event node and the phenomenon type fingerprint nodes among the fingerprint nodes through a first edge, wherein the first edge is used to indicate that a phenomenon relationship in a preset dimension exists between the fingerprint node and the event node;
Associating and storing the event node and the root cause type fingerprint nodes among the fingerprint nodes through a second edge, wherein the second edge is used to indicate that there is a root cause relationship between the secondary fingerprint node and the event node;
Associating and storing the event node and the root cause node through the second edge.
With the above method, the historical abnormal event is set as an event node, each fingerprint in the historical fingerprint information is taken as a fingerprint node, and the abnormal root cause in the historical fingerprint information is taken as a root cause node; the event identifier and the corresponding information are stored in each node, and the nodes are associated and stored through the first edge or the second edge. This makes the database that stores historical abnormal events easier to update and modify, and allows the historical fingerprint information of historical abnormal events to be displayed more intuitively and found more easily.
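The node-and-edge layout can be sketched with a small in-memory graph. This is a plain-Python illustration, not a graph database client; the class names `Node`, `Edge`, and `EventGraph`, the node kind labels, and the sample event are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str      # "event", "phenomenon_fingerprint", "root_cause_fingerprint", "root_cause"
    event_id: str  # every node records the event identifier
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    kind: str      # "first" (phenomenon relation) or "second" (root cause relation)


class EventGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def _add_node(self, suffix, kind, event_id, **properties):
        node = Node(f"{event_id}:{suffix}", kind, event_id, properties)
        self.nodes[node.node_id] = node
        return node

    def add_historical_event(self, event_id, phenomena, root_cause_hints, root_cause):
        """Store one historical abnormal event with its fingerprints and root cause."""
        event = self._add_node("event", "event", event_id)
        for dim, value in phenomena.items():
            fp = self._add_node(dim, "phenomenon_fingerprint", event_id,
                                dimension=dim, value=value)
            self.edges.append(Edge(event.node_id, fp.node_id, "first"))
        for dim, value in root_cause_hints.items():
            fp = self._add_node(dim, "root_cause_fingerprint", event_id,
                                dimension=dim, value=value)
            self.edges.append(Edge(event.node_id, fp.node_id, "second"))
        cause = self._add_node("root_cause", "root_cause", event_id,
                               description=root_cause)
        self.edges.append(Edge(event.node_id, cause.node_id, "second"))

    def root_cause_of(self, event_id):
        node = self.nodes.get(f"{event_id}:root_cause")
        return node.properties["description"] if node else None


graph = EventGraph()
graph.add_historical_event(
    "E1",
    phenomena={"success_rate": "dropped"},
    root_cause_hints={"release": "v2.1"},
    root_cause="application version release",
)
```

Because every node carries the event identifier, all nodes of one historical abnormal event can be collected or updated together, which is the property the paragraph above relies on.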
In a second aspect, an embodiment of the present invention provides a device for locating the root cause of an abnormal event, the device including:
A determining unit, configured to determine the current value of each preset dimension corresponding to the current abnormal event, and to determine the current fingerprint information of the current abnormal event according to the current value of each preset dimension, wherein each preset dimension corresponds to one fingerprint;
A calculation unit, configured to calculate the similarity between the current fingerprint information and each historical fingerprint information, wherein each historical fingerprint information is obtained from a corresponding historical abnormal event and the historical abnormal event corresponds to an abnormal root cause;
The determining unit is further configured to determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
In a possible design, the calculation unit is specifically configured to:
Determine the current vector of the current fingerprint information according to the current value of each fingerprint and the weight of each fingerprint in the current fingerprint information;
Calculate the similarity between the current fingerprint information and each historical fingerprint information according to the current vector and each historical vector corresponding to each historical fingerprint information.
In a third aspect, an embodiment of the present application further provides a computing device, including: a memory, configured to store program instructions; and a processor, configured to call the program instructions stored in the memory and, according to the obtained program, execute the method described in the various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable non-volatile storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to execute the method described in the various possible designs of the first aspect.
These and other implementations of the present application will be clearer and easier to understand from the description of the following embodiments.
Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic architecture diagram of a system for locating the root cause of an abnormal event according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for locating the root cause of an abnormal event according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a method for storing historical abnormal events according to an embodiment of the present invention;
FIG. 4a is a schematic structural diagram of a method for storing a current abnormal event according to an embodiment of the present invention;
FIG. 4b is a schematic structural diagram of a method for storing abnormal events according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a method for locating the root cause of an abnormal event according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a device for locating the root cause of an abnormal event according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another device for locating the root cause of an abnormal event according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To locate root causes quickly and on the basis of multiple dimensions, the embodiments of the present application work with historical fingerprint information formed from historical abnormal events. This approach draws on the multiple dimensions of the historical abnormal events and also allows rapid locating through fingerprint comparison. To collect historical fingerprint information, the historical abnormal events can first be gathered, and the dimension information at the time each historical abnormal event occurred can be determined from that event. The dimension information may include configuration information of the device or environment when the abnormality occurred (such as product type and product application scenario), abnormal indicators when the abnormality occurred (such as transaction volume and transaction delay), and source information from which the root cause of the abnormal event can be derived (such as the alarm dimension, interface dimension, log dimension, and application version release dimension). After the dimension information of each historical abnormal event is analyzed, the preset dimensions of each historical abnormal event are obtained; the preset dimensions of different historical abnormal events need not be exactly the same. The abnormal root cause of each historical abnormal event is also determined, so that the historical fingerprint information and the abnormal root cause of each historical abnormal event can be obtained from its preset dimensions.
This yields the system architecture for locating the root cause of an abnormal event shown in FIG. 1. The monitoring module 101 can monitor the item values of one or more products in multiple scenarios. When an item value becomes abnormal, for example when the transaction quantity of a transaction item exceeds a preset range, a current abnormal event is generated, and the current value of each preset dimension of the current abnormal event is sent to the analysis module 102. The analysis module 102 extracts the fingerprints of the current abnormal event from the current abnormal event information and generates current fingerprint information, such as product type, product application scenario, abnormal dimension of the product, and abnormal item value. The abnormal root cause of the current abnormal event is then analyzed using the current fingerprint information and the historical fingerprint information in the historical abnormal event database 103.
Based on this, an embodiment of the present application provides a flow of a method for locating the root cause of an abnormal event, as shown in FIG. 2, including:
Step 201: Determine the current value of each preset dimension corresponding to the current abnormal event.
Here, the current abnormal event is an abnormal event occurring at the current moment, whose root cause still has to be determined. In one possible implementation, the preset dimensions of the current abnormal event are determined from the product and product application scenario corresponding to the current abnormal event; the current value of each preset dimension is then obtained and sent to the analysis module 102. In another possible implementation, a full set of preset dimensions is determined from the preset dimensions of all historical abnormal events; the current value of each of these preset dimensions is then obtained for the current abnormal event and sent to the analysis module 102. For example, the relevant item values of the product AA loan in the loan borrowing scenario include a current transaction volume of 300,000, a current average delay of 0.5 h, a system success rate of 90%, and a current success rate of 90%. The preset dimensions are the log dimension, the alarm dimension, and the application version release dimension, together with information such as the current transaction volume, the current success rate, and the product and product scenario to which the application version release belongs; the current value of each preset dimension can therefore be obtained.
Step 202: Determine the current fingerprint information of the current abnormal event according to the current value of each preset dimension, wherein each preset dimension corresponds to one fingerprint.
Here, the current fingerprint information is the fingerprint information of the current abnormal event. It may be a set of fingerprints describing the current abnormal event, and the set may include the product information fingerprint, the scenario information fingerprint, the abnormal indicator fingerprints, and so on. For example, with each fingerprint dimension mapped to its value attributes:
{'root_imsInterface': ['rootSystemEnName', 'rootMetricId'],  (interface: abnormal subsystem name, interface ID)
'root_imsrcaLog': ['subSystemName', 'interfaceId'],  (log: subsystem name, log ID)
'root_sr': ['systemName'],  (application version SQL operation: system name)
'root_pr': ['systemName'],  (application version release operation: system name)
'root_promotion': 'exist',  (promotion: exists)
'root_itsm': 'exist',  (non-application version change: exists)
'root_imsAlert': ['rootCauseType'],  (alarm: alarm category)
'time_period': ['day', 'night']}  (time period: day, night)
Continuing the previous example, for the product AA loan in the scenario CC loan borrowing, the current transaction volume must not be lower than 400,000, the current average delay must not exceed 0.7 h, the system success rate must not be lower than 99%, and the current success rate must not be lower than 99%. Among the preset dimensions of the product AA loan in the scenario CC loan borrowing (which may include the alarm dimension, interface dimension, log dimension, application version release dimension, special SQL operation dimension, promotion dimension, process change dimension, and so on), abnormal item values are detected: the current transaction volume in the log is 300,000, the current success rate of 90% generates an alarm, and the system success rate after the application version release is 90%. The abnormal preset dimensions of the current abnormal event are therefore the log dimension, the alarm dimension, and the application version release dimension. From these abnormal preset dimensions and information such as the current transaction volume, the current success rate, and the product and product scenario to which the application version release belongs, the current fingerprint information of the current abnormal event can be determined as: product ID: AA loan; scenario ID: CC loan borrowing; log ID + current transaction volume; alarm ID + current success rate and system success rate; application version release: Exist. The abnormal item value here may also be, for example, the peak value of a sudden increase or decrease; this is not specifically limited.
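Step 202 can be illustrated with a minimal Python sketch. The helper name `build_fingerprints` is hypothetical, and the `dimension->value` string form is borrowed from the worked example later in this description; the patent does not prescribe this representation.

```python
from typing import Dict, List, Union

DimensionValue = Union[str, List[str]]


def build_fingerprints(current_values: Dict[str, DimensionValue]) -> List[str]:
    """Turn {dimension: value or values} into sorted 'dimension->value' strings.

    Hypothetical helper: one fingerprint string is emitted per value, so a
    dimension with several abnormal values yields several fingerprints.
    """
    fingerprints = set()
    for dimension, value in current_values.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            fingerprints.add(f"{dimension}->{item}")
    return sorted(fingerprints)


# Illustrative values taken from the worked example in this description.
current_values = {
    "time_period": "day",
    "metric_exception": ["69766:-1", "17319:-1"],
    "root_imsInterface": "CPUPCA_47758",
}
fingerprints = build_fingerprints(current_values)
```

Sorting the result is only a convenience so that two events with the same fingerprints compare equal regardless of the order in which their dimensions were collected.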
Step 203: Calculate the similarity between the current fingerprint information and each piece of historical fingerprint information, wherein each piece of historical fingerprint information is obtained from a corresponding historical abnormal event, and each historical abnormal event has a corresponding abnormal root cause.
Because fingerprint information includes multiple fingerprints, the similarity can be determined from the number of fingerprints that the current and historical fingerprint information have in common, or the similarity of each fingerprint can be calculated and the overall similarity determined from the per-fingerprint similarities. The embodiments of the present application specifically provide the following calculation: determine the current vector of the current fingerprint information according to the current value and the weight of each fingerprint in the current fingerprint information; then, according to the current vector and the historical vector corresponding to each piece of historical fingerprint information, calculate the similarity with the formula of Figure PCTCN2020127110-appb-000002. The formula image is not reproduced in this text; the following form is consistent with the worked example later in this description, in which A·B=59, A·A=84, and B·B=63 give E_j(A,B)=67%:
$$E_j(A,B) = \frac{A \cdot B}{A \cdot A + B \cdot B - A \cdot B}$$
where A is the current vector and B is the historical vector. Similarity can be calculated in many ways; this is not specifically limited.
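The similarity formula itself is only available as an image placeholder in this text. The sketch below assumes it is the extended Jaccard (Tanimoto) coefficient, A·B / (A·A + B·B − A·B), because that form reproduces the worked example later in the description (A·B=59, A·A=84, B·B=63, similarity 67%). The function name `extended_jaccard` is illustrative.

```python
from typing import Sequence


def extended_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Extended Jaccard similarity of two equal-length weighted vectors.

    Assumed form: (a . b) / (a . a + b . b - a . b).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot_ab = sum(x * y for x, y in zip(a, b))
    dot_aa = sum(x * x for x in a)
    dot_bb = sum(y * y for y in b)
    denominator = dot_aa + dot_bb - dot_ab
    # Two all-zero vectors share no fingerprint; treat their similarity as 0.
    return dot_ab / denominator if denominator else 0.0
```

Identical vectors score 1.0 and vectors with no shared non-zero component score 0.0, which makes the result easy to compare against a percentage threshold.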
Step 204: Determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets the set threshold as the abnormal root cause of the current abnormal event.
Here, a similarity threshold can be set. If the similarity of a historical abnormal event is greater than the set threshold, that event is treated as a similar historical abnormal event, its abnormal root cause is extracted, and the abnormal root cause of the current abnormal event is determined from it. For example, suppose the current abnormal event is A, the historical abnormal events are B1, B2, and B3, and their similarities to A are 80%, 42%, and 99%, respectively. If the similarity threshold is set to 50%, the historical abnormal events B1 and B3 are similar abnormal events of the current abnormal event A.
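The threshold step can be sketched in a few lines of Python, using the B1, B2, B3 figures from the example above. The function name `similar_events` is hypothetical.

```python
from typing import Dict, List


def similar_events(similarities: Dict[str, float], threshold: float) -> List[str]:
    """Return the historical events whose similarity is greater than the threshold."""
    return [event for event, score in similarities.items() if score > threshold]


# Similarities of historical events B1, B2, B3 to the current event A.
similarities = {"B1": 0.80, "B2": 0.42, "B3": 0.99}
selected = similar_events(similarities, threshold=0.50)
```

The strict comparison follows the wording "greater than the set threshold"; whether an event exactly at the threshold should qualify is not stated in the description.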
After the similar historical abnormal events of the current abnormal event are determined, their abnormal root causes are obtained, and the marked abnormal root causes among them can be screened out. Taking the similarity of the historical fingerprint information corresponding to a marked abnormal root cause as the first criterion and the number of occurrences of the marked abnormal root cause as the second criterion, recommended marked abnormal root causes are determined from the marked abnormal root causes, and the abnormal root cause of the current abnormal event is determined according to the recommended marked abnormal root causes.
A marked abnormal root cause may be an abnormal root cause of a historical abnormal event that has been manually marked. Generally, a manually marked abnormal root cause is an important abnormal root cause of the historical abnormal event, and a detailed description of it is recorded. For example, suppose historical abnormal event B1, with a similarity of 100%, contains marked abnormal root causes a, b, and c; historical abnormal event B2, with a similarity of 89%, contains marked abnormal root causes a and e; and historical abnormal event B3, with a similarity of 72%, contains marked abnormal root cause f. According to the first and second criteria, root cause a, which has high similarity and the most occurrences, is recommended first; root causes b and c, which have high similarity but fewer occurrences, come next; then root cause e; and finally root cause f. When abnormal root causes belong to historical abnormal events with the same similarity and have the same number of occurrences, their order may be random or may be decided by factors such as the weights of the abnormal root causes; this is not specifically limited. It is also possible to recommend only the abnormal root causes of the historical abnormal event with the highest similarity. In the example above, only the marked abnormal root causes a, b, and c contained in B1 would be recommended; these three may be recommended in random order or in an order decided by weight values. The manner of recommending the abnormal root causes of similar historical abnormal events is not specifically limited.
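The two-criterion recommendation order can be sketched as follows. This is one possible reading, not the patent's stated implementation: it assumes that a root cause's "similarity" is the highest similarity among the events containing it, and it breaks remaining ties by first appearance. The function name `rank_root_causes` is hypothetical.

```python
from typing import Dict, List, Tuple


def rank_root_causes(events: List[Tuple[float, List[str]]]) -> List[str]:
    """Order marked root causes by best similarity, then by occurrence count.

    `events` holds (similarity, marked_root_causes) pairs for the similar
    historical events. Ties keep first-seen order (an assumption).
    """
    best_similarity: Dict[str, float] = {}
    occurrences: Dict[str, int] = {}
    for similarity, causes in events:
        for cause in causes:
            best_similarity[cause] = max(best_similarity.get(cause, 0.0), similarity)
            occurrences[cause] = occurrences.get(cause, 0) + 1
    # sorted() is stable, so equal keys keep their insertion order.
    return sorted(best_similarity, key=lambda c: (-best_similarity[c], -occurrences[c]))


# B1 (100%): a, b, c; B2 (89%): a, e; B3 (72%): f
events = [(1.00, ["a", "b", "c"]), (0.89, ["a", "e"]), (0.72, ["f"])]
ranking = rank_root_causes(events)
```

With the example data this yields a first, then b and c, then e, then f, matching the order described in the text.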
Finally, the current fingerprint information can be updated according to the current value of each preset dimension of the current abnormal event and the abnormal root cause of the current abnormal event, and then stored as historical fingerprint information. In other words, once the abnormal root cause of the current abnormal event has been determined, the current fingerprint information can be updated into historical fingerprint information that contains the abnormal root cause fingerprint information, the current abnormal event can be updated into a historical abnormal event that contains the abnormal root cause, and the historical abnormal event and its historical fingerprint information can be stored together in the historical abnormal event database.
With the above method, the current values of the preset dimensions of the current abnormal event are determined, and the current fingerprint information is determined from them, so that fingerprint information about the current abnormal event is collected across multiple dimensions. Similar historical fingerprint information is found by calculating the similarity between the current fingerprint information and each piece of historical fingerprint information, which gives the abnormal root causes of the corresponding historical abnormal events, and these in turn lead to the abnormal root cause of the current abnormal event. Compared with prior-art methods that investigate the root cause of an abnormal event along a single dimension, the present application matches multi-dimensional current fingerprint information against multi-dimensional historical fingerprint information to find historical abnormal events similar to the current one and judges the root cause of the current abnormal event from theirs. It can therefore reduce the workload of locating abnormal root causes and shorten the locating cycle while still basing the judgment on multi-dimensional analysis.
The embodiments of the present application also provide a method for storing historical abnormal events in a knowledge graph. The historical abnormal event is taken as an event node, in which the event identifier is recorded. Each fingerprint in the historical fingerprint information is taken as a fingerprint node, in which the historical value of the historical abnormal event in the preset dimension corresponding to the fingerprint and the event identifier are recorded. The abnormal root cause in the historical fingerprint information is taken as a root cause node, in which the abnormal root cause corresponding to the historical abnormal event and the event identifier are recorded. The event node and the fingerprint node are stored in association through a first edge, and the event node and the root cause node are stored in association through a second edge. The first edge indicates that the fingerprint node is a preset dimension of the event node; the second edge indicates that the root cause node is a root cause of the event node. As shown in FIG. 3, the structure is as follows.
The event node is connected to both the first edge (has_anomaly_metric) and the second edge (has_anomaly_factor). It contains the event information, the historical fingerprint information, and the identifier of the historical abnormal event. Through the first edge (has_anomaly_metric), the event node is associated with the phenomenon-type fingerprint nodes among the fingerprint nodes. A phenomenon-type fingerprint node stores the abnormal indicator (historical value) information of the dimension in which the abnormality occurred and related information, for example the current average delay together with the product information and scenario information associated with it, as well as the identifier of the historical abnormal event, which may, for example, be composed of information such as product + time. Through the second edge (has_anomaly_factor), the event node is associated with the root cause nodes and with the root-cause-type fingerprint nodes among the fingerprint nodes. A root cause node stores an abnormal root cause of the historical abnormal event, for example root_pr (an application version release operation) or a PMBAN (a user-defined subsystem name) parameter change. A root-cause-type fingerprint node is an information source from which an abnormal root cause of the historical abnormal event may be derived. Every root cause node and every root-cause-type fingerprint node includes the event identifier of the historical abnormal event. The first edge can store indicator-related information, the start and end times of the indicator abnormality, the amount of indicator change, and so on; the second edge can store the event ID, the root cause type, and so on. Index information can also be added to the edges to facilitate later lookup. In FIG. 3, the parts above and below the dotted line each show one historical abnormal event; in the lower historical abnormal event, the marked abnormal root cause stores information such as an engineer's analysis or description of that root cause.
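The node-and-edge layout described above can be sketched with a small in-memory graph. This is a plain-Python stand-in under stated assumptions, not a graph-database API: the class names `Node` and `EventGraph`, the event identifier `AA-loan-0001`, and the payload keys are all hypothetical; only the edge labels `has_anomaly_metric` and `has_anomaly_factor` come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    kind: str  # "event", "fingerprint", or "root_cause"
    event_id: str  # every node records the identifier of its event
    payload: Dict[str, str] = field(default_factory=dict)


@dataclass
class EventGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Tuple[int, str, int]] = field(default_factory=list)  # (source, label, target)

    def add_node(self, node: Node) -> int:
        self.nodes.append(node)
        return len(self.nodes) - 1

    def add_event(self, event_id: str, metrics: Dict[str, str], root_causes: List[str]) -> int:
        """Store one historical event with its fingerprint and root cause nodes."""
        event = self.add_node(Node("event", event_id))
        for name, value in metrics.items():
            fingerprint = self.add_node(Node("fingerprint", event_id, {name: value}))
            self.edges.append((event, "has_anomaly_metric", fingerprint))  # first edge
        for cause in root_causes:
            root = self.add_node(Node("root_cause", event_id, {"cause": cause}))
            self.edges.append((event, "has_anomaly_factor", root))  # second edge
        return event

    def root_causes_of(self, event_index: int) -> List[str]:
        """Follow the second edge from an event node to its root cause nodes."""
        return [
            self.nodes[target].payload["cause"]
            for source, label, target in self.edges
            if source == event_index
            and label == "has_anomaly_factor"
            and self.nodes[target].kind == "root_cause"
        ]


graph = EventGraph()
event = graph.add_event("AA-loan-0001", {"current_average_delay": "0.5h"}, ["root_pr"])
```

For brevity the sketch omits the root-cause-type fingerprint nodes and the properties that the description allows on edges, such as start and end times or index information.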
It should be noted that the storage method described above is not the only possible one; storage in other forms, such as tables, is also possible, and this is not specifically limited.
Based on the above method for storing historical abnormal events, an embodiment of the present application provides a flow of a method for locating the root cause of an abnormal event, as shown in FIG. 5, including:
Step 501: Detect an abnormal item value.
Here, item values are monitored and an abnormal item value is detected.
Step 502: Trigger the formation of a current abnormal event.
Here, the appearance of the abnormal item value triggers the formation of the current abnormal event.
Step 503: Generate nodes and associate edges.
Here, the product information, scenario information, and abnormal item information in the preset dimensions of the current abnormal event are each stored in neo4j nodes, and the nodes are connected by belongto edges, which express ownership. The neo4j graph can be displayed on a computer, as shown in FIG. 4a: the current average delay, current success rate, system success rate, and current transaction volume belong to a scenario, and different scenarios belong to the same sub-product.
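The ownership hierarchy of step 503 can be sketched with a plain mapping in which each entry stands for one belongto edge. This is not neo4j code; the node names (`scenario:CC`, `sub_product:AA`) and the helper `ownership_chain` are hypothetical and only mirror the structure described for FIG. 4a.

```python
from typing import Dict, List

# child -> parent; each entry stands for one "belongto" edge.
belongto: Dict[str, str] = {
    "current_average_delay": "scenario:CC",
    "current_success_rate": "scenario:CC",
    "system_success_rate": "scenario:CC",
    "current_transaction_volume": "scenario:CC",
    "scenario:CC": "sub_product:AA",
}


def ownership_chain(node: str, edges: Dict[str, str]) -> List[str]:
    """Follow belongto edges upward from a node (assumes the edges are acyclic)."""
    chain = [node]
    while chain[-1] in edges:
        chain.append(edges[chain[-1]])
    return chain


chain = ownership_chain("current_success_rate", belongto)
```

Walking the chain from an abnormal indicator gives the scenario and sub-product it belongs to, which is the information the fingerprints for those two dimensions need.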
Step 504: Match similar historical abnormal events.
Here, the similarity between the current abnormal event and each historical abnormal event in the historical abnormal event database is obtained with the similarity formula described below; a historical abnormal event whose similarity is greater than the set threshold is a similar historical abnormal event.
The calculation of the similarity between the current fingerprint information and each piece of historical fingerprint information may include: determining the current vector of the current fingerprint information according to the current value and the weight of each fingerprint in the current fingerprint information; and calculating the similarity from the current vector and the historical vector corresponding to each piece of historical fingerprint information with the formula of Figure PCTCN2020127110-appb-000003. The formula image is not reproduced in this text; the following form is consistent with the worked example in this description, in which A·B=59, A·A=84, and B·B=63 give E_j(A,B)=67%:
$$E_j(A,B) = \frac{A \cdot B}{A \cdot A + B \cdot B - A \cdot B}$$
where A is the current vector and B is the historical vector.
For example, let the weight of each fingerprint be features_weight = {
'root_imsAlert': 3,  (alarm)
'root_imsInterface': 2,  (interface)
'root_imsrcaLog': 2,  (log)
'root_sr': 4,  (application version SQL operation)
'root_pr': 3,  (application version release operation)
'root_promotion': 7,  (promotion)
'root_itsm': 3,  (non-application version change)
'metric_exception': 5,  (KPI abnormal item value curve)
'sub_production_id': 1,  (sub-product to which the abnormal item value belongs)
'subScenarioId': 2,  (scenario to which the abnormal item value belongs)
'time_period': 5}  (time period)
Suppose the current fingerprint information of the current abnormal event includes: time period: day; KPI abnormal item value curve IDs: 69766:-1 and 17319:-1; interface ID: CPUPCA_47758; ID of the sub-product to which the abnormal item value belongs: 401; ID of the scenario to which the abnormal item value belongs: 4010101. The fingerprints of A are therefore fps = ["subScenarioId->4010101",
"time_period->day",
"metric_exception->69766:-1",
"metric_exception->17319:-1",
"root_imsInterface->CPUPCA_47758",
"sub_production_id->401"]
Suppose the historical fingerprint information of the historical abnormal event includes: time period: day; interface ID: CPUPCA_47758; ID of the sub-product to which the abnormal item value belongs: 401; ID of the scenario to which the abnormal item value belongs: 4010101; log ID: UPP_11077; KPI abnormal item value curve ID: 17319:-1. The fingerprints of B are therefore fps = ["time_period->day",
"root_imsInterface->CPUPCA_47758",
"sub_production_id->401",
"subScenarioId->4010101",
"root_imsrcaLog->UPP_11077",
"metric_exception->17319:-1"]
可以通过对当前异常事件和历史异常事件中的上述指纹(维度变量)分别进行向量化,采用one-hot编码,乘以权重,分别得到:当前向量A、历史向量B;The above-mentioned fingerprints (dimension variables) in the current abnormal event and the historical abnormal event can be vectorized separately, one-hot encoding is used, and the weight is multiplied to obtain: current vector A and historical vector B;
Figure PCTCN2020127110-appb-000004
Figure PCTCN2020127110-appb-000004
Substituting A and B into the formula:
Figure PCTCN2020127110-appb-000005
gives A·B=59, A·A=84, B·B=63 and E_j(A,B)=67%, so the similarity between the current abnormal event and the historical abnormal event is 67%.
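The formula itself is an image that is not reproduced in this text. The quoted numbers are consistent with the extended Jaccard (Tanimoto) coefficient, E_j(A,B) = A·B / (A·A + B·B − A·B), because 59 / (84 + 63 − 59) = 59/88, about 67%; cosine similarity on the same values would give roughly 81% instead. The sketch below works under that assumption, and the function name `extended_jaccard` is illustrative.

```python
def extended_jaccard(a_dot_b, a_dot_a, b_dot_b):
    """Extended Jaccard (Tanimoto) coefficient computed from three dot products."""
    denominator = a_dot_a + b_dot_b - a_dot_b
    if denominator == 0:
        # Both vectors are all zeros; treating this as "no similarity" is an assumption.
        return 0.0
    return a_dot_b / denominator


similarity = extended_jaccard(59, 84, 63)
print(f"{similarity:.0%}")  # 59 / 88, which rounds to 67%
```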
The similarity between the current fingerprint information and each piece of historical fingerprint information can also be calculated by text matching. For example, the current fingerprint information of the current abnormal event A contains 6 fingerprints, and the historical fingerprint of the historical abnormal event B
Figure PCTCN2020127110-appb-000006
is the similarity, so the similarity is 83.33%.
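Part of the text-matching description is an image that is not reproduced here. One reading that agrees with the stated 83.33% is the number of fingerprints shared by A and B divided by the number of fingerprints in A: the two example lists share 5 of A's 6 fingerprints, and 5/6 is about 83.33%. The sketch below follows that reading; the function name `text_match_similarity` and the choice of denominator are assumptions.

```python
fps_a = [
    "subScenarioId->4010101",
    "time_period->day",
    "metric_exception->69766:-1",
    "metric_exception->17319:-1",
    "root_imsInterface->CPUPCA_47758",
    "sub_production_id->401",
]
fps_b = [
    "time_period->day",
    "root_imsInterface->CPUPCA_47758",
    "sub_production_id->401",
    "subScenarioId->4010101",
    "root_imsrcaLog->UPP_11077",
    "metric_exception->17319:-1",
]


def text_match_similarity(current, historical):
    """Fraction of the current fingerprints that also appear, verbatim, in the historical list."""
    current_set = set(current)
    if not current_set:
        return 0.0
    shared = current_set & set(historical)
    return len(shared) / len(current_set)


match_similarity = text_match_similarity(fps_a, fps_b)
print(f"{match_similarity:.2%}")
```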
Step 505: Obtain the abnormal root causes of the similar historical abnormal events;
Here, once the similar historical abnormal events have been obtained, their abnormal root causes are obtained from them. In the computer this can be displayed as the historical abnormal events shown in FIG. 4b extending from the fingerprint nodes (current average time, current success rate) of FIG. 4a, which yields the abnormal root causes of those historical abnormal events. The abnormal root causes obtained are then recommended in descending order of similarity and, after that, in descending order of number of occurrences.
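The recommendation order described in step 505 can be sketched as a two-key sort: highest similarity first, ties broken by how often a root cause occurs. The sample data and the function name `rank_root_causes` are hypothetical, and taking the best similarity per root cause is one possible aggregation, not something the text specifies.

```python
from collections import defaultdict


def rank_root_causes(matches):
    """matches: iterable of (root_cause, similarity) pairs from similar historical events.

    Returns root causes ordered by best similarity (descending), then by
    number of occurrences (descending).
    """
    best_similarity = defaultdict(float)
    occurrences = defaultdict(int)
    for root_cause, similarity in matches:
        best_similarity[root_cause] = max(best_similarity[root_cause], similarity)
        occurrences[root_cause] += 1
    return sorted(
        best_similarity,
        key=lambda rc: (-best_similarity[rc], -occurrences[rc]),
    )


# Hypothetical matches, for illustration only.
sample_matches = [
    ("interface timeout", 0.83),
    ("disk full", 0.83),
    ("interface timeout", 0.67),
    ("bad release", 0.67),
]
ranked = rank_root_causes(sample_matches)
print(ranked)
```

An engineer would then work through `ranked` from the front, which matches the investigation order of step 506.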
Step 506: Investigate the abnormal root cause of the current abnormal event;
Here, the recommended abnormal root causes of the similar historical abnormal events can be checked one by one in the recommended order until the abnormal root cause of the current abnormal event is found.
Step 507: Mark the abnormal root cause of the current abnormal event;
Here, engineers can analyze the abnormal root causes found for the current abnormal event, mark the important root cause that produced the abnormal phenomenon, and record that root cause's attributes, analysis results, remediation results, and so on.
Step 508: Update and store the abnormal event and its abnormal root cause;
The description of the current abnormal event, the abnormal root cause determined by the engineers, the marked abnormal root cause and related information are stored in the historical abnormal event database, which facilitates root cause location for later identical or similar abnormal events.
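As a rough illustration of step 508, the sketch below keeps the historical abnormal event database as an in-memory dictionary keyed by event identifier, so storing an event with an existing identifier updates it. The record layout (`HistoricalEvent`) and the function `store_event` are assumptions made for this example; the text does not prescribe a schema or a storage engine.

```python
from dataclasses import dataclass, field


@dataclass
class HistoricalEvent:
    """Hypothetical record for one resolved abnormal event."""
    event_id: str
    description: str
    fingerprints: list
    root_causes: list
    marked_root_causes: list = field(default_factory=list)


def store_event(history, event):
    """Insert the event, or overwrite the stored copy if the identifier already exists."""
    history[event.event_id] = event
    return history


history_db = {}
store_event(history_db, HistoricalEvent(
    event_id="event-A",
    description="success rate drop on interface CPUPCA_47758",
    fingerprints=["time_period->day", "root_imsInterface->CPUPCA_47758"],
    root_causes=["example root cause"],
))
# Storing the same identifier again updates the record, here with a marked root cause.
store_event(history_db, HistoricalEvent(
    event_id="event-A",
    description="success rate drop on interface CPUPCA_47758",
    fingerprints=["time_period->day", "root_imsInterface->CPUPCA_47758"],
    root_causes=["example root cause"],
    marked_root_causes=["example root cause"],
))
```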
Based on the same concept, an embodiment of the present invention provides a device for locating the root cause of an abnormal event. FIG. 6 is a schematic diagram of a device for locating the root cause of an abnormal event provided by an embodiment of this application. As shown in FIG. 6, the device includes:
a determining unit 601, configured to determine the current value of each preset dimension corresponding to the current abnormal event, and to determine the current fingerprint information of the current abnormal event according to the current values of the preset dimensions, wherein each preset dimension corresponds to one fingerprint;
a calculation unit 602, configured to calculate the similarity between the current fingerprint information and each piece of historical fingerprint information, wherein each piece of historical fingerprint information is obtained from a corresponding historical abnormal event, and the historical abnormal event has a corresponding abnormal root cause;
the determining unit 601 is further configured to determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
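The threshold step performed by the determining unit can be sketched as a simple filter over already-computed similarities. The tuple layout, the threshold value and the function name `candidate_root_causes` are hypothetical, and reading "meets the set threshold" as "greater than or equal to" is an assumption.

```python
def candidate_root_causes(scored_events, threshold):
    """Keep the root causes of historical events whose similarity meets the threshold.

    scored_events: iterable of (event_id, similarity, root_cause) tuples.
    """
    return [
        root_cause
        for _event_id, similarity, root_cause in scored_events
        if similarity >= threshold
    ]


# Hypothetical scored history, for illustration only.
scored = [
    ("event-1", 0.67, "example root cause X"),
    ("event-2", 0.40, "example root cause Y"),
    ("event-3", 0.83, "example root cause Z"),
]
candidates = candidate_root_causes(scored, threshold=0.6)
print(candidates)
```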
In a possible design, the calculation unit 602 is specifically configured to: determine the current vector of the current fingerprint information according to the current value of each fingerprint in the current fingerprint information and the weight of each fingerprint; and calculate the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vectors corresponding to the respective pieces of historical fingerprint information.
In a possible design, calculating the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vectors corresponding to the respective pieces of historical fingerprint information includes:
Figure PCTCN2020127110-appb-000007
where A is the current vector and B is the historical vector.
In a possible design, the determining unit 601 is specifically configured to: screen out the marked abnormal root causes among the abnormal root causes; determine the recommended marked abnormal root cause from among the marked abnormal root causes, using the similarity of the historical fingerprint information corresponding to a marked abnormal root cause as the first criterion and the number of occurrences of the marked abnormal root cause as the second criterion; and determine the abnormal root cause of the current abnormal event according to the recommended marked abnormal root cause.
Based on the same concept, an embodiment of the present invention provides another device for locating the root cause of an abnormal event. FIG. 7 is a schematic diagram of another device for locating the root cause of an abnormal event provided by an embodiment of this application. As shown in FIG. 7, the device includes:
a determining unit 701, configured to determine the current value of each preset dimension corresponding to the current abnormal event, and to determine the current fingerprint information of the current abnormal event according to the current values of the preset dimensions, wherein each preset dimension corresponds to one fingerprint;
a calculation unit 702, configured to calculate the similarity between the current fingerprint information and each piece of historical fingerprint information, wherein each piece of historical fingerprint information is obtained from a corresponding historical abnormal event, and the historical abnormal event has a corresponding abnormal root cause;
the determining unit 701 is further configured to determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
In a possible design, the calculation unit 702 is specifically configured to: determine the current vector of the current fingerprint information according to the current value of each fingerprint in the current fingerprint information and the weight of each fingerprint; and calculate the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vectors corresponding to the respective pieces of historical fingerprint information.
In a possible design, calculating the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vectors corresponding to the respective pieces of historical fingerprint information includes:
Figure PCTCN2020127110-appb-000008
where A is the current vector and B is the historical vector.
In a possible design, the determining unit 701 is specifically configured to: screen out the marked abnormal root causes among the abnormal root causes; determine the recommended marked abnormal root cause from among the marked abnormal root causes, using the similarity of the historical fingerprint information corresponding to a marked abnormal root cause as the first criterion and the number of occurrences of the marked abnormal root cause as the second criterion; and determine the abnormal root cause of the current abnormal event according to the recommended marked abnormal root cause.
In a possible design, an updating unit 703 is configured to update the current fingerprint information according to the current value of each preset dimension corresponding to the current abnormal event, and to store the current fingerprint information as historical fingerprint information.
In a possible design, a storage unit 704 stores the historical fingerprint information and the abnormal root cause corresponding to the historical fingerprint information as follows: the historical abnormal event is taken as an event node, and an event identifier is recorded in the event node; each fingerprint in the historical fingerprint information is taken as a fingerprint node, and the fingerprint node records the historical value of the historical abnormal event in the preset dimension corresponding to that fingerprint, together with the event identifier; the abnormal root cause corresponding to the historical fingerprint information is taken as a root cause node, and the root cause node records the abnormal root cause corresponding to the historical abnormal event, together with the event identifier; the event node and the phenomenon-type fingerprint nodes among the fingerprint nodes are stored in association through a first edge, the first edge indicating that a phenomenon relationship in a preset dimension exists between the fingerprint node and the event node; the event node and the root-cause-type fingerprint nodes among the fingerprint nodes are stored in association through a second edge, the second edge indicating that a root cause relationship exists between the fingerprint node and the event node; and the event node and the root cause node are stored in association through the second edge. Here, the first edge indicates that the fingerprint node is a preset dimension of the event node, and the second edge indicates that the root cause node is the root cause of the event node.
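The node-and-edge storage scheme described above can be sketched with plain Python lists. This is a minimal illustration, not the storage layer of the embodiment: the dictionary fields, the edge labels `"first"` and `"second"`, the function name `build_event_graph`, and the rule that fingerprints whose dimension name starts with `root_` count as root-cause-type are all assumptions made for the example.

```python
def build_event_graph(event_id, fingerprints, root_cause, is_root_cause_type):
    """Return (nodes, edges) for one historical abnormal event.

    nodes: list of dicts; index 0 is always the event node.
    edges: list of (event_index, other_index, edge_kind) tuples.
    """
    nodes = [{"kind": "event", "event_id": event_id}]
    edges = []
    for fp in fingerprints:
        # Each fingerprint node records its value and the owning event identifier.
        nodes.append({"kind": "fingerprint", "value": fp, "event_id": event_id})
        edge_kind = "second" if is_root_cause_type(fp) else "first"
        edges.append((0, len(nodes) - 1, edge_kind))
    # The root cause node is linked to the event node through the second edge.
    nodes.append({"kind": "root_cause", "value": root_cause, "event_id": event_id})
    edges.append((0, len(nodes) - 1, "second"))
    return nodes, edges


fps_b = [
    "time_period->day",
    "root_imsInterface->CPUPCA_47758",
    "sub_production_id->401",
    "subScenarioId->4010101",
    "root_imsrcaLog->UPP_11077",
    "metric_exception->17319:-1",
]
graph_nodes, graph_edges = build_event_graph(
    "event-B",
    fps_b,
    "example root cause",
    is_root_cause_type=lambda fp: fp.startswith("root_"),  # assumed naming rule
)
```

Because every fingerprint node and root cause node carries the event identifier, a lookup that starts from a matching fingerprint value can reach the owning event and, from there, its root cause.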
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to this application without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of this application and their technical equivalents, this application is also intended to include them.

Claims (10)

  1. A method for locating the root cause of an abnormal event, characterized in that the method comprises:
    determining the current value of each preset dimension corresponding to a current abnormal event;
    determining the current fingerprint information of the current abnormal event according to the current values of the preset dimensions, wherein each preset dimension corresponds to one fingerprint;
    calculating the similarity between the current fingerprint information and each piece of historical fingerprint information, wherein each piece of historical fingerprint information is obtained from a corresponding historical abnormal event, and the historical abnormal event has a corresponding abnormal root cause;
    determining the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
  2. The method according to claim 1, characterized in that calculating the similarity between the current fingerprint information and each piece of historical fingerprint information comprises:
    determining the current vector of the current fingerprint information according to the current value of each fingerprint in the current fingerprint information and the weight of each fingerprint;
    calculating the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vectors corresponding to the respective pieces of historical fingerprint information.
  3. The method according to claim 2, characterized in that calculating the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vectors corresponding to the respective pieces of historical fingerprint information comprises:
    Figure PCTCN2020127110-appb-100001
    where A is the current vector and B is the historical vector.
  4. The method according to claim 1, characterized in that determining the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event comprises:
    screening out the marked abnormal root causes among the abnormal root causes;
    determining the recommended marked abnormal root cause from among the marked abnormal root causes, using the similarity of the historical fingerprint information corresponding to a marked abnormal root cause as the first criterion and the number of occurrences of the marked abnormal root cause as the second criterion;
    determining the abnormal root cause of the current abnormal event according to the recommended marked abnormal root cause.
  5. The method according to any one of claims 1-4, characterized in that the method further comprises:
    updating the current fingerprint information according to the current value of each preset dimension corresponding to the current abnormal event, and storing the current fingerprint information as historical fingerprint information.
  6. The method according to any one of claims 1-4, characterized in that the method further comprises:
    storing the historical fingerprint information and the abnormal root cause corresponding to the historical fingerprint information as follows:
    taking the historical abnormal event as an event node, an event identifier being recorded in the event node;
    taking each fingerprint in the historical fingerprint information as a fingerprint node, the fingerprint node recording the historical value of the historical abnormal event in the preset dimension corresponding to the fingerprint and the event identifier;
    taking the abnormal root cause corresponding to the historical fingerprint information as a root cause node, the root cause node recording the abnormal root cause corresponding to the historical abnormal event and the event identifier;
    storing the event node in association with the phenomenon-type fingerprint nodes among the fingerprint nodes through a first edge, the first edge indicating that a phenomenon relationship in a preset dimension exists between the fingerprint node and the event node;
    storing the event node in association with the root-cause-type fingerprint nodes among the fingerprint nodes through a second edge, the second edge indicating that a root cause relationship exists between the fingerprint node and the event node;
    storing the event node in association with the root cause node through the second edge.
  7. A device for locating the root cause of an abnormal event, characterized in that the device comprises:
    a determining unit, configured to determine the current value of each preset dimension corresponding to a current abnormal event, and to determine the current fingerprint information of the current abnormal event according to the current values of the preset dimensions, wherein each preset dimension corresponds to one fingerprint;
    a calculation unit, configured to calculate the similarity between the current fingerprint information and each piece of historical fingerprint information, wherein each piece of historical fingerprint information is obtained from a corresponding historical abnormal event, and the historical abnormal event has a corresponding abnormal root cause;
    the determining unit being further configured to determine the abnormal root cause corresponding to the historical fingerprint information whose similarity meets a set threshold as the abnormal root cause of the current abnormal event.
  8. The device according to claim 7, characterized in that the calculation unit is specifically configured to:
    determine the current vector of the current fingerprint information according to the current value of each fingerprint in the current fingerprint information and the weight of each fingerprint;
    calculate the similarity between the current fingerprint information and each piece of historical fingerprint information according to the current vector and the historical vectors corresponding to the respective pieces of historical fingerprint information.
  9. A computing device, characterized by comprising:
    a memory, configured to store program instructions;
    a processor, configured to call the program instructions stored in the memory and to execute, according to the obtained program, the method according to any one of claims 1 to 6.
  10. A computer-readable non-volatile storage medium, characterized by comprising computer-readable instructions which, when read and executed by a computer, cause the computer to execute the method according to any one of claims 1 to 6.
PCT/CN2020/127110 2019-12-12 2020-11-06 Method and device for positioning fundamental cause of abnormal event WO2021114977A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911276509.8 2019-12-12
CN201911276509.8A CN111158977B (en) 2019-12-12 2019-12-12 Abnormal event root cause positioning method and device

Publications (1)

Publication Number Publication Date
WO2021114977A1 true WO2021114977A1 (en) 2021-06-17

Family

ID=70556829

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/127110 WO2021114977A1 (en) 2019-12-12 2020-11-06 Method and device for positioning fundamental cause of abnormal event

Country Status (2)

Country Link
CN (1) CN111158977B (en)
WO (1) WO2021114977A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689042A (en) * 2021-08-25 2021-11-23 华自科技股份有限公司 Fault source prediction method for monitoring node
CN114037100A (en) * 2021-11-15 2022-02-11 国网山东省电力公司信息通信公司 AI technology-based power equipment operation and maintenance method and system
CN114422324A (en) * 2021-12-29 2022-04-29 中国电信股份有限公司 Alarm information processing method and device, electronic equipment and storage medium
CN114710392A (en) * 2022-03-23 2022-07-05 阿里云计算有限公司 Event information acquisition method and device
CN115277453A (en) * 2022-06-13 2022-11-01 北京宝兰德软件股份有限公司 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device
CN116054416A (en) * 2023-03-15 2023-05-02 扬州康德电气有限公司 Intelligent monitoring operation and maintenance management system based on Internet of things
CN116599822A (en) * 2023-07-18 2023-08-15 云筑信息科技(成都)有限公司 Fault alarm treatment method based on log acquisition event

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111158977B (en) * 2019-12-12 2023-07-11 深圳前海微众银行股份有限公司 Abnormal event root cause positioning method and device
CN111597070B (en) * 2020-07-27 2020-11-27 北京必示科技有限公司 Fault positioning method and device, electronic equipment and storage medium
CN114285730A (en) * 2020-09-18 2022-04-05 华为技术有限公司 Method and device for determining fault root cause and related equipment
CN112308455B (en) * 2020-11-20 2024-04-09 深圳前海微众银行股份有限公司 Root cause positioning method, root cause positioning device, root cause positioning equipment and computer storage medium
CN112702198B (en) * 2020-12-18 2023-03-14 北京达佳互联信息技术有限公司 Abnormal root cause positioning method and device, electronic equipment and storage medium
CN112769615B (en) * 2021-01-05 2023-04-18 ***股份有限公司 Anomaly analysis method and device
CN112882911B (en) * 2021-02-01 2023-12-29 中电科网络空间安全研究院有限公司 Abnormal performance behavior detection method, system, device and storage medium
CN113298638B (en) * 2021-05-12 2023-07-14 深圳前海微众银行股份有限公司 Root cause positioning method, electronic equipment and storage medium
CN113032238B (en) * 2021-05-25 2021-08-17 南昌惠联网络技术有限公司 Real-time root cause analysis method based on application knowledge graph
CN113868008A (en) * 2021-10-14 2021-12-31 中国建设银行股份有限公司 Exception handling method and device
CN114157553B (en) * 2021-12-08 2024-06-18 深圳前海微众银行股份有限公司 Data processing method, device, equipment and storage medium
CN114338351B (en) * 2021-12-31 2024-01-12 天翼物联科技有限公司 Network anomaly root cause determination method and device, computer equipment and storage medium
CN114354854B (en) * 2022-01-06 2024-02-13 武汉祁联生态科技有限公司 Abnormality detection method for smoke monitoring data
CN115576732B (en) * 2022-11-15 2023-03-10 阿里云计算有限公司 Root cause positioning method and system
CN115729796B (en) * 2022-12-23 2023-10-10 中软国际科技服务有限公司 Abnormal operation analysis method based on artificial intelligence and big data application system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015130052A (en) * 2014-01-07 2015-07-16 株式会社日立システムズ Rack, apparatus automatic monitoring system, automatic failure notification method, and automatic failure notification program
CN107688658A (en) * 2017-09-05 2018-02-13 北京奇艺世纪科技有限公司 The localization method and device of a kind of abnormal data
CN110309009A (en) * 2019-05-21 2019-10-08 北京云集智造科技有限公司 Situation-based operation and maintenance fault root cause positioning method, device, equipment and medium
CN111158977A (en) * 2019-12-12 2020-05-15 深圳前海微众银行股份有限公司 Abnormal event root cause positioning method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10698926B2 (en) * 2017-04-20 2020-06-30 Microsoft Technology Licensing, Llc Clustering and labeling streamed data
CN109583161B (en) * 2018-11-27 2021-08-06 咪咕文化科技有限公司 Information processing method and device and storage medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689042A (en) * 2021-08-25 2021-11-23 华自科技股份有限公司 Fault source prediction method for monitoring node
CN114037100A (en) * 2021-11-15 2022-02-11 国网山东省电力公司信息通信公司 AI technology-based power equipment operation and maintenance method and system
CN114037100B (en) * 2021-11-15 2024-01-16 国网山东省电力公司信息通信公司 AI technology-based power equipment operation and maintenance method and system
CN114422324A (en) * 2021-12-29 2022-04-29 中国电信股份有限公司 Alarm information processing method and device, electronic equipment and storage medium
CN114422324B (en) * 2021-12-29 2024-02-23 中国电信股份有限公司 Alarm information processing method and device, electronic equipment and storage medium
CN114710392A (en) * 2022-03-23 2022-07-05 阿里云计算有限公司 Event information acquisition method and device
CN114710392B (en) * 2022-03-23 2024-03-12 阿里云计算有限公司 Event information acquisition method and device
CN115277453A (en) * 2022-06-13 2022-11-01 北京宝兰德软件股份有限公司 Method for generating abnormal knowledge graph in operation and maintenance field, application method and device
CN116054416A (en) * 2023-03-15 2023-05-02 扬州康德电气有限公司 Intelligent monitoring operation and maintenance management system based on Internet of things
CN116054416B (en) * 2023-03-15 2023-09-22 扬州康德电气有限公司 Intelligent monitoring operation and maintenance management system based on Internet of things
CN116599822A (en) * 2023-07-18 2023-08-15 云筑信息科技(成都)有限公司 Fault alarm treatment method based on log acquisition event
CN116599822B (en) * 2023-07-18 2023-10-20 云筑信息科技(成都)有限公司 Fault alarm treatment method based on log acquisition event

Also Published As

Publication number Publication date
CN111158977A (en) 2020-05-15
CN111158977B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
WO2021114977A1 (en) Method and device for positioning fundamental cause of abnormal event
CN110661659B (en) Alarm method, device and system and electronic equipment
US10698757B2 (en) Tuning context-aware rule engine for anomaly detection
CN110708204B (en) Abnormity processing method, system, terminal and medium based on operation and maintenance knowledge base
CN111984499B (en) Fault detection method and device for big data cluster
US11755938B2 (en) Graphical user interface indicating anomalous events
WO2023071761A1 (en) Anomaly positioning method and device
CN109670091B (en) Metadata intelligent maintenance method and device based on data standard
CN114881167B (en) Abnormality detection method, abnormality detection device, electronic device, and medium
CN116010220A (en) Alarm diagnosis method, device, equipment and storage medium
CN115687432A (en) Method, apparatus, and medium for monitoring anomalous transaction data
CN115544519A (en) Method for carrying out security association analysis on threat information of metering automation system
CN107548087A (en) A kind of method and device of warning association analysis
CN114443437A (en) Alarm root cause output method, apparatus, device, medium, and program product
CN114153646A (en) Operation and maintenance fault handling method and device, storage medium and processor
CN112836124A (en) Image data acquisition method and device, electronic equipment and storage medium
CN116955856A (en) Information display method, device, electronic equipment and storage medium
WO2023039973A1 (en) Abnormal false alarm processing method and apparatus, and storage medium and terminal
CN115687406A (en) Sampling method, device and equipment of call chain data and storage medium
CN113052700B (en) Method and device for determining micro-service call chain
CN114706893A (en) Fault detection method, device, equipment and storage medium
CN110781309A (en) Entity parallel relation similarity calculation method based on pattern matching
CN117033148A (en) Alarm method, device, electronic equipment and medium of risk service interface
CN111865689B (en) Alarm voltage drop method based on index set tree
CN110175098B (en) Information processing method and information processing system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20900133

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 13/10/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20900133

Country of ref document: EP

Kind code of ref document: A1