CN117240700A - Network fault diagnosis method and device based on Bayesian classifier - Google Patents

Info

Publication number
CN117240700A
Authority
CN
China
Prior art keywords
log
fault
fault type
attribute
bayesian classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311494191.7A
Other languages
Chinese (zh)
Other versions
CN117240700B (en)
Inventor
杨泓雨
龚永生
张雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Jiuzhou Future Information Technology Co ltd
Original Assignee
Zhejiang Jiuzhou Future Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Jiuzhou Future Information Technology Co ltd
Priority to CN202311494191.7A
Publication of CN117240700A
Application granted
Publication of CN117240700B
Legal status: Active

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The embodiments of this specification disclose a network fault diagnosis method and device based on a Bayesian classifier. The method comprises: dividing log text vectors into log sets; calculating the cosine similarity of two log vector entries, merging log vector entries whose cosine similarity is higher than a first preset value, and assigning them the same log attribute; constructing the log attribute set corresponding to each log set; calculating the Pearson correlation coefficient of each log attribute pair and performing redundancy elimination on the target log attribute pairs to obtain target log attribute sets; and training a Bayesian classifier and using it to diagnose the fault type corresponding to an anomaly log obtained in real time. With these embodiments, complex fault scenarios can be diagnosed efficiently and directly from the log attributes of the anomaly log, yielding highly reliable diagnosis results, reducing the false alarm rate and the missed alarm rate, and allowing network faults to be found and resolved in time.

Description

Network fault diagnosis method and device based on Bayesian classifier
Technical Field
One or more embodiments of the present disclosure relate to cloud computing network technologies, and in particular, to a network fault diagnosis method and device based on a bayesian classifier.
Background
In cloud computing networks, the environment that carries many complex application services is relatively complex. Cloud computing networks on the one hand typically contain a large number of servers, storage devices, and network devices, which may be distributed throughout different data centers, with different hardware and software configurations, and this scale and variety increases the difficulty of network management and maintenance. On the other hand, the cloud computing network needs to be monitored and managed in real time to ensure performance, availability and safety, so that a manager needs to process a large amount of log and event data to perform fault removal and performance optimization.
If a cloud computing network fails, a number of similar alarms are generated by various devices and software associated with the failure. These alarms may propagate through the network, causing other devices to also generate a number of alarms. The main goal of network management is to ensure that the network is functioning properly, and determining the root cause of a failure among a large number of similar alarms would be a challenging task. Therefore, effective fault prediction diagnosis methods and strategies are important to ensure the stability and performance of the cloud network.
Existing cloud network fault prediction and diagnosis methods are mainly based on collecting massive logs from cloud computing servers and intermediate network devices, combined with a monitoring and alarm system that gathers key network indicators and hardware performance indicators of the related equipment and facilities in order to investigate and judge faults. Such methods work effectively for small-scale, simple cloud network faults, but adapt poorly to cascading or avalanche-type cloud network faults. When a cascading or avalanche-type fault occurs, the monitoring information, alarm information, and log information of the various devices grow exponentially, the information used for fault diagnosis overlaps and becomes redundant in both time and space, and a network fault often has multiple causes. In summary, existing fault diagnosis methods fall short in both timeliness and accuracy when facing cascading or avalanche-type fault scenarios.
Disclosure of Invention
To solve the above problems, one or more embodiments of the present disclosure describe a network fault diagnosis method and apparatus based on a bayesian classifier.
According to a first aspect, there is provided a bayesian classifier based network fault diagnosis method, the method comprising:
acquiring log text vectors corresponding to historical network faults, and dividing each log text vector into at least one log set based on fault types;
for any one of the log sets, selecting one log vector entry from any two log text vectors in the log set, calculating cosine similarity of the two log vector entries, and merging the two log vector entries with the cosine similarity higher than a first preset value to endow the same log attribute;
after all cosine similarity is calculated through traversal, constructing each log attribute set corresponding to each log set;
selecting any pair of log attribute pairs from the log attribute set aiming at the log attribute set corresponding to any fault type, calculating the pearson correlation coefficient of the log attribute pair, and performing redundancy elimination on target log attribute pairs of which the pearson correlation coefficient exceeds a first preset range;
After traversing and calculating all the pearson correlation coefficients, obtaining each target log attribute set corresponding to each fault type;
and constructing a training sample set according to each target log attribute set, training a Bayesian classifier based on the training sample set, and diagnosing the fault type corresponding to the abnormal log obtained in real time based on the trained Bayesian classifier.
Preferably, the obtaining the log text vector corresponding to the historical network fault includes:
and acquiring log information corresponding to the historical network faults, and respectively carrying out text vectorization on each log information to obtain each log text vector.
Preferably, the dividing each log text vector into at least one log set based on the fault type includes:
and labeling each log text vector based on the fault type, and dividing each log text vector with the same label into a log set to obtain at least one log set.
Preferably, the performing redundancy elimination on the target log attribute pair with the pearson correlation coefficient exceeding the first preset range includes:
and determining a target log attribute with time sequence lag in a target log attribute pair of which the pearson correlation coefficient exceeds a first preset range, and eliminating the target log attribute.
Preferably, the diagnosing, based on the trained bayesian classifier, the fault type corresponding to the anomaly log obtained in real time includes:
processing an anomaly log obtained in real time based on a trained Bayesian classifier to obtain at least one predicted fault type and a fault probability corresponding to the predicted fault type;
and determining the target prediction fault type with the highest fault probability as the fault type corresponding to the abnormal log.
Preferably, after diagnosing the fault type corresponding to the anomaly log obtained in real time based on the trained bayesian classifier, the method further includes:
and acquiring a fault time prediction matrix corresponding to the fault type, and determining the occurrence time of the predicted fault based on the fault time prediction matrix.
Preferably, the method further comprises:
when the abnormal log has abnormal log attributes which cannot be classified by the Bayesian classifier, comparing an actual fault type corresponding to the abnormal log attributes with a fault type set corresponding to the Bayesian classifier;
when a target fault type matched with the actual fault type exists in the fault type set, adding the abnormal log attribute to the fault set of the target fault type, and retraining the Bayesian classifier;
When the target fault type matched with the actual fault type does not exist in the fault type set, a new fault type to be trained is established, the abnormal log attribute is added to the fault set of the fault type to be trained, and the Bayesian classifier is retrained.
According to a second aspect, there is provided a bayesian classifier based network fault diagnosis apparatus, the apparatus comprising:
the acquisition module is used for acquiring log text vectors corresponding to the historical network faults and dividing each log text vector into at least one log set based on the fault type;
the first calculation module is used for selecting one log vector item from any two log text vectors in the log set respectively, calculating cosine similarity of the two log vector items, and merging the two log vector items with the cosine similarity higher than a first preset value to endow the same log attribute;
the first traversing module is used for constructing each log attribute set corresponding to each log set after traversing and calculating all the cosine similarity;
the second calculation module is used for selecting any pair of log attribute pairs from the log attribute set according to the log attribute set corresponding to any fault type, calculating the pearson correlation coefficient of the log attribute pairs, and performing redundancy elimination on target log attribute pairs of which the pearson correlation coefficient exceeds a first preset range;
The second traversing module is used for traversing and calculating all the pearson correlation coefficients to obtain each target log attribute set corresponding to each fault type;
the diagnosis module is used for constructing a training sample set according to each target log attribute set, training a Bayesian classifier based on the training sample set, and diagnosing fault types corresponding to the abnormal logs acquired in real time based on the trained Bayesian classifier.
According to a third aspect, there is provided an electronic device comprising a processor and a memory;
the processor is connected with the memory;
the memory is used for storing executable program codes;
the processor runs a program corresponding to executable program code stored in the memory by reading the executable program code for performing the steps of the method as provided in the first aspect or any one of the possible implementations of the first aspect.
According to a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program having instructions stored therein which, when run on a computer or processor, cause the computer or processor to perform a method as provided by any one of the possible implementations of the first aspect or the first aspect.
According to the method and device provided by the embodiments of this specification, text vectorization of the log information and cosine similarity calculation on the log vector entries allow log attributes to be merged, so that high-dimensional log text data is mapped into a low-dimensional space and the complexity of the log data is reduced. Meanwhile, the correlation among the log attributes of different log entries is analyzed through the Pearson correlation coefficient, so that the relationship between fault types and abnormal events can be recognized, the redundancy of the log attributes is further reduced, and the conditional independence of the training sample set used to train the Bayesian classifier is ensured. Furthermore, the trained Bayesian classifier can efficiently diagnose complex fault scenarios directly from the log attributes of the anomaly log and can provide highly reliable diagnosis results, so that the false alarm rate and the missed alarm rate are reduced and network faults can be found and resolved in time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of network fault diagnosis based on a Bayesian classifier in one embodiment of the present description.
Fig. 2 is a schematic structural diagram of a network fault diagnosis apparatus based on a bayesian classifier according to an embodiment of the present disclosure.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application.
In the following description, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The following description provides various embodiments of the application, which may be substituted for or combined with one another, and thus the application is also to be considered as embracing all possible combinations of the same and/or different embodiments described. Thus, if one embodiment includes features A, B, and C and another embodiment includes features B and D, then the present application should also be considered to include embodiments that include every other possible combination of one or more of A, B, C, and D, even though such an embodiment may not be explicitly recited in the following.
The following description provides examples and does not limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements described without departing from the scope of the application. Various examples may omit, replace, or add various procedures or components as appropriate. For example, the described methods may be performed in a different order than described, and various steps may be added, omitted, or combined. Furthermore, features described with respect to some examples may be combined into other examples.
Referring to fig. 1, fig. 1 is a flowchart of a network fault diagnosis method based on a bayesian classifier according to an embodiment of the present application. In an embodiment of the present application, the method includes:
s101, acquiring log text vectors corresponding to historical network faults, and dividing each log text vector into at least one log set based on fault types.
The execution subject of the application may be a cloud server in the cloud computing network that is mainly responsible for troubleshooting.
In the embodiment of the present disclosure, the cloud server samples the log information of each server and network device in the cloud network environment to obtain the log text vectors of the log information recorded when a network failure occurred. Network faults have a plurality of different fault types, so the fault types corresponding to the obtained log text vectors may also differ. The log text vectors therefore need to be classified by fault type, with all log text vectors of the same fault type placed in one log set, so that each fault type ends up with its own log set.
Illustratively, for the log information within a given time period, the device IP, device type, time, and keyword vector may be recorded together in the form of a log text vector.
In one implementation manner, the obtaining the log text vector corresponding to the historical network fault includes:
and acquiring log information corresponding to the historical network faults, and respectively carrying out text vectorization on each log information to obtain each log text vector.
In the embodiment of the present specification, log information stored by each server and network device in the cloud environment may exist in a plurality of different formats, for example, a text format, a JSON format, an XML format, a database format, a binary format, and the like. In order to facilitate subsequent processing, the cloud server performs text vectorization processing on the acquired log information, and uniformly converts the log files from a text data form to a data vector form.
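As an illustration of the text vectorization step, the following is a minimal sketch that assumes a simple bag-of-words representation. The specification does not name a vectorization technique, so the tokenizer, the vocabulary construction, and the function names `tokenize`, `build_vocabulary`, and `vectorize` are all hypothetical.

```python
import re
from collections import Counter

def tokenize(message):
    """Lowercase a raw log message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", message.lower())

def build_vocabulary(messages):
    """Collect the sorted set of tokens seen across all messages."""
    return sorted({tok for msg in messages for tok in tokenize(msg)})

def vectorize(message, vocabulary):
    """Map one message to a term-count vector over the vocabulary."""
    counts = Counter(tokenize(message))
    return [counts[tok] for tok in vocabulary]

# Hypothetical log messages; real logs would first be normalized from
# JSON, XML, database, or binary formats into plain text.
logs = [
    "link down on port eth0",
    "port eth0 link down, link flapping",
]
vocab = build_vocabulary(logs)
vectors = [vectorize(msg, vocab) for msg in logs]
```

Any other vectorization (for example TF-IDF or learned embeddings) could be substituted, as long as every log message is mapped into the same vector space.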
In one embodiment, the dividing each of the log text vectors into at least one log set based on the fault type includes:
and labeling each log text vector based on the fault type, and dividing each log text vector with the same label into a log set to obtain at least one log set.
In the embodiment of the present disclosure, the cloud server marks each log text vector according to the fault type, and the marks corresponding to different fault types will be different. After all the log text vectors are marked, the cloud server divides the same marked log text vector into a log set, so that a log set is divided for each fault type.
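The labeling and division described above can be sketched as a simple grouping operation. This is only an illustrative sketch: the fault-type labels, the example vectors, and the function name `split_into_log_sets` are assumptions, not part of the specification.

```python
from collections import defaultdict

def split_into_log_sets(labeled_vectors):
    """Group (fault_type, vector) pairs into one log set per fault type."""
    log_sets = defaultdict(list)
    for fault_type, vector in labeled_vectors:
        log_sets[fault_type].append(vector)
    return dict(log_sets)

# Hypothetical labeled log text vectors.
labeled = [
    ("link_down", [1, 0, 2]),
    ("disk_full", [0, 3, 0]),
    ("link_down", [1, 1, 2]),
]
log_sets = split_into_log_sets(labeled)
```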
Since the log text vectors are obtained from the cloud network servers or network devices according to faults that occurred historically, the log text vectors obtained each time are in fact the log text vectors within the time period in which the fault occurred. Illustratively, a historical fault T is time-stamped, and the log vectors extracted within the time range around that timestamp constitute a log set A. For faults of the same type T that have occurred many times, the logs collected on each occurrence are not necessarily identical, so the logs collected on each occurrence can first form a subset, and these subsets are then integrated into one log set.
S102, for any one of the log sets, selecting one log vector entry from any two log text vectors in the log set, calculating cosine similarity of the two log vector entries, and merging the two log vector entries with the cosine similarity higher than a first preset value to endow the same log attribute.
In the embodiment of the present specification, each log text vector contains at least one log vector entry, and the log vector entry may be a keyword vector in the log text vector. Each log vector entry has a UUID identifier and is assigned a corresponding log attribute according to its specific content. Within the log set of a given fault type, different log text vectors may contain substantially identical log vector entries, which makes the data in the log set highly redundant. Therefore, the cloud server can select any two log text vectors from the log set, select one log vector entry from each of them, and calculate the cosine similarity of the two log vector entries. If the calculated cosine similarity is higher than the first preset value, the two entries are considered highly similar; the cloud server merges the two log vector entries and assigns them the same UUID and the same log attribute, thereby reducing the redundancy of the log set. If the calculated cosine similarity does not exceed the first preset value, the two entries are not considered highly similar, no additional processing is performed, and each entry retains its own UUID and log attribute.
The cosine similarity is calculated as follows:
$\cos(a, b) = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^2}\,\sqrt{\sum_{i} b_i^2}}$
wherein $a_i$ and $b_i$ represent the components of the keyword vectors in the log text vectors a and b, respectively.
Log sets generated for the above-mentioned subsetsIn (2), then, should satisfy
S103, after all cosine similarity is calculated through traversal, each log attribute set corresponding to each log set is constructed.
In the embodiment of the present disclosure, the cloud server performs traversal calculation on each log set to calculate all cosine similarities, and merges all log vector entries with high similarity. And then, the cloud server constructs a log attribute set corresponding to each log set according to each finally obtained log attribute, and realizes direct mapping of the fault type and the log attribute of the low-dimensional space.
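Steps S102 and S103 can be sketched as follows. This is a simplified illustration under stated assumptions: the entries of one log set are passed in as a flat list and compared pairwise, the threshold value 0.9 stands in for the first preset value, and the function name `merge_similar_entries` is hypothetical.

```python
import math
import uuid

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length keyword vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)

def merge_similar_entries(entries, threshold=0.9):
    """Return one attribute id per entry; similar entries share an id."""
    ids = [str(uuid.uuid4()) for _ in entries]
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            if cosine_similarity(entries[i], entries[j]) > threshold:
                # Relabel every entry carrying j's id so merges stay transitive.
                old, new = ids[j], ids[i]
                ids = [new if x == old else x for x in ids]
    return ids

# Hypothetical keyword vectors: the first two point in the same direction.
entries = [[1, 1, 0], [2, 2, 0], [0, 0, 1]]
attribute_ids = merge_similar_entries(entries)
attribute_set = set(attribute_ids)  # the log attribute set of this log set
```

The pairwise traversal is quadratic in the number of entries, which matches the traversal described in the text but may need indexing or blocking for very large log sets.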
S104, selecting any pair of log attribute pairs from the log attribute set according to the log attribute set corresponding to any fault type, calculating the Pearson correlation coefficient of the log attribute pair, and performing redundancy elimination on the target log attribute pairs of which the Pearson correlation coefficient exceeds a first preset range.
In this embodiment, the events used for Bayesian inference must satisfy the principle of mutual independence, i.e. $P(AB) = P(A)P(B)$, where $P(AB)$ is the probability that event A and event B occur simultaneously, and $P(A)$ and $P(B)$ are the probabilities that event A and event B occur separately. Therefore, the cloud server also performs a correlation test on the log attributes corresponding to the same fault in order to remove redundant log attributes. Specifically, for the log attribute set corresponding to any fault type, the cloud server selects a log attribute pair from that set and calculates its Pearson correlation coefficient. A log attribute pair may consist of any two log attributes. In addition, a first preset range is stored in the cloud server in advance. If the calculated Pearson correlation coefficient falls within the first preset range, the two log attributes are considered uncorrelated; no additional processing is performed and both are retained. If the calculated Pearson correlation coefficient falls outside the first preset range, the two log attributes are considered correlated, and the cloud server performs redundancy elimination on the pair, removing the repeated log attribute.
For the obtained log attribute pairs, the calculated Pearson correlation coefficients can be represented by a correlation coefficient matrix R, each element of which is calculated as follows:
$r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2}\,\sqrt{\sum_{i} (y_i - \bar{y})^2}}$
wherein X and Y are log attributes in the log attribute set, and each element in the matrix R is the Pearson correlation coefficient of one log attribute pair. Furthermore, for a log attribute set that consists of multiple subsets, X and Y should be chosen from the intersection of all the subsets.
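The correlation coefficient matrix can be sketched as follows. The specification does not say which numeric values represent a log attribute, so this sketch assumes each attribute is observed as a series of per-time-window occurrence counts; the attribute names and the function names `pearson` and `correlation_matrix` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(series):
    """Nested dict R with R[a][b] = Pearson coefficient of attributes a and b."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}

# Hypothetical per-window occurrence counts for three log attributes.
series = {
    "port_flap": [1, 2, 3, 4],
    "link_loss": [2, 4, 6, 8],
    "cpu_alarm": [1, 0, 1, 0],
}
R = correlation_matrix(series)
```

Note that the Pearson coefficient only captures linear dependence, so a coefficient near zero is a necessary but not sufficient indication of independence.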
In an implementation manner, the performing redundancy elimination on the target log attribute pair with the pearson correlation coefficient exceeding the first preset range includes:
and determining a target log attribute with time sequence lag in a target log attribute pair of which the pearson correlation coefficient exceeds a first preset range, and eliminating the target log attribute.
In the embodiment of the present disclosure, in the process of redundancy elimination, the cloud server determines the time sequences of two log attributes in the target log attribute pair, and the log attribute with a lag is eliminated as the target log attribute.
S105, traversing and calculating all the Pearson correlation coefficients to obtain each target log attribute set corresponding to each fault type.
In the embodiment of the specification, similarly, the cloud server can traverse and calculate pearson correlation coefficients of all possible log attribute pairs to finally obtain a target log attribute set based on correlation redundancy elimination optimization, and each log attribute in the target log set has conditional independence, so that the data for subsequent analysis training classifier can be guaranteed to have conditional independence.
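The redundancy elimination of S104 and S105 can be sketched as follows. Several details are assumptions made only for illustration: the first preset range is taken to be the symmetric interval [-limit, limit], the "time sequence lag" is judged by the timestamp at which each attribute was first observed, and the names `eliminate_redundant` and `first_seen` are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)

def eliminate_redundant(series, first_seen, limit=0.8):
    """Drop the later-appearing attribute of every pair with |r| > limit."""
    kept = set(series)
    for a, b in combinations(sorted(series), 2):
        if a not in kept or b not in kept:
            continue  # one of the pair was already eliminated
        if abs(pearson(series[a], series[b])) > limit:
            lagging = a if first_seen[a] > first_seen[b] else b
            kept.discard(lagging)
    return kept

# Hypothetical occurrence counts and first-observation timestamps (seconds).
series = {
    "port_flap": [1, 2, 3, 4],
    "link_loss": [2, 4, 6, 8],
    "cpu_alarm": [1, 0, 1, 0],
}
first_seen = {"port_flap": 10.0, "link_loss": 12.5, "cpu_alarm": 11.0}
target_attributes = eliminate_redundant(series, first_seen)
```

Here `link_loss` is perfectly correlated with `port_flap` and appears later, so it is the attribute that gets eliminated.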
S106, constructing a training sample set according to each target log attribute set, training a Bayesian classifier based on the training sample set, and diagnosing fault types corresponding to the abnormal logs obtained in real time based on the trained Bayesian classifier.
In this embodiment of the present disclosure, the cloud server constructs a training sample set according to each target log attribute set, where each sample is marked with a fault type, a fault time, and the log attributes related to the fault. Training on this sample set yields a Bayesian classifier that judges the fault type according to the log attributes. When the cloud server acquires a newly generated anomaly log in real time, the anomaly log is input into the Bayesian classifier, and the fault type corresponding to the anomaly log is determined from the classifier's output. The whole process does not need to handle high-dimensional data: fault diagnosis can be completed quickly by attending only to the low-dimensional log attributes, and diagnosis accuracy is high.
The training process of the bayesian classifier can be as follows:
First, the cloud server sequentially reads the alarm information corresponding to the network faults of each type in the training sample set and determines the log attributes of each fault. Next, the network faults in the training sample set are classified according to manual experience and marked as $T = \{t_1, t_2, \dots, t_m\}$, where $T$ is the set of network fault classes and $t_1, t_2, \dots, t_m$ are the individual network fault classes in that set, namely the fault types.
Next, define $d = (x_1, x_2, \dots, x_n)$, where $d$ represents a network fault to be classified and $x_1, x_2, \dots, x_n$ are the $n$ log attributes of the fault $d$. For each network fault $d$ of the training sample set and its log attributes, calculate the fault probability $P(t_i \mid d) = \frac{P(t_i) \prod_{k=1}^{n} P(x_k \mid t_i)}{P(d)}$.
From the calculated fault probabilities, $t^{*} = \arg\max_{t_i \in T} P(t_i \mid d)$ determines which identified fault class the network fault $d$ belongs to: if the posterior probability $P(t_i \mid d)$ of class $t_i$ is greater than the posterior probability $P(t_j \mid d)$ of every other class $t_j$, the identified fault class $t_i$ is selected as the classification result of the fault $d$.
For fault alarm information which is not matched, repeating the process, establishing a new fault class, and adding the new fault class to the fault class set T.
After training, the cloud server adds the classified fault class T to a Bayesian classifier so as to process an anomaly log acquired in real time.
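The training process above can be sketched as a small naive Bayes classifier. This is a minimal sketch under stated assumptions: each training sample is a fault type together with the set of log attributes observed for it, likelihoods are estimated from presence counts, and Laplace smoothing is added so that unseen attributes do not zero out the product. The smoothing, the sample data, and the function names `train` and `classify` are not taken from the specification.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (fault_type, set_of_log_attributes)."""
    class_counts = Counter(fault_type for fault_type, _ in samples)
    attr_counts = defaultdict(Counter)
    for fault_type, attrs in samples:
        attr_counts[fault_type].update(attrs)
    total = len(samples)
    priors = {t: n / total for t, n in class_counts.items()}

    def likelihood(attr, fault_type):
        # Laplace smoothing (an assumption, not stated in the specification).
        return (attr_counts[fault_type][attr] + 1) / (class_counts[fault_type] + 2)

    return priors, likelihood

def classify(attrs, priors, likelihood):
    """Return the fault type with the highest log-posterior score."""
    scores = {
        t: math.log(p) + sum(math.log(likelihood(a, t)) for a in attrs)
        for t, p in priors.items()
    }
    return max(scores, key=scores.get)

# Hypothetical training sample set.
samples = [
    ("link_down", {"port_flap", "carrier_lost"}),
    ("link_down", {"port_flap"}),
    ("disk_full", {"io_error", "write_fail"}),
    ("disk_full", {"io_error"}),
]
priors, likelihood = train(samples)
result = classify({"port_flap"}, priors, likelihood)
```

Working in log space avoids numerical underflow when a fault has many log attributes; the shared denominator of the posterior is omitted because it does not change which class scores highest.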
In an implementation manner, the diagnosing, based on the trained bayesian classifier, the fault type corresponding to the anomaly log obtained in real time includes:
Processing an anomaly log obtained in real time based on a trained Bayesian classifier to obtain at least one predicted fault type and a fault probability corresponding to the predicted fault type;
and determining the target prediction fault type with the highest fault probability as the fault type corresponding to the abnormal log.
In the embodiment of the present disclosure, the exception log obtained by the cloud server is also input into a bayesian classifier after text vectorization processing, and similar fault log attributes are searched according to cosine similarity. Then, through the calculation of the fault probability identical to the training process, the cloud server can determine the probability distribution of faults possibly occurring in the abnormal log, and then determine the fault type of the abnormal log according to the highest probability.
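The diagnosis step, which produces a probability for each predicted fault type and then selects the highest one, can be sketched as follows. The prior and likelihood tables are hard-coded hypothetical values standing in for a trained classifier, and the `floor` probability for unknown attributes and the function name `posterior` are assumptions.

```python
import math

def posterior(attrs, priors, likelihoods, floor=1e-6):
    """Normalized posterior probability of each fault type given attributes."""
    log_scores = {}
    for fault_type, prior in priors.items():
        table = likelihoods[fault_type]
        log_scores[fault_type] = math.log(prior) + sum(
            math.log(table.get(a, floor)) for a in attrs
        )
    # Subtract the peak before exponentiating for numerical stability.
    peak = max(log_scores.values())
    weights = {t: math.exp(s - peak) for t, s in log_scores.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

# Hypothetical parameters of an already trained classifier.
priors = {"link_down": 0.5, "disk_full": 0.5}
likelihoods = {
    "link_down": {"port_flap": 0.8, "io_error": 0.1},
    "disk_full": {"port_flap": 0.2, "io_error": 0.9},
}
probs = posterior({"port_flap"}, priors, likelihoods)
diagnosis = max(probs, key=probs.get)
```

Returning the full distribution rather than only the top class lets an operator see how close the runner-up fault types are.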
In an implementation manner, after diagnosing the fault type corresponding to the anomaly log obtained in real time based on the trained bayesian classifier, the method further includes:
and acquiring a fault time prediction matrix corresponding to the fault type, and determining the occurrence time of the predicted fault based on the fault time prediction matrix.
In this embodiment of the present disclosure, during the foregoing training process, for each trained fault T and its corresponding log attribute x, the time difference between the time at which the fault occurred and the time at which the anomaly log was obtained can be calculated from the corresponding timestamps and stored in a fault time prediction matrix. After determining the fault type from an anomaly log obtained in real time, the cloud server can look up the fault time prediction matrix to infer the lead time and thereby predict when the fault will occur.
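One possible shape for the fault time prediction matrix is sketched below. The specification does not describe how repeated observations are combined, so averaging the time differences per (fault type, log attribute) pair is an assumption, as are the function names `build_time_matrix` and `predict_fault_time` and the example timestamps.

```python
from collections import defaultdict
from statistics import mean

def build_time_matrix(history):
    """history: iterable of (fault_type, attribute, log_time, fault_time)."""
    deltas = defaultdict(list)
    for fault_type, attribute, log_time, fault_time in history:
        deltas[(fault_type, attribute)].append(fault_time - log_time)
    # Assumption: the mean lead time summarizes repeated observations.
    return {key: mean(values) for key, values in deltas.items()}

def predict_fault_time(matrix, fault_type, attribute, log_time):
    """Predicted fault time, or None when the pair was never observed."""
    lead = matrix.get((fault_type, attribute))
    return None if lead is None else log_time + lead

# Hypothetical history: timestamps in seconds.
history = [
    ("link_down", "port_flap", 100.0, 130.0),
    ("link_down", "port_flap", 200.0, 250.0),
]
matrix = build_time_matrix(history)
eta = predict_fault_time(matrix, "link_down", "port_flap", 1000.0)
```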
In one embodiment, the method further comprises:
when the abnormal log has abnormal log attributes which cannot be classified by the Bayesian classifier, comparing an actual fault type corresponding to the abnormal log attributes with a fault type set corresponding to the Bayesian classifier;
when a target fault type matched with the actual fault type exists in the fault type set, adding the abnormal log attribute to the fault set of the target fault type, and retraining the Bayesian classifier;
when the target fault type matched with the actual fault type does not exist in the fault type set, a new fault type to be trained is established, the abnormal log attribute is added to the fault set of the fault type to be trained, and the Bayesian classifier is retrained.
In the embodiment of the present specification, errors may occur during fault diagnosis on the anomaly log, that is, fault classification cannot be achieved from a given log attribute. Two cases are then distinguished. If the actually occurring fault type, as determined by manual investigation, already exists in the identified and trained fault class set T, the log attribute is a new characterization of that fault, and the Bayesian classifier should be retrained with this log attribute to improve the accuracy of subsequent diagnosis. If the actually occurring fault type does not exist in the identified and trained fault class set T, a new fault class is established and added to the fault class set T, the training process is repeated, and the Bayesian classifier is retrained.
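The two cases above can be sketched as a single update on the per-fault-type attribute sets. This is only an illustration: the dictionary representation, the attribute names, and the function name `incorporate_feedback` are hypothetical, and the retraining itself is left out.

```python
def incorporate_feedback(fault_sets, actual_fault_type, unknown_attrs):
    """Add unclassifiable attributes to the matching or a new fault type.

    Returns True when a new fault type had to be established.
    """
    is_new = actual_fault_type not in fault_sets
    fault_sets.setdefault(actual_fault_type, set()).update(unknown_attrs)
    return is_new

# Hypothetical fault sets known to the current classifier.
fault_sets = {"link_down": {"port_flap"}}

# Case 1: manual investigation maps the attribute to an existing fault type.
created_a = incorporate_feedback(fault_sets, "link_down", {"crc_errors"})

# Case 2: the actual fault type is not yet in the fault type set.
created_b = incorporate_feedback(fault_sets, "power_loss", {"psu_alarm"})

# In either case the classifier would then be retrained on the updated sets.
```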
The following describes in detail a network fault diagnosis device based on a Bayesian classifier according to an embodiment of the present application with reference to fig. 2. It should be noted that the Bayesian classifier-based network fault diagnosis apparatus shown in fig. 2 is used to execute the method of the embodiment of fig. 1 of the present application. For ease of explanation, only the parts relevant to this embodiment are shown; for technical details not disclosed here, please refer to the embodiment shown in fig. 1 of the present application.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a network fault diagnosis device based on a bayesian classifier according to an embodiment of the present application. As shown in fig. 2, the apparatus includes:
An obtaining module 201, configured to obtain log text vectors corresponding to a historical network fault, and divide each log text vector into at least one log set based on a fault type;
a first calculation module 202, configured to, for any one of the log sets, select one log vector entry from each of any two log text vectors in the log set, calculate the cosine similarity of the two log vector entries, and merge two log vector entries whose cosine similarity is higher than a first preset value so that they are assigned the same log attribute;
the first traversal module 203 is configured to construct each log attribute set corresponding to each log set after traversing and calculating all the cosine similarities;
the second calculation module 204 is configured to, for the log attribute set corresponding to any fault type, select any log attribute pair from the log attribute set, calculate the pearson correlation coefficient of the log attribute pair, and perform redundancy elimination on a target log attribute pair whose pearson correlation coefficient exceeds a first preset range;
the second traversing module 205 is configured to obtain each target log attribute set corresponding to each fault type after traversing and calculating all pearson correlation coefficients;
The diagnosis module 206 is configured to construct a training sample set according to each target log attribute set, train a bayesian classifier based on the training sample set, and diagnose a fault type corresponding to the abnormal log obtained in real time based on the trained bayesian classifier.
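The merging step performed by the first calculation module can be sketched as below. This is a minimal illustration under assumptions: the log vector entries of one log set are flattened into a single list of numeric vectors, merging is treated as transitive (implemented with a small union-find), and the names `cosine_similarity`, `assign_log_attributes`, and `threshold` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_log_attributes(entries: List[Sequence[float]], threshold: float) -> List[int]:
    """Give entries whose similarity exceeds `threshold` the same attribute id."""
    parent = list(range(len(entries)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Traverse every pair of entries once and merge the similar ones.
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            if cosine_similarity(entries[i], entries[j]) > threshold:
                parent[find(j)] = find(i)

    # Relabel group representatives as consecutive ids in first-seen order.
    ids: Dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(entries))]


# Usage: the first two entries point in the same direction and are merged.
attribute_ids = assign_log_attributes([[1, 0, 1], [2, 0, 2], [0, 1, 0]], threshold=0.9)
```

The set of distinct ids returned for one log set then plays the role of that log set's log attribute set.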
In one embodiment, the obtaining module 201 is specifically configured to:
and acquiring log information corresponding to the historical network faults, and respectively carrying out text vectorization on each log information to obtain each log text vector.
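As one possible reading of the text vectorization step, the sketch below builds bag-of-words count vectors. This section does not state which vectorization is used, so the bag-of-words choice, the tokenization rule, and the name `vectorize_logs` are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple


def vectorize_logs(log_lines: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return a sorted vocabulary and one term-count vector per log line."""
    # Lowercase each line and keep runs of letters, digits, and underscores.
    tokenized = [re.findall(r"[a-z0-9_]+", line.lower()) for line in log_lines]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for terms that do not occur in this line.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Usage on two short, made-up log lines.
vocabulary, vectors = vectorize_logs(["link down on eth0", "link up on eth0"])
```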
In one embodiment, the obtaining module 201 is specifically further configured to:
and labeling each log text vector based on the fault type, and dividing each log text vector with the same label into a log set to obtain at least one log set.
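Dividing labeled log text vectors into log sets amounts to grouping by label. A minimal sketch, assuming the fault type labels are already available as strings and using the hypothetical name `divide_into_log_sets`:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def divide_into_log_sets(
    vectors: List[Sequence[float]], labels: List[str]
) -> Dict[str, List[Sequence[float]]]:
    """Group log text vectors so that each fault type label maps to one log set."""
    if len(vectors) != len(labels):
        raise ValueError("each log text vector needs exactly one label")
    log_sets: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for vector, label in zip(vectors, labels):
        log_sets[label].append(vector)
    return dict(log_sets)


# Usage: three vectors, two fault types.
log_sets = divide_into_log_sets(
    [[1, 0], [0, 1], [1, 1]], ["link_down", "routing_flap", "link_down"]
)
```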
In one embodiment, the second calculation module 204 is specifically configured to:
and determining a target log attribute with time sequence lag in a target log attribute pair of which the pearson correlation coefficient exceeds a first preset range, and eliminating the target log attribute.
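The redundancy elimination can be illustrated as follows. The sketch makes several assumptions that are not stated in this section: each log attribute is represented by a numeric series (for example, occurrence counts per time window), "exceeds a first preset range" is interpreted as the absolute pearson correlation coefficient being above a `threshold`, and the lagging attribute of a pair is the one whose first occurrence time (`first_seen`) is later. All names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear correlation
    return cov / math.sqrt(var_x * var_y)


def eliminate_redundant(
    series: Dict[str, List[float]],
    first_seen: Dict[str, float],
    threshold: float,
) -> Set[str]:
    """Drop the time-lagging attribute of every strongly correlated pair."""
    kept = set(series)
    for a, b in combinations(sorted(series), 2):
        if a not in kept or b not in kept:
            continue  # one side was already eliminated
        if abs(pearson(series[a], series[b])) > threshold:
            # The attribute that first appears later is treated as lagging.
            lagging = a if first_seen[a] > first_seen[b] else b
            kept.discard(lagging)
    return kept


# Usage: link_flap tracks crc_error exactly but appears later, so it is dropped.
target_attributes = eliminate_redundant(
    {"crc_error": [1, 2, 3, 4], "link_flap": [2, 4, 6, 8], "fan_alarm": [5, 1, 4, 2]},
    {"crc_error": 10.0, "link_flap": 12.0, "fan_alarm": 11.0},
    threshold=0.9,
)
```

The attributes that survive form the target log attribute set for the fault type in question.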
In one embodiment, the diagnostic module 206 is specifically configured to:
processing an anomaly log obtained in real time based on a trained Bayesian classifier to obtain at least one predicted fault type and a fault probability corresponding to the predicted fault type;
And determining the target prediction fault type with the highest fault probability as the fault type corresponding to the abnormal log.
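Selecting the diagnosis from the classifier output can be sketched as below. The sketch assumes the classifier exposes an unnormalized log score per fault type; the function `rank_fault_types` and the sample scores are hypothetical. Scores are normalized into probabilities in a numerically stable way and the fault type with the highest probability is taken as the diagnosis.

```python
import math
from typing import Dict, List, Tuple


def rank_fault_types(log_scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Turn per-fault-type log scores into probabilities, highest first."""
    if not log_scores:
        raise ValueError("no fault types were scored")
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_scores.values())
    weights = {fault: math.exp(score - peak) for fault, score in log_scores.items()}
    total = sum(weights.values())
    return sorted(
        ((fault, weight / total) for fault, weight in weights.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Usage with made-up scores for three fault types.
ranked = rank_fault_types(
    {
        "link_down": math.log(0.6),
        "routing_flap": math.log(0.3),
        "power_fault": math.log(0.1),
    }
)
diagnosed_fault, diagnosed_probability = ranked[0]
```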
In one embodiment, the diagnostic module 206 is specifically further configured to:
and acquiring a fault time prediction matrix corresponding to the fault type, and determining the occurrence time of the predicted fault based on the fault time prediction matrix.
In one embodiment, the diagnostic module 206 is specifically further configured to:
when the abnormal log has abnormal log attributes which cannot be classified by the Bayesian classifier, comparing an actual fault type corresponding to the abnormal log attributes with a fault type set corresponding to the Bayesian classifier;
when a target fault type matched with the actual fault type exists in the fault type set, adding the abnormal log attribute to the fault set of the target fault type, and retraining the Bayesian classifier;
when the target fault type matched with the actual fault type does not exist in the fault type set, a new fault type to be trained is established, the abnormal log attribute is added to the fault set of the fault type to be trained, and the Bayesian classifier is retrained.
It will be clear to those skilled in the art that the technical solutions of the embodiments of the present application may be implemented by means of software and/or hardware. "Unit" and "module" in this specification refer to software and/or hardware capable of performing a specific function, either alone or in combination with other components, such as field programmable gate arrays (Field-Programmable Gate Array, FPGA), integrated circuits (Integrated Circuit, IC), etc.
The processing units and/or modules of the embodiments of the present application may be implemented by an analog circuit that implements the functions described in the embodiments of the present application, or may be implemented by software that executes the functions described in the embodiments of the present application.
Referring to fig. 3, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, where the electronic device may be used to implement the method in the embodiment shown in fig. 1. As shown in fig. 3, the electronic device 300 may include: at least one central processor 301, at least one network interface 304, a user interface 303, a memory 305, at least one communication bus 302.
The communication bus 302 enables communication among these components.
The user interface 303 may include a display screen (Display) and a camera (Camera); optionally, the user interface 303 may further include a standard wired interface and a wireless interface.
The network interface 304 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
The central processor 301 may comprise one or more processing cores. The central processor 301 connects the various parts of the electronic device 300 through various interfaces and lines, and performs the functions of the electronic device 300 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 305 and by invoking data stored in the memory 305. Optionally, the central processor 301 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), or programmable logic array (Programmable Logic Array, PLA). The central processor 301 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed on the display screen; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the central processor 301 and may instead be implemented by a separate chip.
The memory 305 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 305 includes a non-transitory computer-readable storage medium. The memory 305 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 305 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, etc.; the data storage area may store the data referred to in the above method embodiments. Optionally, the memory 305 may also be at least one storage device located remotely from the aforementioned central processor 301. As shown in fig. 3, the memory 305, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and program instructions.
In the electronic device 300 shown in fig. 3, the user interface 303 is mainly used for providing an input interface for a user, and acquiring data input by the user; and the central processor 301 may be configured to invoke the bayesian classifier-based network fault diagnosis application stored in the memory 305, and specifically perform the following operations:
Acquiring log text vectors corresponding to historical network faults, and dividing each log text vector into at least one log set based on fault types;
for any one of the log sets, selecting one log vector entry from each of any two log text vectors in the log set, calculating the cosine similarity of the two log vector entries, and merging two log vector entries whose cosine similarity is higher than a first preset value so that they are assigned the same log attribute;
after all cosine similarities have been calculated through traversal, constructing each log attribute set corresponding to each log set;
for the log attribute set corresponding to any fault type, selecting any log attribute pair from the log attribute set, calculating the pearson correlation coefficient of the log attribute pair, and performing redundancy elimination on target log attribute pairs whose pearson correlation coefficient exceeds a first preset range;
after traversing and calculating all the pearson correlation coefficients, obtaining each target log attribute set corresponding to each fault type;
and constructing a training sample set according to each target log attribute set, training a Bayesian classifier based on the training sample set, and diagnosing the fault type corresponding to the abnormal log obtained in real time based on the trained Bayesian classifier.
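The final training and diagnosis operation can be illustrated with a small classifier. The text only says "Bayesian classifier"; the sketch below assumes a Bernoulli naive Bayes model with Laplace smoothing, where each training sample is the set of log attributes observed for one historical fault. The class `NaiveBayesFaultClassifier` and the attribute names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Sample = Tuple[Set[str], str]  # (observed log attributes, fault type)


class NaiveBayesFaultClassifier:
    """Bernoulli naive Bayes over log attributes (an assumed model choice)."""

    def fit(self, samples: List[Sample]) -> "NaiveBayesFaultClassifier":
        self.vocabulary = sorted({a for attrs, _ in samples for a in attrs})
        self.class_counts = Counter(label for _, label in samples)
        self.total = len(samples)
        attr_counts: Dict[str, Counter] = defaultdict(Counter)
        for attrs, label in samples:
            attr_counts[label].update(attrs)
        # Laplace smoothing: (count + 1) / (samples of the class + 2).
        self.likelihood = {
            label: {a: (attr_counts[label][a] + 1) / (n + 2) for a in self.vocabulary}
            for label, n in self.class_counts.items()
        }
        return self

    def log_scores(self, attrs: Set[str]) -> Dict[str, float]:
        """Log prior plus log likelihood of presence or absence of each attribute."""
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)
            for a in self.vocabulary:
                p = self.likelihood[label][a]
                score += math.log(p if a in attrs else 1.0 - p)
            scores[label] = score
        return scores

    def predict(self, attrs: Set[str]) -> str:
        scores = self.log_scores(attrs)
        return max(scores, key=scores.get)


# Usage on a tiny, made-up training sample set.
training_samples: List[Sample] = [
    ({"port_down", "crc_error"}, "link_down"),
    ({"port_down"}, "link_down"),
    ({"bgp_reset", "route_withdrawn"}, "routing_flap"),
    ({"bgp_reset"}, "routing_flap"),
]
classifier = NaiveBayesFaultClassifier().fit(training_samples)
prediction_a = classifier.predict({"port_down"})
prediction_b = classifier.predict({"bgp_reset"})
```

Attributes that never appeared in training are ignored at prediction time in this sketch, because only the training vocabulary is scored.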
The present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method. The computer readable storage medium may include, among other things, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks, ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some service interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on this understanding, the technical solution of the present application may be embodied, in whole or in part, in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be performed by a program instructing the relevant hardware, and the program may be stored in a computer readable memory, which may include: a flash disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and the like.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.

Claims (10)

1. A bayesian classifier-based network fault diagnosis method, the method comprising:
acquiring log text vectors corresponding to historical network faults, and dividing each log text vector into at least one log set based on fault types;
for any one of the log sets, selecting one log vector entry from each of any two log text vectors in the log set, calculating the cosine similarity of the two log vector entries, and merging two log vector entries whose cosine similarity is higher than a first preset value so that they are assigned the same log attribute;
after all cosine similarities have been calculated through traversal, constructing each log attribute set corresponding to each log set;
for the log attribute set corresponding to any fault type, selecting any log attribute pair from the log attribute set, calculating the pearson correlation coefficient of the log attribute pair, and performing redundancy elimination on target log attribute pairs whose pearson correlation coefficient exceeds a first preset range;
after traversing and calculating all the pearson correlation coefficients, obtaining each target log attribute set corresponding to each fault type;
and constructing a training sample set according to each target log attribute set, training a Bayesian classifier based on the training sample set, and diagnosing the fault type corresponding to the abnormal log obtained in real time based on the trained Bayesian classifier.
2. The method of claim 1, wherein the obtaining a log text vector corresponding to the historical network failure comprises:
and acquiring log information corresponding to the historical network faults, and respectively carrying out text vectorization on each log information to obtain each log text vector.
3. The method of claim 1, wherein the dividing each of the log text vectors into at least one log set based on fault type comprises:
And labeling each log text vector based on the fault type, and dividing each log text vector with the same label into a log set to obtain at least one log set.
4. The method of claim 1, wherein the redundant culling of the target log attribute pairs for which the pearson correlation coefficient exceeds a first preset range comprises:
and determining a target log attribute with time sequence lag in a target log attribute pair of which the pearson correlation coefficient exceeds a first preset range, and eliminating the target log attribute.
5. The method according to claim 1, wherein diagnosing the fault type corresponding to the anomaly log obtained in real time based on the trained bayesian classifier comprises:
processing an anomaly log obtained in real time based on a trained Bayesian classifier to obtain at least one predicted fault type and a fault probability corresponding to the predicted fault type;
and determining the target prediction fault type with the highest fault probability as the fault type corresponding to the abnormal log.
6. The method according to claim 1, wherein after diagnosing the fault type corresponding to the anomaly log obtained in real time based on the trained bayesian classifier, the method further comprises:
And acquiring a fault time prediction matrix corresponding to the fault type, and determining the occurrence time of the predicted fault based on the fault time prediction matrix.
7. The method according to claim 1, wherein the method further comprises:
when the abnormal log has abnormal log attributes which cannot be classified by the Bayesian classifier, comparing an actual fault type corresponding to the abnormal log attributes with a fault type set corresponding to the Bayesian classifier;
when a target fault type matched with the actual fault type exists in the fault type set, adding the abnormal log attribute to the fault set of the target fault type, and retraining the Bayesian classifier;
when the target fault type matched with the actual fault type does not exist in the fault type set, a new fault type to be trained is established, the abnormal log attribute is added to the fault set of the fault type to be trained, and the Bayesian classifier is retrained.
8. A bayesian classifier-based network fault diagnosis apparatus, the apparatus comprising:
the acquisition module is used for acquiring log text vectors corresponding to the historical network faults and dividing each log text vector into at least one log set based on the fault type;
the first calculation module is used for, for any one of the log sets, selecting one log vector entry from each of any two log text vectors in the log set, calculating the cosine similarity of the two log vector entries, and merging two log vector entries whose cosine similarity is higher than a first preset value so that they are assigned the same log attribute;
the first traversing module is used for constructing each log attribute set corresponding to each log set after traversing and calculating all the cosine similarities;
the second calculation module is used for, for the log attribute set corresponding to any fault type, selecting any log attribute pair from the log attribute set, calculating the pearson correlation coefficient of the log attribute pair, and performing redundancy elimination on target log attribute pairs whose pearson correlation coefficient exceeds a first preset range;
the second traversing module is used for traversing and calculating all the pearson correlation coefficients to obtain each target log attribute set corresponding to each fault type;
the diagnosis module is used for constructing a training sample set according to each target log attribute set, training a Bayesian classifier based on the training sample set, and diagnosing fault types corresponding to the abnormal logs acquired in real time based on the trained Bayesian classifier.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-7 when the computer program is executed.
10. A computer readable storage medium having stored thereon a computer program having instructions stored therein, which when run on a computer or processor, cause the computer or processor to perform the steps of the method according to any of claims 1-7.
CN202311494191.7A 2023-11-10 2023-11-10 Network fault diagnosis method and device based on Bayesian classifier Active CN117240700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311494191.7A CN117240700B (en) 2023-11-10 2023-11-10 Network fault diagnosis method and device based on Bayesian classifier

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311494191.7A CN117240700B (en) 2023-11-10 2023-11-10 Network fault diagnosis method and device based on Bayesian classifier

Publications (2)

Publication Number Publication Date
CN117240700A true CN117240700A (en) 2023-12-15
CN117240700B CN117240700B (en) 2024-02-06

Family

ID=89095124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311494191.7A Active CN117240700B (en) 2023-11-10 2023-11-10 Network fault diagnosis method and device based on Bayesian classifier

Country Status (1)

Country Link
CN (1) CN117240700B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423205A (en) * 2017-07-11 2017-12-01 北京明朝万达科技股份有限公司 A kind of system failure method for early warning and system for anti-data-leakage system
CN110647446A (en) * 2018-06-26 2020-01-03 中兴通讯股份有限公司 Log fault association and prediction method, device, equipment and storage medium
CN111198817A (en) * 2019-12-30 2020-05-26 武汉大学 SaaS software fault diagnosis method and device based on convolutional neural network
US20210240691A1 (en) * 2020-01-30 2021-08-05 International Business Machines Corporation Anomaly identification in log files
WO2021139279A1 (en) * 2020-07-30 2021-07-15 平安科技(深圳)有限公司 Data processing method and apparatus based on classification model, and electronic device and medium
CN112100025A (en) * 2020-08-25 2020-12-18 北京明略昭辉科技有限公司 Log simplifying method and device, electronic equipment and computer readable medium
WO2022134911A1 (en) * 2020-12-21 2022-06-30 中兴通讯股份有限公司 Diagnosis method and apparatus, and terminal and storage medium
CN113946971A (en) * 2021-10-25 2022-01-18 国网天津市电力公司电力科学研究院 Dimension reduction method for massive input/output log data stream of power system
CN115061874A (en) * 2022-06-14 2022-09-16 中国工商银行股份有限公司 Log information verification method, device, equipment and medium
CN115952096A (en) * 2022-12-30 2023-04-11 华润数字科技有限公司 Fault detection method, device, equipment and medium of data center software system
CN116910013A (en) * 2023-07-17 2023-10-20 西安电子科技大学 System log anomaly detection method based on semantic flowsheet mining

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
华顺刚;李春泽;: "基于深度学习方法的三维模型相似度计算", 机电工程技术, no. 09 *
孙玫;张森;聂培尧;聂秀山;: "基于朴素贝叶斯的网络查询日志session划分方法研究", 南京大学学报(自然科学), no. 06 *
邹根;闻立杰;: "基于支持向量机的Web日志用户标志修正算法", 计算机集成制造***, no. 08 *

Also Published As

Publication number Publication date
CN117240700B (en) 2024-02-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant