CN115277180A - Block chain log anomaly detection and tracing system - Google Patents


Info

Publication number
CN115277180A
Authority
CN
China
Prior art keywords
log
model
template
sequence
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210882913.5A
Other languages
Chinese (zh)
Other versions
CN115277180B (en)
Inventor
牛伟纳
张小松
廖旭涵
赵丽睿
周孝笑
朱宇坤
张然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202210882913.5A
Publication of CN115277180A
Application granted
Publication of CN115277180B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/146 Tracing the source of attacks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to the field of blockchain applications. The system addresses the lack of a data anomaly detection function in current blockchain architectures and performs data detection safely, accurately and reliably. A template is extracted from the data log and quantity features are counted. Models are trained on the feature representation of the log, where the features are divided into quantity features and time-sequence features. A log sequence to be detected is processed by the data processing module and then fed to the models trained by the quantity and time-sequence model training module; each model outputs a value between 0 and 1, recorded as the time-sequence deviation and the quantity deviation respectively, from which the final deviation is calculated. Logs exceeding the deviation threshold are written into a table and given a threat mark, and the log sequence to which the threat log belongs is given as the tracing output. If an auditor finds that an anomaly is a false alarm, the log sequence is marked as a false alarm so that the system can dynamically adjust the threshold, which improves accuracy.

Description

Block chain log anomaly detection and traceability system
Technical Field
The invention belongs to the field of blockchain data security and provides a blockchain log anomaly detection and tracing system.
Background
Blockchain technology is one of the most popular technologies today and has been widely used in scenarios such as finance and supply chains. It generally takes one of three forms: public chain, consortium chain and private chain. In the early stage of blockchain adoption the public chain was the main form: anyone can take part in supervision, so the authenticity of on-chain information is the strongest, but the large number of participants makes it inefficient to run. When a blockchain is used on a small scale inside an enterprise, a private chain is usually chosen: few people participate, but the degree of centralization is too high, so it generally suits only single-center settings. The consortium chain, which combines the advantages of the two, is the form chosen by most applications at present. It is jointly supervised by several main participating organizations; each organization independently controls which individuals it authorizes to join the blockchain network, and those individuals are registered as members of that organization to take part in supervision. On-chain information is transparent to all participating individuals, and operations such as adding data are supervised collectively, so the chain is traceable and tamper-resistant. At present, blockchain applications at home and abroad mainly aim to guarantee that auditable data is tamper-proof, credible, complete and traceable.
Logs are the most representative auditable data. A log records operating information, such as various parameters, while a system runs; by auditing logs regularly or when abnormal behavior occurs, developers can find, locate and resolve problems in time. Existing logging systems have problems, however. If the system is deliberately attacked, a prepared attacker can tamper with the log records, so that developers cannot locate errors from the falsified records, which makes repairing the system and locating problems harder. In addition, in the widely used approach to log anomaly detection, developers rely on their domain knowledge, combined with log severity levels, keyword search, regular expressions and the like. This approach depends heavily on manual work and becomes harder as systems grow larger and more complex.
A log is time-series text data composed of a timestamp and a text message. It records the running state of a service in real time and exhibits certain quantitative correspondences: for example, if several files are opened, the same number of files should be closed. If the order in which logs are produced is wrong, or a correspondence does not hold, an anomaly may be present. However, log specifications are currently not uniform, devices of different types print logs in different formats, and log data is unstructured, so logs are hard to process automatically in batches. These problems make log analysis very difficult.
A log parser based on a fixed-depth tree first preprocesses the raw log message with simple regular expressions derived from domain knowledge, and then searches for a log group according to specially designed rules encoded in the internal nodes of the tree. If a matching log group is found, the log message is matched with the log event stored in that group; otherwise a new log group is created. In the end every log belongs to a log group, so the parser in effect classifies the logs, and the pattern common to logs of the same form is extracted as a template.
Disclosure of Invention
The invention discloses an automatic blockchain log anomaly detection and tracing scheme based on a consortium chain. In conventional blockchain applications, data on the consortium chain is not checked for anomalies automatically; detection is done by hand and relies on experience. As the data on the consortium chain grows in variety and complexity, manual detection alone is no longer feasible. A sound, high-accuracy log anomaly detection technique is therefore needed to increase the automation capability of the system. The scheme addresses the lack of a data anomaly detection function in current blockchain architectures and performs data detection safely, accurately and reliably.
To solve the above technical problems, the invention adopts the following technical solution:
A blockchain log anomaly detection and tracing system, comprising:
a data processing module: extracts a template from the data log, where the log template comprises a constant part and a variable part, structures the unstructured log data into templated logs that are easy to analyze, and counts quantity features according to the template, the quantity features being the number of times each word occurs and the number of times each word combination occurs in the template;
a quantity and time-sequence model training module: trains models on the feature representation of the log, where the features are divided into quantity features and time-sequence features; the time-sequence features and the quantity features are input into the time-sequence model and the quantity model respectively for training; the quantity features are the numbers of occurrences of words and of word combinations in the template, and the time-sequence features are the order of the logs;
a deviation calculation module: for a log sequence to be detected, processes the data through the data processing module and feeds the result to the models trained by the quantity and time-sequence model training module; each model outputs a value between 0 and 1, recorded as the time-sequence deviation and the quantity deviation respectively, from which the final deviation is calculated;
an anomaly tracing module: writes the logs that exceed the deviation threshold into a table, gives them a threat mark, and gives the log sequence to which the threat log belongs as the tracing output; if an auditor finds that an anomaly is a false alarm, the log sequence is marked as a false alarm so that the system can dynamically adjust the threshold, which improves accuracy.
In the above technical solution, the data processing module uses a Drain log template extractor together with multidimensional feature combination to output statistical features, specifically:
1) The Drain log template extractor extracts templates from the existing logs of the blockchain network;
2) For each template extracted by Drain, the number of occurrences of each word and of each word combination is counted;
3) When a node uploads a log to the blockchain network, Drain classifies the log into the corresponding template and the quantity features are counted.
In the above technical solution, in the quantity and time-sequence model training module:
the time-sequence features and the quantity features of the log sequence are obtained separately;
the time-sequence features are fed into an attention-based GRU model for training to obtain the time-sequence model;
the quantity features are fed into a gradient boosting decision tree for training to obtain the quantity model;
the time-sequence model and the quantity model with the highest accuracy during training are saved.
The attention-based GRU model is trained through the following steps:
A. A log is text data and the extracted template is also text, so a semantic conversion is needed before it is input into the model: the input log template text is converted into a log template vector using semantic vectors pre-trained with GloVe. A program cannot process the words of a log directly, but it can process numeric vectors. The GloVe vocabulary is a one-to-one mapping from words to numeric vectors, so words can be converted by table lookup.
B. The batch of log template vectors is converted into log template sequence vectors using a sliding window;
C. The log template sequence vectors are input into the model so that it learns the time-sequence features;
D. The training result is saved to obtain the time-sequence model.
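Steps A and B can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy two-dimensional `vocab` stands in for a pre-trained 300-dimensional GloVe table, the names `template_vector` and `sliding_windows` are hypothetical, and treating unknown words as zero vectors is an assumption. Summing the word vectors of a line follows the description of the vocabulary format given later in this document.

```python
from typing import Dict, List

Vector = List[float]

def template_vector(template: str, vocab: Dict[str, Vector], dim: int) -> Vector:
    """Sum the word vectors of a template; unknown words add nothing (assumption)."""
    total = [0.0] * dim
    for word in template.split():
        for k, value in enumerate(vocab.get(word, [0.0] * dim)):
            total[k] += value
    return total

def sliding_windows(vectors: List[Vector], window: int) -> List[List[Vector]]:
    """Return every run of `window` consecutive template vectors, in log order."""
    return [vectors[i:i + window] for i in range(len(vectors) - window + 1)]

# Toy vocabulary with hypothetical values; a real table would come from GloVe.
vocab = {"open": [1.0, 0.0], "file": [0.5, 2.0], "close": [0.0, 1.0]}
templates = ["open file", "close file", "open file", "close file"]
vectors = [template_vector(t, vocab, dim=2) for t in templates]
sequences = sliding_windows(vectors, window=3)
print(len(sequences))  # number of windows produced from 4 logs with window 3
```

Each element of `sequences` is one ordered window of template vectors, which is the kind of input a sequence model such as a GRU consumes in step C.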
In the above technical solution, the anomaly tracing module works as follows:
1) A threshold is set and the deviation value is checked against it; if the deviation value exceeds the threshold, the log is marked as abnormal (marked as 1);
2) Data found to be a false alarm can be marked as a false alarm (marked as 0). Whether to adjust the threshold is decided by whether the number of true anomalies with deviation values below that of the false alarm within a certain period is particularly small; if so, the threshold is too low and needs to be raised;
3) After being processed by the data processing module, the log data to be detected is input into the models trained by the quantity and time-sequence model training module to obtain a deviation value;
4) Whether the log is marked as abnormal is decided according to the threshold;
5) If the log is abnormal, the log sequence related to the anomaly is traced and output.
In the above technical solution, the process of tracing and outputting an anomaly is as follows:
1) The one-to-one correspondence between the original text and the vector of each log under test is cached in memory;
2) If a log vector is marked as abnormal, its original log text and the related log sequence information are obtained by table lookup;
3) Otherwise, the cache is cleared and the next batch of logs is examined.
By adopting the above technical solution, the invention has the following beneficial effects:
1. The log data is stored on the consortium chain, which guarantees the credibility and reliability of log data auditing.
2. Log anomalies are detected automatically by combining the strengths of deep learning and machine learning and using the time-sequence features and quantity features of the logs.
3. A comprehensive deviation calculation that combines the attention mechanism over time-sequence features, quantity feature screening and final weight assignment effectively removes irrelevant features and influences and improves the accuracy of the results.
Drawings
FIG. 1 is the basic flow of blockchain log anomaly detection and tracing;
FIG. 2 is the log anomaly detection process;
FIG. 3 is the architecture of the blockchain log anomaly detection and tracing system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
The detailed description of the embodiments of the present invention is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The blockchain log anomaly detection and tracing system is implemented with the following modules:
A data processing module: extracts log templates from unstructured logs and counts the occurrences of the fixed words in each template and of combinations of those fixed words.
A quantity and time-sequence model training module: converts the log sequence into numeric vectors and inputs them into the time-sequence model for training. At the same time, the quantity features counted by the data processing module are input into the quantity model for training, and the training results are saved.
A deviation calculation module: inputs the log sequence under test into the models trained by the quantity and time-sequence model training module, calculates the time-sequence deviation and the quantity deviation separately, and then calculates the comprehensive deviation.
An anomaly tracing module: marks a log as abnormal or not according to its deviation value; if it is abnormal, the related log sequence is traced and output, and an operator can dynamically adjust the deviation threshold through feedback.
The main process carried out by the four modules is as follows:
A. The data processing module extracts log templates from the large amount of unstructured log data on the blockchain network and, for each template, counts the occurrences of fixed words and of fixed-word combinations.
B. The log text sequence is converted into numeric vectors through the GloVe vocabulary and cached temporarily. The sequences, which carry time-sequence information, are input into the time-sequence model, and the quantity features counted from the corresponding templates are input into the quantity model; the two models are trained separately and the training result with the highest accuracy is saved.
C. The log sequence under test is processed by the data processing module and input into the trained quantity and time-sequence models, which yield the time-sequence deviation t and the quantity deviation n respectively. The two deviations are weighted according to their influence on the final result, and the comprehensive deviation is calculated as y = w1·t + w2·n, where w1 and w2 are the weights.
D. Whether the log is marked as abnormal is determined by whether the comprehensive deviation y is greater than the set threshold m. If it is abnormal, the abnormal log sequence is output using the log association information cached in memory; otherwise the cache is cleared.
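Steps C and D reduce to a weighted sum followed by a threshold comparison, which the following Python sketch illustrates. The function names, the equal default weights and the sample threshold are assumptions made for illustration; the source does not state the weight values. If the weights are non-negative and sum to 1, y stays in the range 0 to 1 whenever t and n do.

```python
def comprehensive_deviation(t: float, n: float,
                            w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted combination y = w1*t + w2*n of the two model outputs."""
    return w1 * t + w2 * n

def is_abnormal(y: float, m: float) -> bool:
    """Step D: a log is marked abnormal when y is greater than the threshold m."""
    return y > m

# Equal weights and m = 0.5 are hypothetical values chosen for the example.
y = comprehensive_deviation(t=0.8, n=0.4)
print(round(y, 3), is_abnormal(y, m=0.5))
```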
Further, in the processing of step B, the log template is parsed first, for which we use the Drain log parser. It replaces common variable information with masks, then classifies and aggregates logs according to their length and prefix similarity, and finally obtains the log templates.
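The masking and grouping idea can be sketched as follows. This is not the Drain implementation: the two regular expressions, the mask names applied by `MASK_RULES`, and the helper `group_key` (which groups by token count and the first two tokens) are hypothetical stand-ins, and the sample line is modeled on the example log used later in this document.

```python
import re
from typing import Tuple

# Hypothetical mask rules; the source does not list its regular expressions.
MASK_RULES = [
    (re.compile(r"blk_-?\d+"), "[ID]"),                           # block identifiers
    (re.compile(r"\d{1,3}(?:\.\d{1,3}){3}:\d+"), "[IPANDPORT]"),  # IPv4 address with port
]

def mask_variables(line: str) -> str:
    """Replace variable fields with fixed mask tokens."""
    for pattern, mask in MASK_RULES:
        line = pattern.sub(mask, line)
    return line

def group_key(line: str, prefix_len: int = 2) -> Tuple[int, Tuple[str, ...]]:
    """Group masked logs by token count and by their leading tokens."""
    tokens = mask_variables(line).split()
    return len(tokens), tuple(tokens[:prefix_len])

raw = "Receiving block blk_-354458 src: /10.250.19.102:39325 dest: /10.250.19.102:50010"
print(mask_variables(raw))
print(group_key(raw))
```

Logs that share a key are candidates for the same template; a real parser would then compare them token by token before merging.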
In particular, our system combines the time-sequence features and the quantity features of the log sequence. The time-sequence feature is the order in which logs are produced: for example, a file is created, then written, then deleted, in that order. The quantity feature is a correspondence between counts: for example, a file that is opened several times should be closed the same number of times.
On this basis, a classic, high-accuracy deep learning algorithm and a machine learning method are used:
1) For time-sequence training, an attention-based GRU (Gated Recurrent Unit) algorithm is used. It can focus on specific parts of the input sequence according to their contribution and ignore irrelevant sequence noise, so that a good time-sequence model is obtained through training.
2) For quantity training, a gradient boosting decision tree is used to screen the one-dimensional and two-dimensional quantity features and remove irrelevant ones, finally yielding a quantity model that is effective at distinguishing anomalies.
In step D, the threshold is initially set manually to a low value so that as many anomalies as possible are captured. The threshold is then adjusted dynamically through operator feedback by observing the false alarm rate: among the logs whose deviation value is at least the threshold and at most the deviation value of a reported false alarm, the system checks whether the number of false alarms greatly exceeds the number of true anomalies; if so, the threshold is updated to the deviation value of that false alarm.
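One possible reading of the threshold adjustment rule is sketched below. The source wording is ambiguous, so the interpretation is an assumption: only audited logs whose deviation lies between the current threshold and the false alarm's deviation are considered, and "greatly exceeds" is modeled by a hypothetical `ratio` parameter. The function name and the data layout are also illustrative.

```python
from typing import List, Tuple

def adjust_threshold(threshold: float,
                     false_alarm_deviation: float,
                     audited: List[Tuple[float, bool]],
                     ratio: float = 5.0) -> float:
    """Raise the threshold to a false alarm's deviation when false alarms dominate.

    `audited` holds (deviation, is_true_anomaly) pairs from operator review.
    `ratio` is a hypothetical stand-in for "greatly exceeds".
    """
    # Keep only logs in the band [threshold, false_alarm_deviation].
    band = [is_true for dev, is_true in audited
            if threshold <= dev <= false_alarm_deviation]
    true_anomalies = sum(band)
    false_alarms = len(band) - true_anomalies
    if false_alarms > ratio * true_anomalies:
        return false_alarm_deviation
    return threshold

audited = [(0.31, False), (0.35, False), (0.38, True), (0.40, False), (0.70, True)]
print(adjust_threshold(0.30, 0.40, audited, ratio=2.0))
```

With a small `ratio` the threshold moves up quickly after a few false alarms; a larger value makes the system more conservative about discarding low-deviation alerts.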
The technical scheme of the invention is further explained as follows:
1. Extraction of log templates
Log parsing addresses the problems that current log specifications are not uniform, that devices of different types print logs in different formats, and that log data is unstructured, and so makes logs easier for people to analyze. A log parser is used to extract log templates from the logs, which makes the types of logs being processed clearer and reduces processing difficulty. The main steps of the log parser are as follows.
1. Preprocessing: obvious variable parts are replaced with masks using regular expressions.
2. Classification by log length: logs are grouped by the number of tokens in the original log.
3. Classification by depth: logs are classified according to a preset tree depth, which is generally set to 4 and can be fine-tuned for the actual scenario; the depth affects the number of nodes traversed during search and the precision.
4. Classification by similarity: after the above classification, the similarity of two log sequences is computed as
$$\mathrm{simSeq} = \frac{\sum_{i=1}^{n} \mathrm{equ}\big(seq_1(i),\, seq_2(i)\big)}{n}$$
where seq1(i) is the i-th token of log sequence 1, seq2(i) is the i-th token of log sequence 2, and
$$\mathrm{equ}(t_1, t_2) = \begin{cases} 1, & t_1 = t_2 \\ 0, & \text{otherwise} \end{cases}$$
The similarity is used to judge whether the two sequences belong to the same class; if not, a new class is added. Here t1 and t2 are the tokens at the same position of the two sequences, and n is the length of the longer of the two sequences being compared.
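The position-wise comparison described in step 4 can be sketched in Python as below. The sketch assumes that sequences are compared token by token, that the count of equal positions is divided by the length n of the longer sequence, and that a hypothetical cut-off `min_similarity` decides class membership; the source does not give the cut-off value.

```python
from typing import Sequence

def sim_seq(seq1: Sequence[str], seq2: Sequence[str]) -> float:
    """Fraction of positions holding equal tokens, relative to the longer sequence."""
    n = max(len(seq1), len(seq2))
    if n == 0:
        return 1.0  # two empty sequences are treated as identical (assumption)
    matches = sum(1 for t1, t2 in zip(seq1, seq2) if t1 == t2)
    return matches / n

def same_class(seq1: Sequence[str], seq2: Sequence[str],
               min_similarity: float = 0.5) -> bool:
    """Decide class membership; `min_similarity` is a hypothetical cut-off."""
    return sim_seq(seq1, seq2) >= min_similarity

a = "Receiving block [ID] src: [IP] dest: [IP]".split()
b = "Receiving block [ID] from: [IP] dest: [IP]".split()
print(sim_seq(a, b), same_class(a, b))
```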
2. Lifetime of log data
The purpose of log anomaly detection is to trace abnormal logs back to their source, locate threats and investigate them in time. During anomaly detection, however, logs are converted into numeric vectors, which are not human-readable, and because the number of logs processed is huge, not all log sequence relations can be stored. How to set the lifetime of the data and what to cache is therefore of great significance for log anomaly detection. The specific caching process is as follows:
Step one: in the data processing module, each original log is given a sequence number and a cache table is created whose entries are log sequence number - log template. In subsequent intermediate steps the sequence number stands in for the original log, which improves time and space efficiency.
Step two: in the quantity and time-sequence model training module, the original log text is converted through the GloVe vocabulary into a numeric vector, and a cache table is created whose entries are log sequence number - numeric vector.
Step three: in the deviation calculation module, the deviation of the log under test is calculated, and a cache table is created whose entries are log sequence number - comprehensive deviation.
Step four: in the anomaly tracing module, if the deviation exceeds the threshold, a log sequence is returned according to the log sequence number and the window size initially set by the system, and the cache tables are cleared.
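The caching steps above can be pictured with a small in-memory structure. The sketch below is an illustration under assumptions: the class name `TraceCache` is hypothetical, the numeric-vector table of step two is omitted for brevity, the traced sequence is taken to be the `window` logs ending at the abnormal one, and the cache is cleared after every batch.

```python
from typing import Dict, List

class TraceCache:
    """Per-batch cache keyed by log sequence number."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.templates: Dict[int, str] = {}     # sequence number -> log template
        self.deviations: Dict[int, float] = {}  # sequence number -> comprehensive deviation

    def record(self, seq_no: int, template: str, deviation: float) -> None:
        self.templates[seq_no] = template
        self.deviations[seq_no] = deviation

    def trace(self, threshold: float) -> Dict[int, List[str]]:
        """Return the surrounding templates of each abnormal log, then clear the cache."""
        traced: Dict[int, List[str]] = {}
        for seq_no, deviation in self.deviations.items():
            if deviation > threshold:
                first = seq_no - self.window + 1
                traced[seq_no] = [self.templates[s]
                                  for s in range(first, seq_no + 1)
                                  if s in self.templates]
        self.templates.clear()
        self.deviations.clear()
        return traced

cache = TraceCache(window=2)
cache.record(1, "open file", 0.1)
cache.record(2, "write file", 0.2)
cache.record(3, "delete file", 0.9)
print(cache.trace(threshold=0.5))
```

Keying every table by sequence number keeps the intermediate data small, since the original text only needs to be looked up for the few logs that turn out to be abnormal.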
3. Formats of log data
Four basic data formats are used in log anomaly processing:
1) Original log data: Receiving block blk_-354458 src: /10.250.19.102:39325 dest: /10.250.19.102:50010.
2) Log template: Receiving block [ID] src: [ ] dest: /[IPANDPORT].
3) GloVe vocabulary: Receiving: [300-dimensional numeric vector], block: [300-dimensional numeric vector], src: [300-dimensional numeric vector], dest: [300-dimensional numeric vector]. The vector for a log line is formed by adding the vectors of its words.
4) Quantity features: Receiving:1, block:1, src:1, dest:1, Receiving-block:1, Receiving-src:1 …
These are expressed as the vector [1,1,1, …]. For normalization, n is computed as the sum of the occurrence counts, and the features are finally expressed as the vector [1/n,1/n,1/n, …]. Here Receiving-block is the combination of the words Receiving and block, and Receiving-src is the combination of the words Receiving and src.
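The quantity features and their normalization can be sketched as follows. The function name `quantity_features` is hypothetical, and counting every unordered pair of template words as a "combination" is an assumption, since the source lists only a few combinations followed by an ellipsis.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

def quantity_features(words: List[str]) -> Dict[str, float]:
    """Count single words and word pairs, then divide by the total count n."""
    counts = Counter(words)                                             # one-dimensional features
    counts.update(f"{a}-{b}" for a, b in combinations(words, 2))        # two-dimensional features
    n = sum(counts.values())
    return {feature: count / n for feature, count in counts.items()}

features = quantity_features(["Receiving", "block", "src", "dest"])
print(len(features), features["Receiving-block"])
```

For four distinct words this yields four word counts and six pair counts, so n = 10 and every normalized entry equals 1/n.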
Examples
The specific data execution process is as follows:
Step one: the original log Receiving block blk_-354458 src: /10.250.19.102:39325 dest: /10.250.19.102:50010 is input into the data processing module for classification, yielding the log template Receiving block [ID] src: [ ] dest: /[IPANDPORT] and the quantity features Receiving:1, block:1, src:1, dest:1, Receiving-block:1, Receiving-src:1 …, which are normalized to obtain the vector [1/n,1/n,1/n,1/n,1/n,1/n, …].
Step two: the original log Receiving block blk_-354458 src: /10.250.19.102:39325 dest: /10.250.19.102:50010 is converted by table lookup in the GloVe vocabulary into a 300-dimensional numeric vector to obtain the log sequence. The log sequence is input into the time-sequence model and the numeric vector obtained in step one is input into the quantity model, yielding a time-sequence deviation (0-1) and a quantity deviation (0-1) respectively, from which the comprehensive deviation (0-1) is calculated.
In the training stage, the model fits a very complex function from the input log template vectors and their labels, for example 0 for a normal log and 1 for an abnormal one. After training, a log template vector is input and the trained function produces an output between 0 and 1: the closer to 0, the more likely the log is normal; the closer to 1, the more likely it is abnormal.
Step three: it is judged whether the comprehensive deviation reaches the threshold; if so, the log is marked as abnormal and the other original logs related to the input log are output.

Claims (6)

1. A block chain log anomaly detection and tracing system, characterized by comprising:
a data processing module: extracting a template from the data log, wherein the log template comprises a constant part and a variable part, structuring unstructured log data into template logs that are easy to analyze, and counting quantity features according to the template, wherein the quantity features are the number of occurrences of words and the number of occurrences of combined words in the template;
a quantity time sequence model training module: training models on feature representations of the log, wherein the features are divided into quantity features and time sequence features, the time sequence features and the quantity features are input into a time sequence model and a quantity model, respectively, for training, the quantity features refer to the number of occurrences of words and of combined words in a template, and the time sequence features refer to the order of the logs;
a deviation calculation module: for a log sequence to be detected, processing the data through the data processing module and then feeding it to the models trained by the quantity time sequence model training module, each model outputting a value between 0 and 1, recorded as the time sequence deviation degree and the quantity deviation degree, respectively, and then comprehensively calculating a final deviation degree;
an exception tracing module: writing the logs that exceed the deviation threshold into a table, giving them a threat mark, and giving the log sequence to which each threat log belongs as the tracing output; if an anomaly is found to be a false alarm during auditing, it is marked as a false alarm so that the system can dynamically adjust the threshold, thereby increasing accuracy.
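To illustrate what separating a log into a fixed part and a variable part means, the following Python sketch masks variable fields with a `<*>` placeholder. It is not the Drain algorithm (Drain groups logs with a fixed-depth parse tree); the regular expressions, the placeholder, and the function name `to_template` are hypothetical choices for this example.

```python
import re
from typing import List, Tuple

# Hypothetical masking rules, applied in order. Drain itself does not use
# hand-written patterns like these; this is only a simplified stand-in.
_VARIABLE_PATTERNS = [
    re.compile(r"blk_-?\d+"),                           # block identifiers
    re.compile(r"/?\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?"),  # IPv4 address, optional port
    re.compile(r"\b\d+\b"),                             # remaining bare integers
]


def to_template(log_line: str) -> Tuple[str, List[str]]:
    """Return (template, variables) for one raw log line.

    Variables are collected rule by rule, so their order follows the rule
    order first and the position in the line second.
    """
    variables: List[str] = []

    def _mask(match: "re.Match[str]") -> str:
        variables.append(match.group(0))
        return "<*>"

    template = log_line
    for pattern in _VARIABLE_PATTERNS:
        template = pattern.sub(_mask, template)
    return template, variables
```

Applied to the example log from the description, the block identifier and both addresses become variables, and the remaining words form the template.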
2. The system of claim 1, wherein the data processing module adopts a Drain log template extractor combined with multidimensional feature combination to output statistical features, specifically comprising the following steps:
1) the Drain log template extractor extracts templates from the existing logs of the blockchain network;
2) the number of occurrences of words and the number of occurrences of combined words in the templates extracted by Drain are counted respectively;
3) when a node uploads a log in the blockchain network, Drain is used to classify the log into the corresponding template, and the quantity features are counted.
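The quantity features in step 2) can be sketched in a few lines of Python. The claim does not define "combined words", so this example assumes they are adjacent word pairs within a template; the function name `quantity_features` is likewise illustrative.

```python
from collections import Counter
from typing import Tuple


def quantity_features(template: str) -> Tuple[Counter, Counter]:
    """Count single words and adjacent word pairs in one log template.

    Assumption: a "combined word" is a pair of neighbouring tokens, with
    tokens obtained by splitting the template on whitespace.
    """
    tokens = template.split()
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return word_counts, pair_counts
```

The two counters could then be flattened against a fixed vocabulary to form the numeric input of the quantity model; that step is left out here because the source does not describe it.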
3. The system of claim 1, wherein the quantity timing model training module comprises:
respectively acquiring the time sequence characteristics and the quantity characteristics of the log sequence;
putting the time sequence characteristics into a GRU model based on an attention mechanism for training to obtain a time sequence model;
putting the quantity characteristics into a gradient boosting decision tree (GBDT) for training to obtain a quantity model;
and storing the time sequence model and the quantity model with the highest precision in the training process.
4. The system of claim 3, wherein training the attention-based GRU model comprises the following steps:
A. the log is text data and the extracted template is also a text template, so semantic conversion is needed before the text is input into the model; the input log template text is converted into a log template vector using semantic vectors trained by GloVe;
B. converting the batch log template vector into a log template sequence vector by adopting a sliding window mode;
C. inputting the log template sequence vector into a model, and enabling the model to learn the time sequence characteristics;
D. and storing the training result to obtain a time sequence model.
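Steps A and B can be sketched as follows. The claim does not say how word vectors are merged into one template vector, so averaging is an assumption here, as are the function names and the tiny hand-made embedding table that stands in for a real 300-dimensional GloVe vocabulary.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")
Vector = List[float]


def template_to_vector(template: str, embeddings: Dict[str, Vector],
                       dim: int) -> Vector:
    """Step A: look up each word and average the vectors that are found.

    Assumption: unknown words are skipped, and an all-zero vector is
    returned when no word of the template is in the table.
    """
    known = [embeddings[w] for w in template.lower().split() if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def sliding_windows(items: Sequence[T], window_size: int,
                    step: int = 1) -> List[List[T]]:
    """Step B: cut a batch of template vectors into overlapping sequences."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(items) - window_size
    return [list(items[i:i + window_size]) for i in range(0, last_start + 1, step)]
```

Each window produced by `sliding_windows` would serve as one input sequence for the time sequence model in step C; a batch shorter than the window yields no sequences.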
5. The system of claim 1, wherein the exception tracing module performs the following steps:
1) setting a threshold and comparing the deviation value against it, and marking the log as abnormal if the deviation value exceeds the threshold;
2) data found to be a false alarm can be marked as a false alarm; whether to adjust the threshold is decided according to whether the number of anomalies whose deviation value lies below that of the false alarm within a certain period is particularly small; if it is, the threshold is too low and needs to be raised;
3) After being processed by the data processing module, the log data to be detected is input into a trained model in the quantity timing sequence model training module to obtain a deviation value;
4) Judging whether the mark is abnormal or not according to a threshold value;
5) If the log sequence is abnormal, the log sequence related to the abnormality is output by tracing.
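The threshold feedback in steps 1) and 2) might look like the Python sketch below. The claim leaves "a certain time", "particularly small", and the size of the adjustment unspecified, so the parameters `min_count` and `step`, the cap `max_threshold`, and the explicit `reset_window` call are hypothetical simplifications.

```python
from typing import List


class AdaptiveThreshold:
    """Threshold that is raised when a reported false alarm has few anomalies below it."""

    def __init__(self, threshold: float = 0.5, min_count: int = 3,
                 step: float = 0.05, max_threshold: float = 0.99) -> None:
        self.threshold = threshold
        self.min_count = min_count          # assumed meaning of "particularly small"
        self.step = step                    # assumed fixed increment
        self.max_threshold = max_threshold
        self._flagged: List[float] = []     # deviations flagged in the current period

    def check(self, deviation: float) -> bool:
        """Return True and remember the value if it exceeds the threshold."""
        abnormal = deviation > self.threshold
        if abnormal:
            self._flagged.append(deviation)
        return abnormal

    def report_false_alarm(self, deviation: float) -> bool:
        """Raise the threshold if few flagged values lie below the false alarm."""
        lower = sum(1 for d in self._flagged if d < deviation)
        if lower < self.min_count:
            self.threshold = min(self.max_threshold, self.threshold + self.step)
            return True
        return False

    def reset_window(self) -> None:
        """Start a new observation period (the caller decides when)."""
        self._flagged.clear()
```

An auditor who marks a flagged log as a false alarm would call `report_false_alarm` with that log's deviation value; the threshold moves only when few other anomalies were flagged below it in the same period.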
6. The system of claim 5, wherein tracing and outputting an exception comprises the following steps:
1) Caching the one-to-one corresponding relation between the original text and the vector of the log to be tested in a memory;
2) If the log vector is marked as abnormal, original log text information of the log vector and related log sequence information are obtained through table lookup;
3) Otherwise, emptying the cache and detecting the next batch of logs.
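The in-memory lookup described in these steps can be sketched with two dictionaries. The class name `TraceCache`, the use of a sequence identifier such as a block ID to group related logs, and keying the table by the vector converted to a tuple are assumptions for illustration; the sketch also assumes each vector in a batch is unique, since a repeated vector would overwrite the earlier entry.

```python
from typing import Dict, List, Sequence, Tuple

Key = Tuple[float, ...]


class TraceCache:
    """Per-batch table mapping log vectors back to their original text."""

    def __init__(self) -> None:
        self._by_vector: Dict[Key, Tuple[str, str]] = {}   # vector -> (text, sequence id)
        self._by_sequence: Dict[str, List[str]] = {}       # sequence id -> all texts

    def add(self, vector: Sequence[float], raw_line: str, sequence_id: str) -> None:
        """Step 1): cache the correspondence between vector and original text."""
        self._by_vector[tuple(vector)] = (raw_line, sequence_id)
        self._by_sequence.setdefault(sequence_id, []).append(raw_line)

    def trace(self, vector: Sequence[float]) -> Tuple[str, List[str]]:
        """Step 2): return the original text and the log sequence it belongs to."""
        raw_line, sequence_id = self._by_vector[tuple(vector)]
        return raw_line, list(self._by_sequence[sequence_id])

    def clear(self) -> None:
        """Step 3): empty the cache before the next batch."""
        self._by_vector.clear()
        self._by_sequence.clear()

    def __len__(self) -> int:
        return len(self._by_vector)
```

A detector would call `add` for every log in a batch, call `trace` for each vector marked abnormal, and call `clear` before moving to the next batch.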
CN202210882913.5A 2022-07-26 2022-07-26 Block chain log anomaly detection and tracing system Active CN115277180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210882913.5A CN115277180B (en) 2022-07-26 2022-07-26 Block chain log anomaly detection and tracing system

Publications (2)

Publication Number Publication Date
CN115277180A (en) 2022-11-01
CN115277180B (en) 2023-04-28

Family

ID=83768725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210882913.5A Active CN115277180B (en) 2022-07-26 2022-07-26 Block chain log anomaly detection and tracing system

Country Status (1)

Country Link
CN (1) CN115277180B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209168A (en) * 2020-01-14 2020-05-29 中国人民解放军陆军炮兵防空兵学院郑州校区 Log sequence anomaly detection framework based on nLSTM-self attention
CN111930903A (en) * 2020-06-30 2020-11-13 山东师范大学 System anomaly detection method and system based on deep log sequence analysis
CN113434357A (en) * 2021-05-17 2021-09-24 中国科学院信息工程研究所 Log abnormity detection method and device based on sequence prediction
US20210349989A1 (en) * 2020-12-17 2021-11-11 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method and apparatus for updating password of electronic device, device and storage medium
CN114020726A (en) * 2021-11-26 2022-02-08 中国电力科学研究院有限公司 Log auditing method, system, equipment and medium based on multivariate log data analysis
EP3979080A1 (en) * 2020-09-30 2022-04-06 Mastercard International Incorporated Methods and systems for predicting time of server failure using server logs and time-series data
WO2022087389A1 (en) * 2020-10-23 2022-04-28 Coinbase Crypto Services, LLC Blockchain orchestrator computer system
CN114610515A (en) * 2022-03-10 2022-06-10 电子科技大学 Multi-feature log anomaly detection method and system based on log full semantics
CN114676021A (en) * 2022-04-28 2022-06-28 中国工商银行股份有限公司 Job log monitoring method and device, computer equipment and storage medium
CN114741369A (en) * 2022-04-28 2022-07-12 浙江大学滨江研究院 System log detection method of graph network based on self-attention mechanism

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NIVEDITA MISHRA,SHARNIL PANDYA: "Internet of Things Applications,Security Challenges,Attacks,Instrusion Detection,and Future visions:A Systematic Review" *
XINQIANG LI,WEINA NIU,XIAOSONG ZHANG,RUNZI ZHANG,ZHENQI YU,ZIMU LI: "Improving performance of Log Anomaly Detection with semantic and Time Features based on BiLSTM-Attention" *
夏彬;白宇轩;殷俊杰;: "基于生成对抗网络的***日志级异常检测算法" *
王青文: "面向公交车时序数据的异常检测算法研究" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115794465A (en) * 2022-11-10 2023-03-14 上海鼎茂信息技术有限公司 Method and system for detecting log abnormity
CN115794465B (en) * 2022-11-10 2023-12-19 上海鼎茂信息技术有限公司 Log abnormality detection method and system
CN116074092A (en) * 2023-02-07 2023-05-05 电子科技大学 Attack scene reconstruction system based on heterogram attention network
CN116074092B (en) * 2023-02-07 2024-02-20 电子科技大学 Attack scene reconstruction system based on heterogram attention network
CN116405326A (en) * 2023-06-07 2023-07-07 厦门瞳景智能科技有限公司 Information security management method and system based on block chain
CN116405326B (en) * 2023-06-07 2023-10-20 厦门瞳景智能科技有限公司 Information security management method and system based on block chain

Also Published As

Publication number Publication date
CN115277180B (en) 2023-04-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant