CN117436073A

CN117436073A - Security log alarming method, medium and equipment based on intelligent label

Info

Publication number: CN117436073A
Application number: CN202311767950.2A
Authority: CN
Inventors: 黄铧焕; 丁法景; 罗发强; 陈忠银; 黄志勇; 郑建英; 施日文
Original assignee: Fujian Jishu Network Technology Co ltd
Current assignee: Fujian Jishu Network Technology Co ltd
Priority date: 2023-12-21
Filing date: 2023-12-21
Publication date: 2024-01-23
Anticipated expiration: 2043-12-21
Also published as: CN117436073B

Abstract

The invention discloses a security log alarming method, medium and equipment based on intelligent labels. Corresponding intelligent labels are first generated from the key characteristic information of the security log data, and the security log data are then classified based on the intelligent labels, so that each class contains the intelligent labels corresponding to a plurality of security log data. The intelligent labels in the same class are arranged according to their time stamp information and divided into a plurality of groups in time stamp order. The influence of the intelligent labels in each group on the early warning event weight is then calculated group by group: only when the group score is higher than a second preset threshold are the intelligent labels in that group retained to participate in the calculation of the score threshold, and whether to output alarm information for the current attack class is then decided based on the score threshold. The scheme can eliminate the influence of normal log noise introduced during security event early warning, so that the alarm information is prompted more accurately.

Description

Security log alarming method, medium and equipment based on intelligent label

Technical Field

The application relates to the technical field of computer network security, in particular to a security log alarming method, medium and device based on intelligent labels.

Background

Deep learning is a data-driven approach: no accurate physical model of the system needs to be established, and the optimal feature representation of the system can be obtained simply by collecting historical data of the system's operation, so that tasks such as risk diagnosis and prediction, risk classification and tracing, and risk alarm response can be completed. By building on research in fields such as machine learning (for example the Mini-Batch SGD, i.e. mini-batch stochastic gradient descent, algorithm) and natural language processing, and by continuously improving and optimizing the resulting data intelligent label technique, the efficiency and accuracy of the acquisition, analysis, fusion and deep mining of multi-source, multi-dimensional security log data can be greatly improved, and the digital capability and innovative application of network security response and handling are enhanced.

Security threats are the focus of attention in the field of network security. To deal with today's complex and diverse network threats, a security management department may deploy various network threat detection systems and devices, which generate a large number of security alarms. Security management personnel must manually check and confirm these alarms to find the real and effective threat alarms. One reason is that threat detection systems and devices produce false alarms and even miss alarms, so manual checking and confirmation are required, and the expertise and working experience of different auditors lead to large differences in the results. Another reason is that the alarm information may contain low-severity threats such as suspicious scanning, so a large number of alarms with low information value may have to be handled during manual checking.

Therefore, when facing massive numbers of alarms, security management personnel currently prioritize high-level alarm threats and handle low-level alarm threats afterwards, so that effective security threats can be found more quickly within a limited time and effective security alarming is achieved.

The Chinese patent with publication number CN110958136A discloses a log analysis early warning method based on deep learning, which comprises the following steps: preprocessing the obtained logs of different types in the target system; parsing the preprocessed logs by using a clustering-based method; encoding the parsed log events into numerical feature vectors; learning the encoded logs by using an LSTM-based neural network and a LogCollet-based clustering method to form early warning information; and tracing the early warning information to the component server corresponding to the load, and determining the fault point. That method provides early warning and localization of possible faults of the application system and offers corresponding solutions, thereby relieving system risks in advance and improving the security of the system.

The Chinese patent with publication number of CN110347547A discloses a log abnormality detection method based on deep learning, which uses a history log file to carry out deep learning to obtain a log file detection model; receiving a log file to be detected in a preset time window; preprocessing a log file to be detected to obtain a log file test sample; performing cluster analysis on the log file test samples to obtain multiple types of log files and log keyword sequences corresponding to each type of log files; inputting the log keyword sequence into a log file detection model to perform abnormality detection; if the abnormality exists, a preset alarm prompt is sent to a preset application responsible person.

However, because the volume of security log data is large and mixed with a lot of noise data, both schemes suffer from long processing times and low accuracy of the output early warning conclusions when processing and analyzing logs.

Disclosure of Invention

In view of the above problems, the present application provides a technical solution for security log alarming based on an intelligent tag, so as to solve the technical problems of large calculation amount, inaccurate calculation result and the like in the existing security log early warning method.

To achieve the above object, in a first aspect, the present application provides a security log alert method based on a smart tag, the method including:

collecting safety log data, preprocessing the safety log data, and extracting key characteristic information in the safety log data, wherein the key characteristic information comprises timestamp information;

analyzing the key characteristic information by using the trained neural network model to generate an intelligent label, wherein the intelligent label is used for representing the security event type or attack type;

clustering analysis is carried out on the security log data according to the intelligent labels, and similar security event types or attack types are classified into one type according to the clustering analysis result, so that intelligent labels of a plurality of types are obtained;

calculating a score threshold value for each type of intelligent label according to the type of the intelligent label, and sending out an alarm prompt when the calculated score threshold value is larger than a corresponding first preset threshold value;

the score threshold is calculated according to the following manner: arranging a plurality of intelligent labels in the same category according to the time stamp sequence in the intelligent labels, sequentially calculating group scores corresponding to all intelligent label groups by taking a plurality of intelligent labels in the same category as a group, removing the intelligent label groups with the group scores lower than a second preset threshold value, calculating the score threshold value corresponding to the intelligent label in the current category according to the group scores of the rest intelligent label groups and corresponding group weight values, and determining the group weight values according to the time stamp information corresponding to the intelligent labels in the group.

Further, the neural network model is trained according to the following manner:

acquiring a training set, and randomly selecting a part of sample data from the training set as input data of each iteration;

transmitting the selected partial batch data to a label classifier, and calculating output data of the neural network model;

calculating a loss function according to the difference between the output data of the neural network model and the actual label;

according to the gradient of the loss function, training parameters in a label classifier are adjusted;

repeating the steps until the training set is completely traversed, and finishing training.

Further, the security log data includes text data, the preprocessing the security log data, and extracting key feature information in the security log data includes:

converting the text data into a numerical format which can be processed by a neural network model, and converting the text data into word embedding vectors after vocabulary division;

generating an input sequence according to the word embedding vector, and performing position coding on the input sequence;

independently calculating attention scores of different position coding parts of the input sequence by adopting an attention mechanism, and carrying out weighted average on the calculated attention scores to obtain a final output result of the attention mechanism;

adding the output result of the attention mechanism and the output result of the feedforward neural network to the original input data, and carrying out layer normalization processing to obtain key characteristic information in the safety log data.

Further, the method comprises the steps of:

setting a corresponding alarm rule and a corresponding processing strategy for each type of intelligent label;

the sending out the alarm prompt comprises the following steps: and sending out an alarm prompt according to the alarm rule corresponding to the intelligent label of the current category, and adopting a corresponding processing strategy to process.

Further, the alarm prompt comprises any one or more of an alarm identifier, an alarm time, an alarm type, an alarm level, a security log data fragment related to the alarm, a suggested processing strategy and additional information.

Further, preprocessing the security log data includes:

performing data cleaning on the safety log data, and removing partial data which do not meet preset specifications in the safety log data; and converting unstructured data in the security log data into structured data;

and carrying out normalization processing on the numerical data in the safety log data, ensuring that each numerical data is in the same numerical range, and carrying out code conversion on the type data in the safety log data so as to convert the type data into a format which can be processed by the neural network model.

Further, the arranging the plurality of smart tags in the same category according to the sequence of the time stamps in the smart tags includes:

selecting a plurality of intelligent labels of the same category, of which the time stamp information is nearest to the time stamp at the current moment, and arranging the intelligent labels.

Further, the key feature information includes a static feature including any one or more of a device identification name, an IP address, and device port information, and a dynamic feature including the timestamp information, and further including a log level or event description.

In a second aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the smart tag based security log alerting method of the first aspect of the present application.

In a third aspect, the present application provides an electronic device comprising a processor and a storage medium, the storage medium having stored thereon a computer program which, when executed by the processor, implements the smart tag based security log alerting method of the first aspect of the present application.

Unlike the prior art, the security log alarming method, medium and equipment based on the intelligent label in the technical scheme comprise the following steps: collecting safety log data and extracting key characteristic information in the safety log data; analyzing the key characteristic information by using the trained neural network model to generate an intelligent label; clustering analysis is carried out on the security log data according to the intelligent labels, and similar security event types or attack types are classified into one type according to the clustering analysis result, so that intelligent labels of a plurality of types are obtained; calculating a score threshold value for each type of intelligent label according to the type of the intelligent label, and sending out an alarm prompt when the calculated score threshold value is larger than a corresponding first preset threshold value; the score threshold is calculated according to the following manner: arranging a plurality of intelligent labels in the same category according to the time stamp sequence in the intelligent labels, sequentially calculating group scores corresponding to all intelligent label groups by taking a plurality of intelligent labels in the same category as a group, removing the intelligent label groups with the group scores lower than a second preset threshold value, calculating the score threshold value corresponding to the intelligent label in the current category according to the group scores of the rest intelligent label groups and corresponding group weight values, and determining the group weight values according to the time stamp information corresponding to the intelligent labels in the group.

According to the method, corresponding intelligent labels are generated from the key characteristic information of the safety log data, and the safety log data are further classified based on the intelligent labels. Within each category, the intelligent labels corresponding to the safety log data are arranged by time stamp and divided into a plurality of groups, and the influence of the safety log data in each group on the early warning event weight is then calculated group by group. Only when the group score is higher than the second preset threshold are the intelligent labels in that group retained to participate in the calculation of the score threshold, and whether alarm information needs to be output for the current attack category is then determined based on the score threshold, so that the alarm information is prompted more accurately.

The foregoing summary is merely an overview of the present application. So that one of ordinary skill in the art can understand the technical solution of the present application more clearly and practice it according to the description and drawings, and so that the above and other objects, features and advantages of the present application can be more readily understood, the following detailed description is given with reference to the specific embodiments and the accompanying drawings.

Drawings

The drawings are only for purposes of illustrating the principles, implementations, applications, features, and effects of the present invention and are not to be construed as limiting the application.

In the drawings of the specification:

FIG. 1 is a flow chart of a security log alert method based on smart tags according to a first exemplary embodiment of the present application;

FIG. 2 is a flow chart of a security log alert method based on smart tags according to a second exemplary embodiment of the present application;

FIG. 3 is a flow chart of a security log alert method based on smart tags according to a third exemplary embodiment of the present application;

FIG. 4 is a flow chart of a security log alert method based on smart tags according to a fourth exemplary embodiment of the present application;

FIG. 5 is a flowchart of a security log alert method based on smart tags according to a fifth exemplary embodiment of the present application;

FIG. 6 is a system architecture diagram implementing the security log alert method based on smart tags described herein;

FIG. 7 is a schematic diagram of an electronic device according to a first exemplary embodiment of the present application;

Reference numerals referred to in the above drawings are explained as follows:

10. an electronic device;

101. a processor;

102. a storage medium.

Detailed Description

In order to describe the possible application scenarios, technical principles, practical embodiments, and the like of the present application in detail, the following description is made with reference to the specific embodiments and the accompanying drawings. The embodiments described herein are only used to more clearly illustrate the technical solutions of the present application, and are therefore only used as examples and are not intended to limit the scope of protection of the present application.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of the phrase "an embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor do they specifically limit that embodiment's independence from or relevance to other embodiments. In principle, in the present application, as long as there is no technical contradiction or conflict, the technical features mentioned in the embodiments may be combined in any manner to form a corresponding implementable technical solution.

Unless defined otherwise, technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present application pertains; the use of related terms herein is for the description of specific embodiments only and is not intended to limit the present application.

In the description of the present application, the term "and/or" describes a logical relationship between objects and means that three relationships may exist. For example, "A and/or B" covers three cases: A alone, B alone, and both A and B. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.

In this application, terms such as "first" and "second" are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual number, order, or sequence of such entities or operations.

Without further limitation, the terms "comprising," "including," "having," and other similar open-ended terms used in this application are intended to cover a non-exclusive inclusion, such that a process, method, or article of manufacture that comprises a list of elements is not limited to those elements, but may also include other elements not expressly listed or inherent to such process, method, or article of manufacture.

Consistent with the understanding in the "examination guideline," the expressions "greater than", "less than", "exceeding" and the like are understood in this application to exclude the stated number, while the expressions "above", "below", "within" and the like are understood to include the stated number. Furthermore, in the description of the embodiments of the present application, "a plurality of" means two or more (including two), and similar expressions such as "a plurality of groups" are to be understood in the same way, unless specifically defined otherwise.

In the description of the embodiments of the present application, spatially relative terms such as "center," "longitudinal," "transverse," "length," "width," "thickness," "up," "down," "front," "back," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counter-clockwise," "axial," "radial," "circumferential," etc., indicate orientations or positional relationships based on the specific embodiments or figures. They are used merely for convenience in describing the specific embodiments of the present application or for ease of understanding by the reader, and do not indicate or imply that the devices or components referred to must have a particular position, a particular orientation, or be configured or operated in a particular orientation; they are therefore not to be construed as limiting the embodiments of the present application.

Unless specifically stated or limited otherwise, in the description of the embodiments of the present application, the terms "mounted," "connected," "affixed," "disposed," and the like are to be construed broadly. For example, a "connection" may be a fixed connection, a detachable connection, or an integral arrangement; it may be a mechanical connection, an electrical connection, or a communication connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be internal communication between two elements or an interaction between two elements. The specific meanings of the above terms in the embodiments of the present application can be understood by those skilled in the art to which the present application pertains according to the specific circumstances.

As shown in fig. 1, in a first aspect, the present application discloses a security log alert method based on a smart tag, the method comprising:

first, step S101 is performed: security log data is collected and preprocessed, and key characteristic information in the security log data is extracted, wherein the key characteristic information comprises timestamp information;

then, step S102 is performed: the key characteristic information is analyzed by using the trained neural network model to generate an intelligent label, wherein the intelligent label is used for representing the security event type or attack type;

then, step S103 is performed: cluster analysis is carried out on the security log data according to the intelligent labels, and similar security event types or attack types are classified into one category according to the cluster analysis result, so that intelligent labels of a plurality of categories are obtained;

then, step S104 is performed: a score threshold is calculated for each category of intelligent labels, and an alarm prompt is sent out when the calculated score threshold is larger than the corresponding first preset threshold.

In step S101, the security log data may be generated by a network device, a system, a service program, etc. during operation, and may include time stamp information (i.e., the log generation time), a host name (the name of the host that generated the current log data), an event description, etc.

The preprocessing is a step for performing standardized processing on the security log data, and the preprocessing includes any one or more of operations of removing duplicate data, filling in missing values, converting data into a uniform format, and the like.

Preferably, the key feature information includes a static feature and a dynamic feature, the static feature includes any one or more of a device identification name, an IP address, and device port information, and the dynamic feature includes the timestamp information, and further includes a log level or an event description.

In step S102, the neural network model may include any one of a feedforward neural network, a convolutional neural network (CNN), a recurrent neural network (RNN), a support vector machine (SVM), a generative adversarial network (GAN), and an autoencoder.

In step S103, cluster analysis is an unsupervised learning method for dividing a set of data samples into meaningful categories or clusters. It may be implemented by a clustering algorithm, which may include any of K-means clustering, hierarchical clustering, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), density-based clustering, spectral clustering, local outlier factor (LOF), Gaussian mixture models (GMM), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
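As an illustration of step S103 only, the following is a minimal sketch of K-means clustering, one of the algorithms listed above. It assumes each intelligent label has already been turned into a small numeric vector; the function name kmeans, the seeding with the first k points and the fixed iteration count are assumptions of this sketch and are not specified by the application.

```python
import math
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Return a cluster index for every point (plain K-means, illustrative)."""
    # Assumption: the first k points are used as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Hypothetical label vectors forming two well-separated attack categories.
label_vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0),
                 (5.1, 4.9), (0.2, 0.1), (4.8, 5.2)]
categories = kmeans(label_vectors, k=2)
```

Labels that receive the same index would then be treated as one category in the later scoring steps.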

In step S104, the magnitude of the first preset threshold may be set according to actual needs. The alarm prompt can be an audible prompt, a luminous prompt, a vibration prompt, or a text, a graphic prompt on a display interface, etc.

As shown in fig. 2, the score threshold is calculated according to the following manner:

first, step S201 is performed: a plurality of intelligent labels in the same category are arranged according to the order of the time stamps in the intelligent labels;

then, step S202 is performed: taking a plurality of intelligent labels in the same category as a group, the group scores corresponding to the intelligent label groups are calculated in sequence, and the intelligent label groups whose group scores are lower than a second preset threshold are removed;

then, step S203 is performed: the score threshold corresponding to the intelligent labels of the current category is calculated according to the group scores of the remaining intelligent label groups and the corresponding group weight values, wherein each group weight value is determined according to the timestamp information corresponding to the intelligent labels in the group.

In step S201, for example, if cluster analysis shows that there are 100 smart tags under a type A attack, the 100 smart tags may be sorted by the distance between the time stamp information recorded in them (i.e., the generation time of the security log data) and the current time stamp. Specifically, the arranging the plurality of smart tags in the same category according to the sequence of the time stamps in the smart tags includes: selecting a plurality of intelligent labels of the same category whose time stamp information is nearest to the time stamp at the current moment, and arranging the intelligent labels.

In step S202, each group preferably contains the same number of smart tags, so that the group scores calculated subsequently can be compared with one another. The group score of each smart tag group may be calculated as the average of the scores of the smart tags in the group. The score of each smart tag may be obtained by a preset neural network model whose input is the characteristic data in the smart tag and whose output is the score of that smart tag; the score characterizes the probability that the security log data corresponding to the smart tag should be treated as an alarm event. The magnitude of the second preset threshold may be set according to actual needs.

In step S203, determining the group weight value according to the timestamp information corresponding to the smart tags in the group means that the weight values of the groups increase in sequence as the distance from the current timestamp decreases; that is, the closer a group is to the current timestamp (which can be determined from the timestamps of the smart tags in each group), the greater the weight value of that group.

In this scheme, smart tag groups whose group scores are lower than the second preset threshold are removed group by group. When security log data is too far from the current timestamp, or the probability that it requires an alarm is judged to be too low, no early warning is issued for it. This ensures that only smart tag groups meeting the required conditions participate in the calculation of the score threshold, so that the alarm information is prompted more accurately.
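Steps S201 to S203, together with the comparison in step S104, can be sketched as follows. This is only one possible reading: the SmartTag fields, the use of the group rank as the group weight, and the weighted average used to combine group scores are assumptions of this sketch, since the application only states that groups nearer the current timestamp receive larger weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SmartTag:
    timestamp: float  # generation time of the underlying log entry (epoch seconds)
    score: float      # hypothetical per-tag alarm probability in [0, 1]


def category_score(tags: List[SmartTag], group_size: int,
                   second_threshold: float) -> float:
    """Score for one category of tags (steps S201 to S203, illustrative)."""
    # S201: arrange the tags from oldest to newest.
    ordered = sorted(tags, key=lambda t: t.timestamp)
    # S202: split into groups and average the per-tag scores of each group.
    groups = [ordered[i:i + group_size]
              for i in range(0, len(ordered), group_size)]
    scored = [(rank, sum(t.score for t in g) / len(g))
              for rank, g in enumerate(groups)]
    # S202: remove groups whose score is lower than the second preset threshold.
    kept = [(rank, s) for rank, s in scored if s >= second_threshold]
    if not kept:
        return 0.0
    # S203: newer groups weigh more; weight = rank + 1 is an assumption.
    total_weight = sum(rank + 1 for rank, _ in kept)
    return sum((rank + 1) * s for rank, s in kept) / total_weight


def should_alert(tags: List[SmartTag], group_size: int,
                 second_threshold: float, first_threshold: float) -> bool:
    # S104: alert when the category score exceeds the first preset threshold.
    return category_score(tags, group_size, second_threshold) > first_threshold


tags = [SmartTag(1, 0.1), SmartTag(2, 0.1), SmartTag(3, 0.8),
        SmartTag(4, 0.6), SmartTag(5, 0.9), SmartTag(6, 0.9)]
score = category_score(tags, group_size=2, second_threshold=0.5)
```

In this example the oldest group (average 0.1) is discarded as noise, and the two remaining groups (averages 0.7 and 0.9, weights 2 and 3) give a score of about 0.82.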

As shown in fig. 3, in some embodiments, the neural network model is trained according to the following:

first, step S301 is performed: a training set is acquired, and a part of the sample data is randomly selected from the training set as the input data of each iteration;

then, step S302 is performed: the selected batch of data is passed to a label classifier, and the output data of the neural network model is calculated;

then, step S303 is performed: a loss function is calculated according to the difference between the output data of the neural network model and the actual label;

then, step S304 is performed: the training parameters in the label classifier are adjusted according to the gradient of the loss function;

then, step S305 is performed: the above steps are repeated until the training set has been completely traversed, whereupon training is complete.

Through the scheme, training of the neural network model can be achieved, and generation of the intelligent label is more accurate.
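The training loop of steps S301 to S305 corresponds to mini-batch stochastic gradient descent. The sketch below uses a logistic-regression classifier in place of the neural network model purely to keep the example short; the function names, the learning rate, the batch size and the number of passes over the training set are all assumptions of this sketch.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label in {0, 1})


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_minibatch_sgd(train_set: List[Sample], batch_size: int = 4,
                        lr: float = 0.5, epochs: int = 200,
                        seed: int = 0) -> List[float]:
    """Train a stand-in label classifier with mini-batch SGD."""
    rng = random.Random(seed)
    dim = len(train_set[0][0])
    w = [0.0] * (dim + 1)  # the last entry is the bias
    for _ in range(epochs):
        data = train_set[:]
        rng.shuffle(data)  # S301: random selection of each batch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = [0.0] * (dim + 1)
            for x, y in batch:
                # S302: forward pass of the classifier
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
                # S303: for cross-entropy loss, the gradient is (p - y) * x
                err = p - y
                for i, xi in enumerate(x):
                    grad[i] += err * xi
                grad[-1] += err
            # S304: adjust parameters with the batch-averaged gradient
            w = [wi - lr * g / len(batch) for wi, g in zip(w, grad)]
        # S305: one pass ends once the whole training set has been traversed
    return w


def predict(w: List[float], x: List[float]) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) >= 0.5)


# Hypothetical one-feature training set: label 1 when the feature is large.
train = [([0.0], 0), ([0.1], 0), ([0.2], 0), ([0.3], 0),
         ([0.7], 1), ([0.8], 1), ([0.9], 1), ([1.0], 1)]
weights = train_minibatch_sgd(train)
```

Averaging the gradient over a small random batch, rather than over the whole training set, is what distinguishes the mini-batch variant from full-batch gradient descent.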

In some embodiments, as shown in fig. 4, the security log data includes text data, the preprocessing the security log data, and extracting key feature information in the security log data includes:

first, step S401 is performed: the text data is converted into a numerical format that can be processed by a neural network model, the text data being split into vocabulary items and then converted into word embedding vectors;

then, step S402 is performed: an input sequence is generated from the word embedding vectors, and position coding is applied to the input sequence;

then, step S403 is performed: attention scores are calculated independently for the different position-coded parts of the input sequence by using an attention mechanism, and a weighted average is taken using the calculated attention scores to obtain the final output result of the attention mechanism;

then, step S404 is performed: the output result of the attention mechanism and the output result of the feedforward neural network are added to the original input data, and layer normalization is carried out to obtain the key characteristic information in the safety log data.
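Steps S401 to S404 resemble a Transformer-style encoder block. The following sketch illustrates that reading under several assumptions that the application does not specify: whitespace tokenization, a pseudo-random embedding table, sinusoidal position coding, single-head attention without learned projections, and a ReLU placeholder standing in for the feedforward neural network.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def embed(tokens: List[str], dim: int, seed: int = 0) -> List[Vector]:
    """S401: hypothetical embedding table, one fixed vector per distinct token."""
    rng = random.Random(seed)
    table: Dict[str, Vector] = {}
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [table[tok][:] for tok in tokens]


def add_position(seq: List[Vector], dim: int) -> List[Vector]:
    """S402: add sinusoidal position coding (an assumed encoding scheme)."""
    out = []
    for pos, vec in enumerate(seq):
        pe = [math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
              else math.cos(pos / 10000 ** ((i - 1) / dim))
              for i in range(dim)]
        out.append([v + p for v, p in zip(vec, pe)])
    return out


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[Vector]) -> List[Vector]:
    """S403: scaled dot-product attention with queries = keys = values."""
    dim = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seq]
        weights = softmax(scores)
        # Weighted average of all positions, using the attention weights.
        out.append([sum(w * v[i] for w, v in zip(weights, seq))
                    for i in range(dim)])
    return out


def layer_norm(vec: Vector, eps: float = 1e-5) -> Vector:
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def extract_features(text: str, dim: int = 8) -> List[Vector]:
    tokens = text.lower().split()
    x = add_position(embed(tokens, dim), dim)
    attn = self_attention(x)
    ffn = [[max(0.0, a) for a in row] for row in attn]  # placeholder feed-forward
    # S404: sum of input, attention output and feed-forward output, then layer norm.
    return [layer_norm([xi + ai + fi for xi, ai, fi in zip(xr, ar, fr)])
            for xr, ar, fr in zip(x, attn, ffn)]


features = extract_features("failed password for root from 10.0.0.5")
```

Each of the six tokens yields one normalized eight-dimensional vector; a real implementation would use trained embeddings and learned projection matrices.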

In some embodiments, the method comprises: setting a corresponding alarm rule and a corresponding processing strategy for each type of intelligent label; the sending out the alarm prompt comprises the following steps: and sending out an alarm prompt according to the alarm rule corresponding to the intelligent label of the current category, and adopting a corresponding processing strategy to process.

In some embodiments, the alarm prompt includes any one or more of an alarm identifier, an alarm time, an alarm type, an alarm level, a security log data fragment related to the alarm, a suggested processing strategy, and additional information.
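The per-category alarm rules and the fields of an alarm prompt could be represented as in the sketch below. The class names, the two category names, the threshold values and the strategies are hypothetical and serve only as an illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class AlarmRule:
    first_threshold: float  # first preset threshold for this category
    level: str              # alarm level assigned when the rule fires
    strategy: str           # suggested processing strategy


@dataclass
class Alert:
    alert_id: str
    time: datetime
    alert_type: str
    level: str
    log_fragments: List[str]
    suggested_strategy: str
    extra: Dict[str, str] = field(default_factory=dict)


# Hypothetical rules, one per category of intelligent label.
RULES: Dict[str, AlarmRule] = {
    "brute_force": AlarmRule(0.8, "high", "block source IP"),
    "port_scan": AlarmRule(0.6, "medium", "rate-limit source IP"),
}


def maybe_alert(category: str, score: float,
                fragments: List[str]) -> Optional[Alert]:
    """Build an alert only if the category score exceeds its rule's threshold."""
    rule = RULES.get(category)
    if rule is None or score <= rule.first_threshold:
        return None
    return Alert(alert_id=uuid.uuid4().hex,
                 time=datetime.now(timezone.utc),
                 alert_type=category,
                 level=rule.level,
                 log_fragments=fragments,
                 suggested_strategy=rule.strategy)


alert = maybe_alert("brute_force", 0.82, ["failed password for root"])
```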

In some embodiments, preprocessing the security log data includes:

performing data cleaning on the safety log data, and removing partial data which do not meet preset specifications in the safety log data; and converting unstructured data in the security log data into structured data; and carrying out normalization processing on the numerical data in the safety log data, ensuring that each numerical data is in the same numerical range, and carrying out code conversion on the type data in the safety log data so as to convert the type data into a format which can be processed by the neural network model.
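The preprocessing described above can be sketched as follows. The log layout, the field names and the list of log levels are assumptions made for illustration; min-max scaling and one-hot encoding are likewise assumed here as one common way to normalize numerical data and encode type data.

```python
import re
from typing import Dict, List, Optional

# Hypothetical log layout: "<timestamp> <host> <LEVEL> <bytes> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<bytes>\d+)\s+(?P<msg>.+)$")
LEVELS = ["INFO", "WARN", "ERROR"]


def parse_line(line: str) -> Optional[Dict[str, object]]:
    """Turn one unstructured line into a structured record, or None if invalid."""
    m = LINE_RE.match(line.strip())
    if m is None or m.group("level") not in LEVELS:
        return None  # data cleaning: drop lines that violate the layout
    rec: Dict[str, object] = dict(m.groupdict())
    rec["bytes"] = int(m.group("bytes"))
    return rec


def preprocess(lines: List[str]) -> List[Dict[str, object]]:
    records = [r for r in map(parse_line, lines) if r is not None]
    if not records:
        return []
    values = [r["bytes"] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    for r in records:
        # Normalization: scale the numerical field into the range [0, 1].
        r["bytes_norm"] = (r["bytes"] - lo) / span
        # Code conversion: one-hot encode the log level.
        r["level_onehot"] = [int(r["level"] == lv) for lv in LEVELS]
    return records


raw = ["2023-12-21T10:00:00 web01 INFO 100 login ok",
       "garbage",
       "2023-12-21T10:00:05 web01 ERROR 300 login failed"]
clean = preprocess(raw)
```

The malformed second line is discarded, and the two remaining records carry both the original fields and their normalized and encoded counterparts.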

With the above scheme, key characteristic information in the security log data can be effectively extracted, providing support for the subsequent operation steps. As shown in fig. 5, the method of the present application is further described below in conjunction with a specific application scenario. In this embodiment, the security log alarming method based on the intelligent label mainly comprises the following steps:

(11) Data acquisition. Specifically, security log data generated by network devices, systems, service programs and the like during operation is collected.

(21) Data preprocessing. Specifically, the security log data is collected and organized, cleaned and formatted, and key characteristic information such as the timestamp, host name, log level and event description is extracted.

(31) Feature extraction (i.e., key feature information extraction). Specifically, features are extracted from the preprocessed data, including static features (e.g., hostname, IP address, port number, etc.) and dynamic features (e.g., log level, event description, timestamp, etc.).

(41) Smart label generation. Specifically, the extracted features are learned and analyzed using a machine learning classification algorithm to generate intelligent labels, each of which can represent a class of similar security events or attack types.

(51) Cluster analysis. Specifically, the security log data is clustered according to the features in the smart tags, so that similar security events or attack types are grouped into one class.

(61) Alarm rule formulation. Specifically, alarm rules are formulated according to the result of the cluster analysis. For example, if a certain smart tag represents a type of malicious attack and the frequency or severity of the tag exceeds a preset threshold, an alarm is triggered.

(71) Alarm output. Specifically, the triggered alarm information is output; it can include information such as the alarm time, alarm level and alarm description. In combination with visualization technology, the alarm information is presented in forms such as charts, so that a user can quickly understand the security situation.

(81) Feedback optimization. According to actual running conditions and user feedback, the intelligent label generation and cluster analysis algorithms are continuously optimized to improve the accuracy and practicality of the alarms.

With the above scheme, the security log data can be effectively processed and analyzed, potential security threats can be discovered and warned of in time, and strong support is provided for network security protection.

As shown in fig. 6, to implement the security log alarming method based on the smart tag described in the present application, the system architecture includes the following functional module components:

(12) Data acquisition module: this module is responsible for collecting security log data from various security devices. The data may include a variety of information, such as a timestamp, hostname, log level and event description.

(22) Data preprocessing module: this module is responsible for cleaning, formatting and standardizing the collected security log data. This may include removing duplicate data, filling in missing values, converting data to a uniform format, and so on.

(32) Feature extraction module: this module is responsible for extracting useful features from the preprocessed data. These features may include static features (e.g., hostname, IP address, port number, etc.) and dynamic features (e.g., log level, event description, timestamp, etc.).

(42) Intelligent label generation module: this module learns and analyzes the extracted features using a machine learning classification algorithm to generate the intelligent labels. Each smart tag may represent a particular security event or attack type.

(52) Cluster analysis module: this module performs cluster analysis on the data for which intelligent labels have been generated, and groups similar security events or attack types into one class.

(62) Alarm rule formulation module: this module formulates alarm rules according to the result of the cluster analysis. For example, if a certain smart tag represents a type of malicious attack and the frequency or severity of the tag exceeds a preset threshold, an alarm is triggered.

(72) Alarm output module: this module outputs the triggered alarm information, which may include information such as the alarm time, alarm level and alarm description. In combination with visualization technology, the alarm information is presented in forms such as charts, so that a user can quickly understand the security situation.

(82) Feedback optimization module: according to actual running conditions and user feedback, this module continuously optimizes the intelligent label generation and cluster analysis algorithms to improve the accuracy and practicality of the alarms.

(92) System management module: this module is responsible for configuration, monitoring and management of the whole system. It may allow a user to set various parameters, monitor the operating state of the system, and adjust the system configuration as desired.

The security log clustering analysis alarm system based on the intelligent label can effectively process and analyze a large amount of security log data, timely discover and early warn potential security threats, and provide powerful support for network security protection. Meanwhile, the system can be customized and expanded according to actual conditions so as to meet the requirements of different users.

The intelligent label generation aims at automatically marking relevant labels for each piece of security log data, and the labels can accurately reflect the information such as security event types, threat levels and the like contained in the log and provide key basis for subsequent cluster analysis and alarm. In this embodiment, the specific steps of smart label generation are as follows:

(13) Tag coding: a tag is assigned to each piece of log data. These tags are encoded using one-hot encoding to facilitate model training.

(23) Training based on Mini-Batch SGD: the label classifier is trained using the Mini-Batch SGD algorithm. The specific steps are as follows:

(33) In each iteration, a small batch of data is randomly selected from the training set as input.

(43) The selected batch is passed to the tag classifier, and the output of the network is calculated.

(53) The loss function is calculated from the network output and the actual label. Common loss functions include cross-entropy loss, mean square error, etc.

(63) The parameters of the classifier are updated using the Mini-Batch SGD algorithm: the parameters are adjusted by a small amount according to the gradient of the loss function.

(73) The above steps are repeated until the training set has been completely traversed once, which completes one round of training (epoch).

(83) Intelligent label generation: during training, the tag probability for each data instance is predicted by the tag classifier. Depending on the actual requirements, a probability threshold may be selected to determine the final tag. For each data instance, the tag with the highest probability is used as the result of intelligent tag generation.

In step (23), the Mini-Batch SGD (mini-batch stochastic gradient descent) algorithm is a common optimization algorithm in deep learning. In contrast to standard SGD, which updates on a single data point, and to full-batch gradient descent, which uses the entire data set, it uses a small batch of data each time the model parameters are updated. This reduces the variance of the gradient updates and accelerates model training by exploiting the parallelism of matrix operations.

In step (53), in order to generate the smart tag, a suitable loss function is defined, which is able to measure the gap between the model-generated tag and the real tag. Depending on the specific task, different loss functions may be selected, such as cross entropy loss, mean square error loss, etc.
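As a minimal illustrative sketch of steps (13) to (83), the following code trains a softmax tag classifier with mini-batch stochastic gradient descent on one-hot encoded tags and a cross-entropy loss. The linear classifier, the hyperparameters, the two-dimensional toy features and the tag names `brute_force` and `port_scan` are all assumptions made for illustration; the present application does not specify them.

```python
import math
import random

def one_hot(index, num_classes):
    """Encode a tag index as a one-hot vector (step 13)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, b, x):
    """Forward pass of a linear softmax classifier (step 43)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)

def train(features, labels, num_classes, epochs=200, batch_size=2, lr=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    indices = list(range(len(features)))
    for _ in range(epochs):                      # one pass over the data = one epoch (step 73)
        rng.shuffle(indices)                     # random mini-batches (step 33)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            gW = [[0.0] * dim for _ in range(num_classes)]
            gb = [0.0] * num_classes
            for i in batch:
                p = predict_proba(W, b, features[i])
                y = one_hot(labels[i], num_classes)
                for k in range(num_classes):
                    err = p[k] - y[k]            # gradient of cross-entropy w.r.t. logit k (step 53)
                    gb[k] += err
                    for j in range(dim):
                        gW[k][j] += err * features[i][j]
            for k in range(num_classes):         # small parameter update (step 63)
                b[k] -= lr * gb[k] / len(batch)
                for j in range(dim):
                    W[k][j] -= lr * gW[k][j] / len(batch)
    return W, b

def smart_label(W, b, x, names):
    """Return the most probable tag and its probability (step 83)."""
    p = predict_proba(W, b, x)
    best = max(range(len(p)), key=p.__getitem__)
    return names[best], p[best]

# Hypothetical toy data: two feature dimensions, two tag classes.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1]
W, b = train(X, y, num_classes=2)
```

A probability threshold, as mentioned in step (83), could be applied to the second value returned by `smart_label` before the tag is accepted.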

Correspondingly, the generation of the intelligent label can be realized through an intelligent label generation module, which may specifically comprise the following modules:

(14) Tag coding module: this module covers label preprocessing, selection of a suitable coding mode, construction of a label coding table, implementation of a coding conversion function, introduction of a coding cache mechanism, and the like. Through the design and implementation of this module, the original labels can be converted into a numerical form that the model can process, providing effective input characteristics for subsequent intelligent label generation.

(24) Training and optimization module: this module trains and optimizes the model based on the Mini-Batch SGD algorithm. It implements the gradient calculation and parameter update logic of Mini-Batch SGD according to the algorithm principle, and calculates the loss value between the label generated by the model and the real label according to the selected loss function. It organizes the training process, including iteration count control, batch data extraction, gradient updating and similar operations. After each iteration or epoch is completed, the performance of the model on the validation set is evaluated and the optimal model parameters are saved.

(34) Label generation module: this module is responsible for smart tag generation using the trained model. It receives the original data for which labels are to be generated, applies the same preprocessing and feature extraction operations as were applied to the training data, and inputs the processed data into the trained model to generate the corresponding intelligent labels. The generated labels then undergo any necessary post-processing operations, such as tag mapping and format conversion.

The above provides a systematic and comprehensive method that makes full use of the representation capability of the deep learning model and the advantages of the optimization algorithm to accomplish the task of intelligent label generation.

In this embodiment, the goal of feature extraction is to extract meaningful, quantifiable features from the preprocessed security log data for smart tag generation and cluster analysis. These features can characterize the nature, behavior, and pattern of security events, providing critical information for subsequent analysis and alerting. Feature extraction is performed on the preprocessed log data using a Transformer model, which can effectively capture long-range dependencies in sequence data.

In this embodiment, the feature extraction may specifically include the following steps:

(15) Input embedding: first, the input data is converted into a numerical format that the model can handle, and the words are converted into word embedding vectors.

(25) Position coding: since the Transformer model does not itself contain any information about the order of elements, position coding must be added to provide such information; it is typically added to the input embedding.

(35) Self-attention mechanism: this is the core part of the Transformer model. The self-attention mechanism enables the model to focus on different parts of the input sequence, making it possible to capture global context information.

(45) Feature extraction: through the multi-layer structure of the Transformer, each layer performs self-attention calculation and feedforward neural network calculation, so that features of the input data at different levels of abstraction can be extracted.

(55) Output processing: finally, depending on the task, some additional processing steps may be needed after feature extraction. For example, in a classification task, a fully connected layer and a softmax function may be added on top of the features to generate the final predictions.

In step (25), position coding is used to inject position information for the elements in the sequence, since the Transformer model itself does not contain sequence order information. The position coding may be fixed or learned and is typically added to the input embedding.
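As a minimal sketch of a fixed position coding, assuming the widely used sine/cosine formulation (the present application only states that the coding may be fixed or learned), the following code builds a position table and adds it element-wise to the input embeddings. The function names and the base constant 10000 are illustrative assumptions.

```python
import math

def sinusoidal_position_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of fixed sine/cosine position codes."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_position_encoding(embeddings):
    """Add position codes element-wise to a list of embedding vectors."""
    d_model = len(embeddings[0])
    codes = sinusoidal_position_encoding(len(embeddings), d_model)
    return [[e + c for e, c in zip(emb, code)]
            for emb, code in zip(embeddings, codes)]
```

Because the code has the same dimension as the embedding, it can simply be added, which matches the additive use described above.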

In step (35), the self-attention mechanism is the core component of the Transformer model. It calculates a weight for each element in the sequence to decide which elements should be attended to during encoding. In this way, the model can better capture relationships and dependencies between elements.

The multi-head attention mechanism allows the model to focus on different aspects of the input sequence at the same time. The inputs are mapped to a number of different heads by linear transformation, each head independently calculates self-attention, and the results are concatenated or averaged as the final output.

In step (55), residual connection and layer normalization are used in the Transformer to deepen the network and enhance the stability of training. The residual connection adds the input directly to the output, helping to alleviate the vanishing gradient problem. Layer normalization normalizes the output of each layer, which accelerates convergence and improves the generalization capability of the model.
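The self-attention, residual connection and layer normalization described above can be sketched as follows. This is a simplified single-head illustration that assumes identity projections (query, key and value all equal the input) and omits the learned linear transformations, the multiple heads and the feedforward sub-layer; all function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    out = []
    for q in x:
        # One attention score per position, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def layer_norm(vec, eps=1e-5):
    """Normalize one vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def attention_block(x):
    """Residual connection (input + attention output) followed by layer norm."""
    attended = self_attention(x)
    return [layer_norm([a + b for a, b in zip(xi, ai)])
            for xi, ai in zip(x, attended)]
```

In a full encoder layer the same add-and-normalize pattern would be applied again after the feedforward network.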

Accordingly, the feature extraction function may be implemented by a feature extraction module, which typically includes the following modules:

(16) Input module: this module is responsible for receiving and preprocessing input data. For text data, this may include word segmentation, conversion to embedding vectors, and so on.

(26) Position coding module: this module is responsible for generating a position code for the input sequence. It may be implemented using predefined sine/cosine functions or another method. The position code should have the same dimensions as the input embedding so that the two can be added.

(36) Multi-head self-attention module: this module implements the multi-head self-attention mechanism. It computes multiple self-attention heads, each of which independently computes attention scores and generates a weighted output. These outputs are then combined to form the final output of multi-head self-attention.

(46) Residual connection and layer normalization module: this module is responsible for applying the residual connection and layer normalization operations. It receives the outputs of the self-attention module and the feedforward neural network module, adds them to the original input, and performs layer normalization. This design helps stabilize the training process and accelerates convergence.

(56) Output module: this module extracts and processes the final output features of the model. These feature vectors may be used for subsequent tasks such as clustering and classification.

The above is the Transformer-based design for security log data feature extraction. Through the processing of the input embedding layer, the Transformer encoder and the feature output step, key features of the security log data can be effectively extracted, providing strong support for subsequent security analysis and alarming.

In this embodiment, the goal of the alarm rule formulation is to define a set of flexible and efficient rules based on the result of the cluster analysis and the smart tag, for determining whether to trigger a security alarm. These rules should be able to accurately identify potential security threats and reduce false positives and false negatives. In this embodiment, the alarm rule formulation flow is as follows:

(17) Clustering result analysis: the results of the cluster analysis are studied in depth to understand the characteristics of each cluster and the labels it contains.

(27) Alarm indicator determination: suitable alarm indicators, such as the number of items in a cluster, its growth rate and the combination of labels, are selected according to the security requirements and the business impact.

(37) Threshold setting: an appropriate threshold is set for each alarm indicator. These thresholds may be determined based on historical data, business needs, and risk assessment.

(47) Alarm action definition: the actions to be performed when an alarm is triggered, such as sending a notification, activating a defense mechanism or logging, are designed.

(57) Alarm rule verification and optimization: the accuracy of the alarm rules is verified in a real or simulated environment. According to the verification result, the thresholds and alarm actions are adjusted to optimize the effectiveness of the rules.

(67) Rule storage and updating: the alarm rules are stored securely, and a mechanism is designed that allows the rules to be dynamically updated when necessary.
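A minimal sketch of steps (27) to (47) is given below, assuming two indicators per smart tag (the log count in the current window and its growth rate relative to the previous window). The `AlarmRule` structure, its field names, the example thresholds and the `notify` action are hypothetical and serve only to illustrate threshold-based rule evaluation.

```python
from dataclasses import dataclass

@dataclass
class AlarmRule:
    label: str              # smart tag the rule applies to
    max_count: int          # threshold on logs per window (assumed indicator)
    max_growth_rate: float  # threshold on relative growth (assumed indicator)
    action: str             # action to perform when the rule fires

def growth_rate(current, previous):
    """Relative growth of the current window over the previous one."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous

def evaluate(rule, current_count, previous_count):
    """Return the rule's action if either indicator exceeds its threshold."""
    if (current_count > rule.max_count
            or growth_rate(current_count, previous_count) > rule.max_growth_rate):
        return rule.action
    return None

# Hypothetical rule for a hypothetical tag.
rule = AlarmRule(label="ssh_brute_force", max_count=100,
                 max_growth_rate=0.5, action="notify")
```

Keeping the thresholds as data rather than code makes it straightforward to store the rules and update them dynamically, as step (67) requires.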

The design of the alarm rules should also take the following factors into account:

a. Flexibility and configurability: the alarm rules should be flexible enough to accommodate changing security environments. At the same time, a user interface or API should be provided that allows security specialists to configure or adjust rules as needed.

b. Balance between false positives and false negatives: when the alarm rules are formulated, the risks of false positives and false negatives should be weighed against each other, so that the alarm system responds in time when a real threat is found while avoiding the resource waste and user fatigue caused by frequent false alarms.

Correspondingly, the formulation of the alarm rules can be realized through an alarm rule module, which is designed as follows:

a. Rule definition and storage sub-module: allows the user or the system to define new alarm rules and stores them securely.

b. Alarm judging and triggering sub-module: monitors the clustering result in real time and judges whether to trigger an alarm according to the alarm rules and the configured thresholds.

c. Alarm action execution sub-module: when an alarm is triggered, performs the predefined alarm actions, such as sending a notification or logging.

d. Rule verification and optimization sub-module: provides tools or interfaces that help the user verify the accuracy of the alarm rules, and provides optimization suggestions.

Through the design, a set of security log alarming rules based on intelligent labels and clustering analysis results can be formulated. These rules will enhance the accuracy and effectiveness of the alert system, help organizations better address potential security threats, and ensure business continuity and data security.

In this embodiment, the goal of the alert output is to clearly and accurately convey the alert information to the relevant person or system. The alert output should ensure that critical information is not missed and trigger a corresponding response action if necessary. Therefore, in the present embodiment, the alarm output content is specifically as follows:

a. Alarm identification: each alarm should have a unique identification to facilitate tracking and management.

b. Alarm time: the specific time of alarm generation is recorded, so that time sequence analysis and history review are facilitated.

c. Alarm type: the type of the alarm, such as intrusion attempt or abnormal behavior, is identified according to the intelligent label and the cluster analysis result.

d. Alarm level: a level such as low, medium or high is assigned to the alarm depending on the severity and impact of the threat.

e. Related log data: the key log data segments associated with the alarm are included to support analysis and diagnosis.

f. Suggested actions: providing suggested actions for the alert, such as blocking a certain IP address, updating a certain software version, etc.

g. Additional information: other information that is helpful in understanding and responding to alarms, such as related charts, links, etc.
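The alarm output content listed in items a to g can be represented, for example, by a simple record type. The sketch below assumes JSON as the serialization format and a random UUID as the unique identification; the class name `Alert` and all field names are illustrative and not prescribed by the present application.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class Alert:
    alert_type: str          # c. alarm type
    level: str               # d. alarm level: "low", "medium" or "high"
    related_logs: list       # e. related log data segments
    suggested_actions: list  # f. suggested actions
    additional_info: dict = field(default_factory=dict)  # g. additional information
    # a. unique alarm identification (assumed here to be a random UUID)
    alert_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # b. alarm time, recorded in UTC in ISO 8601 format
    alert_time: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self):
        """Serialize the alert for system display, API push or log files."""
        return json.dumps(asdict(self), ensure_ascii=False)

# Hypothetical example alert.
alert = Alert(
    alert_type="intrusion_attempt",
    level="high",
    related_logs=["Failed login for root from 203.0.113.7"],
    suggested_actions=["block IP 203.0.113.7"],
)
```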

In this embodiment, the alarm prompt may be output in any of the following ways:

a. System output: the alarm is displayed directly in the analysis system, which is suitable for real-time monitoring and immediate response.

b. Email notification: the alarm information is sent to the relevant personnel via email, ensuring that they are notified even when they are not at the console.

c. API push: the alarm information is pushed to other systems or services through an API, enabling integration and automated response.

d. Log file record: the alarm information is written to a log file for subsequent analysis and audit.

Preferably, the present application further provides an optimization mechanism for alert output, specifically as follows:

a. Output format customization: the user is allowed to customize the format of the alarm output, such as JSON, XML or another format, as desired.

b. Output frequency control: to prevent alarm flooding, an appropriate output frequency is set, for example by merging similar alarms or limiting the maximum number of alarms per hour.

c. Alarm escalation mechanism: if an alarm occurs frequently or remains unprocessed for a period of time, a mechanism should raise its level to draw more attention.

d. Alarm feedback loop: a channel is provided for collecting user feedback on the alarms, so that the accuracy and effectiveness of the alarms can be continuously optimized.
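Items b and c can be sketched together as follows. The function merges alarms of the same type within one time window, raises the level of types that occur repeatedly, and emits at most a fixed number of merged alarms. The three-level scale follows the low/medium/high example given for the alarm level; the function name, the parameters `max_per_window` and `escalate_after`, and the tuple input format are assumptions for illustration.

```python
LEVELS = ["low", "medium", "high"]

def throttle(alerts, max_per_window=10, escalate_after=3):
    """Merge, escalate and cap the alerts of one time window.

    alerts is a list of (alert_type, level) tuples.
    """
    merged = {}
    for alert_type, level in alerts:
        entry = merged.setdefault(
            alert_type, {"type": alert_type, "level": level, "count": 0})
        entry["count"] += 1
        # Keep the most severe level seen for this alert type.
        if LEVELS.index(level) > LEVELS.index(entry["level"]):
            entry["level"] = level
    for entry in merged.values():
        # Escalate frequently occurring alerts by one level, capped at "high".
        if entry["count"] >= escalate_after:
            idx = min(LEVELS.index(entry["level"]) + 1, len(LEVELS) - 1)
            entry["level"] = LEVELS[idx]
    # Most severe first, then most frequent; emit at most max_per_window.
    ranked = sorted(merged.values(),
                    key=lambda e: (-LEVELS.index(e["level"]), -e["count"]))
    return ranked[:max_per_window]
```

Escalation of alarms that remain unprocessed for a period of time would additionally require tracking their handling state, which this sketch does not cover.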

Correspondingly, the alarm output optimization mechanism can be realized through an alarm output module, and the alarm output module mainly comprises the following modules:

a. Alarm formatting sub-module: responsible for formatting the alarm information into the desired output format.

b. Alarm transmission sub-module: sends the formatted alarm information to the corresponding target according to the selected output mode.

c. Output record and log sub-module: records the output status of all alarms, including success, failure, time, etc., to ensure traceability.

d. Output configuration and management sub-module: allows the administrator to configure and manage the manner, target, frequency, etc. of the alarm outputs.

A well-designed alarm output mechanism can ensure that critical security information is captured and acted upon by the right personnel at the right time. The design above aims to provide a clear, accurate and efficient alarm output method, thereby improving the overall security defense and response capability.

In this embodiment, the purpose of the data preprocessing is to perform cleaning, conversion and normalization operations on the original security log data, so as to facilitate subsequent smart tag generation and cluster analysis. The preprocessed data should have consistency, accuracy and usability. In this embodiment, the data preprocessing flow specifically includes the following steps:

(18) Data cleaning: extraneous information, duplicate records, and incomplete data in the log are removed.

(28) Data conversion: the unstructured data in the log are converted into structured data, and in addition, the time stamps in the log are required to be uniformly processed, so that subsequent time sequence analysis is facilitated.

(38) Data normalization: for numeric data, normalization processing, such as min-max normalization or Z-score normalization, is performed to ensure that each feature is within the same numeric range. For category type data, transcoding is performed.
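As a minimal sketch of step (38), the following code implements min-max normalization and Z-score normalization for numeric data, and one-hot encoding as one possible code conversion for category data. The function names are illustrative, the Z-score uses the population standard deviation, and constant columns are mapped to zeros by assumption.

```python
import statistics

def min_max(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: assumed to map to 0
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center values on the mean and scale by the population std deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

def one_hot_encode(categories):
    """Return (encoded rows, vocabulary) for a list of category values."""
    vocab = sorted(set(categories))
    rows = [[1 if c == v else 0 for v in vocab] for c in categories]
    return rows, vocab
```

For example, a hypothetical log-level column `["WARN", "ERROR", "WARN"]` yields the vocabulary `["ERROR", "WARN"]` and one row per log entry.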

In particular, the normalization of data can also be achieved in several ways:

a. Template matching: a log template is used to define the parameters to be extracted; the template matches the fixed text before and after each parameter in the log, and the information at the parameter positions is extracted.

b. Regular expression matching: in the data cleaning phase, specific patterns are matched and replaced by regular expressions to remove irrelevant information and unify data formats.

c. Timestamp processing: the timestamp fields in the log are parsed and converted into a single standard time format, which facilitates subsequent analysis and visualization.
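The three techniques above can be combined as in the following sketch. The log line layout, the template pattern and the field names are hypothetical, and the timestamps are assumed to be in UTC; real log sources would each need their own template.

```python
import re
from datetime import datetime, timezone

# Hypothetical template: fixed text surrounds the parameters to extract.
TEMPLATE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<level>[A-Z]+) "
    r"Failed login for (?P<user>\S+) from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})$"
)

def parse_line(line):
    """Clean one raw log line and extract its parameters, or return None."""
    # Regular expression cleaning: collapse repeated whitespace.
    line = re.sub(r"\s+", " ", line.strip())
    # Template matching: extract the information at the parameter positions.
    match = TEMPLATE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamp processing: convert to ISO 8601, assuming the source is UTC.
    ts = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M:%S")
    record["timestamp"] = ts.replace(tzinfo=timezone.utc).isoformat()
    return record
```

Lines that do not match any template return `None`, so they can be dropped or routed to a fallback parser during data cleaning.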

Correspondingly, the preprocessing function can be realized through a preprocessing module, and the preprocessing module is designed as follows:

a. Input interface: receives the original security log data and supports various data formats and input modes.

b. Preprocessing rule management: allows the user to customize preprocessing rules, such as regular expression patterns and template matching methods, to accommodate security logs of different sources and formats.

c. Preprocessing engine: performs data cleaning, conversion and normalization operations according to predefined rules. The engine should be efficient, stable and scalable.

d. Output interface: securely transmits the preprocessed data to the subsequent feature extraction module, ensuring the integrity and usability of the data.

Through the design, an effective and efficient data preprocessing flow can be provided for the security log clustering analysis alarming method based on the intelligent label. This not only improves the accuracy of the subsequent analysis, but also lays a solid foundation for the stability and reliability of the overall system.

In this embodiment, the goal of the data collection is to accurately collect security log data from the various security devices and systems in real time. These data are the basis for subsequent smart tag generation, cluster analysis, and alarms. The data acquisition may mainly include the following:

a. Real-time data acquisition: real-time connections are established with various security devices and systems, and log data is acquired in real time over standard protocols, including HTTP/HTTPS, SNMP, SSH, TELNET, JDBC, ODBC, S3, NTP, SNMP Trap, syslog, etc.

b. Historical data acquisition: historical data that cannot be acquired in real time can be acquired by means such as batch import and log file parsing.

The data acquisition process is specifically as follows:

a. Determining the data sources: first, the range of security devices and systems from which logs need to be collected is determined, and their data output modes and formats are identified.

b. Establishing connections and configuration: according to the type and characteristics of each data source, the corresponding connection is established and the necessary parameters, such as data filtering rules and data transmission frequency, are configured.

c. Real-time/historical data acquisition: data is acquired in real time or historical data is imported, according to the modes supported by the data source. In this process, the integrity, continuity and accuracy of the data are ensured.

d. Preliminary data processing: in the data acquisition stage, the data undergoes preliminary processing, including decryption, compression and preliminary formatting, so that network transmission pressure is reduced and subsequent processing efficiency is improved.

e. Data transmission and storage: the collected data is securely transmitted to a server and stored efficiently and reliably, so that no data is lost and a stable data source is provided for subsequent processing.

In the process of data acquisition, the following factors are mainly considered:

a. diversity of data sources: considering the diversity of security devices and systems, it is desirable to design and implement a variety of data acquisition modes to accommodate different data sources.

b. Real-time nature of data: in order to discover and respond to security threats in time, it is necessary to ensure real-time acquisition and transmission of data.

c. Integrity and accuracy of data: in the data acquisition process, a series of measures such as data transmission verification, log integrity check and the like are required to be taken, so that the integrity and the accuracy of data are ensured.

d. Security of data transmission: in the data transmission process, technologies such as encryption and compression are adopted to ensure that the data is not leaked or tampered with.
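The transmission verification and compression mentioned in items c and d can be illustrated with the following sketch, which compresses a batch of log lines and attaches a keyed digest so that the receiver can detect corruption or tampering. The use of zlib and HMAC-SHA256, the message layout and the function names are assumptions; the present application does not name specific algorithms. Encryption of the payload (for example by sending it over TLS) is outside the scope of this sketch.

```python
import hashlib
import hmac
import zlib

def pack_batch(lines, key):
    """Compress a batch of log lines and attach a keyed integrity tag."""
    raw = "\n".join(lines).encode("utf-8")
    tag = hmac.new(key, raw, hashlib.sha256).hexdigest()
    return {"payload": zlib.compress(raw), "tag": tag}

def unpack_batch(message, key):
    """Decompress a batch and verify its integrity tag before use."""
    raw = zlib.decompress(message["payload"])
    expected = hmac.new(key, raw, hashlib.sha256).hexdigest()
    # Constant-time comparison of the received and recomputed tags.
    if not hmac.compare_digest(expected, message["tag"]):
        raise ValueError("log batch failed the integrity check")
    return raw.decode("utf-8").split("\n")
```

A keyed digest is used rather than a plain hash because a plain hash only detects accidental corruption: anyone who modifies the payload could also recompute an unkeyed hash.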

In this embodiment, data collection may be implemented by a data collection module, which specifically includes:

a. Data source management sub-module: responsible for managing the information of all data sources, including type, address, authentication information, etc.

b. Data acquisition sub-module: executes real-time or historical data acquisition tasks according to the type and configuration of the data sources.

c. Data transmission and storage sub-module: securely transmits the collected data to the server and stores it efficiently and reliably.

d. Data monitoring sub-module: monitors the data acquisition process and takes corresponding measures in time when an abnormality or fault occurs.

Through the design, a high-efficiency, stable and safe data acquisition system can be established, real-time and accurate data support is provided for subsequent intelligent label generation, cluster analysis and alarming, and a solid foundation is laid.

The application has the following advantages:

(1) Automatic labeling of the data is realized through a deep learning algorithm model, achieving automatic classification and grading of the data; and alarm noise reduction is realized through a deep learning feature extraction algorithm model, achieving accurate alarms.

Alarm fatigue is a major challenge in current security management. Meeting large-scale alarm volumes and 7×24-hour security response requirements by adding staff makes poor use of human resources and yields a low input-output ratio; neither the working efficiency nor the cost is sustainable, and the approach cannot be widely adopted.

In the Transformer-based design for security log data feature extraction, the processing of the input embedding layer, the Transformer encoder and the feature output step effectively extracts key features of the security log data and forms an identification tag feature library for alarm grading and classification.

The design for automatic intelligent label annotation based on the Mini-Batch SGD (mini-batch stochastic gradient descent) algorithm can automatically label each piece of security log data with the relevant classification. These labels accurately reflect information such as the security event type and threat level contained in the log, thereby achieving effective alarm noise reduction and accurate alarms.

(2) Through a deep learning DBSCAN clustering algorithm, accurate cluster analysis of the security log data is realized, so that the internal associations and patterns of security events are deeply mined. The objective of the cluster analysis is to further group and classify the security log data that has already been labeled with intelligent labels, so that log data in the same group is as similar as possible (the similarity of data within a group is up to 99.99%) and data in different groups is as different as possible. Through cluster analysis, the intrinsic patterns and associations of security events can be further discovered, providing a more accurate basis for alarming.

(3) Efficient alarm rules are set: based on the cluster analysis results and the intelligent labels, a set of flexible and efficient rules is defined for judging whether to trigger a security alarm. These rules can automatically and accurately identify potential security threats, produce alarm output, and automatically engage security devices and security personnel to respond to and handle the security threats.
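For reference, the DBSCAN clustering named in advantage (2) can be sketched in its basic density-based form as follows. The Euclidean distance, the parameters `eps` and `min_pts`, and the two-dimensional example points are assumptions for illustration; in the scheme described here the points would be feature vectors of labeled security logs.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one cluster id per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical 2-D feature vectors: two dense groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
```

Points labeled -1 fall into no dense region; treating them as noise is one way in which density-based clustering can keep isolated log entries out of the groups used for alarming.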

In a second aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the security log alerting method based on smart tags of the first aspect of the present invention.

Wherein the computer readable storage medium may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.

The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory.

The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The computer-readable storage media described in connection with the embodiments of the present invention are intended to comprise these and any other suitable types of memory.

As shown in fig. 7, in a third aspect, the present invention provides an electronic device 10 including a processor 101 and a storage medium 102, the storage medium 102 storing a computer program which, when executed by the processor 101, implements the security log alerting method based on smart tags according to the first aspect of the present invention.

In some embodiments, the processor may be implemented in software, hardware, firmware, or a combination thereof, and may use at least one of a circuit, a single or multiple application specific integrated circuits (Application Specific Integrated Circuit, ASIC), a digital signal processor (Digital Signal Processor, DSP), a digital signal processing device (Digital Signal Processing Device, DSPD), a programmable logic device (Programmable Logic Device, PLD), a field programmable gate array (Field Programmable Gate Array, FPGA), a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or a combination thereof, such that the processor may perform some or all of the steps in the smart tag based security log alert method described in various embodiments of the present application, or any combination thereof.

Finally, it should be noted that although the foregoing embodiments have been described in the text and accompanying drawings of the present application, the scope of patent protection of the present application is not limited thereby. All technical solutions obtained by substituting or modifying equivalent structures or equivalent processes based on the essential idea of the present application and on the contents of its text and drawings, as well as any direct or indirect application of the technical solutions of the embodiments in other related technical fields, fall within the scope of patent protection of the present application.

Claims

1. A security log alert method based on smart tags, the method comprising:

2. The smart tag-based security log alert method of claim 1, wherein the neural network model is trained according to the following:

3. The security log alert method based on intelligent tag as claimed in claim 1, wherein the security log data comprises text data, the preprocessing the security log data, and extracting key feature information in the security log data comprises:

4. The security log alert method based on intelligent tags as claimed in claim 1, wherein the method comprises:

5. The smart tag-based security log alert method of claim 1 or 4, wherein the alert cues include any one or more of an alert identification, an alert time, an alert type, an alert level, a security log data fragment associated with an alert, a suggested processing policy, and additional information.

6. The security log alert method based on intelligent tags of claim 1, wherein preprocessing the security log data comprises:

performing data cleaning on the security log data to remove data that does not meet preset specifications;

converting unstructured data in the security log data into structured data; and

normalizing the numerical data in the security log data so that all numerical data fall within the same numerical range, and encoding the categorical data in the security log data into a format that can be processed by the neural network model.

7. The security log alert method based on intelligent tags of claim 1, wherein preprocessing the security log data comprises:

converting unstructured data in the security log data into structured data;

8. The smart tag-based security log alert method of claim 1, wherein the key feature information comprises a static feature and a dynamic feature, the static feature comprising any one or more of a device identification name, an IP address, and device port information, and the dynamic feature comprising the timestamp information and further comprising a log level or an event description.

9. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the smart label based security log alerting method of any one of claims 1 to 8.

10. An electronic device comprising a processor and a storage medium having stored thereon a computer program which, when executed by the processor, implements the smart tag based security log alert method of any of claims 1 to 8.