CN107579944A

CN107579944A - Security attack prediction method based on artificial intelligence and MapReduce

Info

Publication number: CN107579944A
Application number: CN201610518915.0A
Authority: CN
Inventors: 李木金; 凌飞
Original assignee: Nanjing Liancheng Science And Technology Development Ltd By Share Ltd
Current assignee: Nanjing Liancheng Science And Technology Development Ltd By Share Ltd
Priority date: 2016-07-05
Filing date: 2016-07-05
Publication date: 2018-01-12
Anticipated expiration: 2036-07-05
Also published as: CN107579944B

Abstract

The invention discloses a security attack prediction method based on artificial intelligence and MapReduce. The method establishes an artificial-intelligence-based security event correlation model using the random forest (RF) method; the model is built online, in real time, on the Hadoop/Spark big data framework. By analyzing the logs generated by enterprise IT systems, the invention can predict security attacks that may occur, helping the enterprise avoid the risk of attack in time (or in advance), ensure normal operation, and reduce operating costs.

Description

Security attack prediction method based on artificial intelligence and MapReduce

Technical field

The present invention relates to the applied technical fields of artificial intelligence, big data and information security, and in particular to a method for predicting security attack events.

Background technology

The English abbreviations used in the present invention are as follows:

RF: Random Forest

CLF: Common Log Format

JSON: JavaScript Object Notation

SOC: Security Operation Center

IDS: Intrusion Detection System

SNMP: Simple Network Management Protocol

HDFS: Hadoop Distributed File System

Production safety has always been the precondition for carrying out all work in an orderly manner, and it is a veto indicator in the performance assessment of leading cadres at all levels. The network and information security operation and maintenance (O&M) system is an important component of the safe-production work of all kinds of enterprises. Ensuring that network and information systems run efficiently and stably is the foundation of all of an enterprise's market and business activities and of its normal operation.

At present, enterprise IT systems deploy a wide variety of business systems and security devices. These have effectively raised productivity and reduced operating costs, and have become an important support for efficient enterprise operation and an indispensable link in production. On the one hand, if a security incident or failure occurs in a business system and cannot be discovered, handled and recovered from in time, it will directly disrupt all the business carried on that system and affect the enterprise's normal operation. Where systems serving the enterprise's users are involved, it will directly lead to customer complaints, lower satisfaction and damage to the corporate image, so security assurance for the enterprise network becomes increasingly important. On the other hand, network attack techniques are becoming more advanced and more widespread; enterprise network systems face the danger of attack at any time and frequently suffer intrusion and damage of varying severity, which seriously disrupts the normal operation of the enterprise network. Increasingly severe security threats force enterprises to strengthen the protection of their network systems, to pursue multi-level, three-dimensional security defense systems, and to build security O&M service centers that track and predict security attacks in real time and take timely control actions to keep the business systems running normally.

However, the logs generated by the devices, databases, middleware, operating systems, Web servers and other components that perform security O&M tasks have grown enormously in both type and volume as enterprise IT systems have expanded, making log storage, log analysis and issue tracking increasingly difficult. Because the log volume of enterprise IT systems grows exponentially, security O&M service providers are compelled to use big data frameworks such as Hadoop/Spark for log storage, log processing and log analysis.

Existing security management and analysis tools are not up to the task of enterprise security O&M services. A new approach to analyzing and managing massive log information is therefore urgently needed. A log is typically a flat file containing at least a timestamp, an event ID and an event description. Growth in log volume corresponds to one of the three major attributes of big data; the other two are velocity and variety. Velocity is the speed at which data is generated, and variety refers to highly heterogeneous data sources.

Therefore, how to use information technology to improve an enterprise's operating efficiency and to optimize its information systems, so that professional and cost-effective information security O&M services can be provided to all kinds of enterprises, has become an important problem to be solved, particularly in the design of information security O&M management.

Summary of the invention

After analyzing the defects and shortcomings of the enterprise information security operation management platforms described above, the present invention proposes a security attack prediction method and system based on artificial intelligence and MapReduce.

The core idea of the present invention is to construct a method and system for predicting security attacks. The method and system establish, from logs, an artificial-intelligence-based security event correlation model; the model is built online, in real time, on the Hadoop/Spark big data framework.

Further, the method and system include the RF method, which establishes a security attack correlation model by analyzing logs.

Further, the method and system preprocess the logs stored in Hadoop HDFS with a Python program, converting them into JSON format.

Further, the method and system adopt the MapReduce programming model; in the Map stage, the JSON data is taken as input and security attacks are detected from the logs.

Further, in the Reduce stage, the method and system count each alarm and the number of times it occurs.

Further, the method and system predict security attacks that may occur by means of the RF model.

By analyzing the logs generated by enterprise IT systems, the present invention can predict security attacks that may occur, helping the enterprise avoid the risk of attack in time (or in advance), ensure normal operation, and reduce operating costs.

Brief description of the drawings

Fig. 1 is a schematic diagram of the conversion of the original log format into JSON;

Fig. 2 shows the main stages of the artificial intelligence analysis technique of the present invention;

Fig. 3 shows the main stages of the big data framework of the present invention;

Fig. 4 shows the model of the present invention for predicting attacks that may occur.

Embodiment

The present invention is further described below with reference to the accompanying drawings and examples:

The method provided by this patent consists of two parts. The first part extracts structured log information from unstructured log files and preprocesses the log files into JSON files, in which the JSON variables correspond to the variables in the log. The second part stores the JSON variables in Hadoop HDFS, where they serve as the input data of the MapReduce framework. Attacks detected in the logs include, for example, A: Cross Site Scripting (XSS), B: Injection Flaws, C: Insecure Direct Object Reference, and D: Information Leakage and Improper Error Handling.

The method provided by this patent starts from unstructured log files that follow a standard (the Common Log Format, CLF, specification). The unstructured log data is retrieved for subsequent log storage and log processing. Extracting data from logs has become a demanding technical task, because it must handle log data in a variety of heterogeneous formats. To extract the log data properly, this patent selects the Python programming language for its flexibility, its efficiency, and the relative ease with which it handles analysis tasks. The Python program uses pyparsing, a useful class library with which a parser can be constructed directly in Python code.
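As an illustration of this extraction step, the following is a minimal sketch that turns one CLF line into a JSON record. It is not the patent's parser: the standard-library `re` module is swapped in for pyparsing to keep the example dependency-free, and the field names (`host`, `ident`, `user`, `timestamp`, `request`, `status`, `size`) and the function name `clf_to_json` are assumptions chosen for illustration.

```python
import json
import re

# Common Log Format: host ident authuser [date] "request" status bytes
# The group names below are illustrative, not taken from the patent.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def clf_to_json(line):
    """Convert one CLF line to a JSON string, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # CLF writes "-" when no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return json.dumps(record)


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(clf_to_json(sample))
```

Each key of the resulting JSON object plays the role of a "JSON variable" corresponding to one log field; lines that do not follow the CLF layout are skipped by returning `None`.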

In this patent, the result of the log integration stage is a JSON (JavaScript Object Notation) file containing the variables that correspond to the log fields, as shown in Fig. 1. JSON is a lightweight data interchange format that is easy for computers to parse and use. Compared with other structured data interchange formats (such as XML), JSON offers a marked performance gain, with parsing up to 100 times faster. The RF-based method is an artificial intelligence technique for event correlation, used to find and detect related attacks in logs. The artificial intelligence technique provided by this patent is based on two binary data structures and an algorithm that analyzes the frequency of security attacks. Its task can be decomposed into:

1. Find the number of times (frequency) that every attack occurs in the log;

2. Use the frequency codes to generate associations with other attacks.

Fig. 2 shows the three main stages of the artificial intelligence technique. To make the discussion clearer, the binary-based data structures and algorithm are described using the MapReduce big data framework.

To analyze how frequently the security attacks detected in the logs occur, this patent provides an artificial intelligence technique based on big data. The method processes the JSON data and creates two data structures: one stores the name of each security attack (attName), and the other stores the number of times (frequency) that each detected attack, and each combination of attacks, occurs (attFreq).

Fig. 3 shows the three stages of the big data framework: the preprocessing stage, the Map stage and the Reduce stage:

1. Preprocessing stage (first stage): in this stage the two data structures attName and attFreq are created. The size of the attFreq array depends on the number n of attacks that have been detected. For example, if n = 5, the size of the attFreq array is $2^n = 2^5 = 32$, corresponding to the possible combinations of the 5 attacks.

Suppose attacks A, B and C are stored at positions 1, 2 and 3 of the array attName. If the two attacks A and C are found together in the log, the index of this combination in the array attFreq is 5, determined by binary conversion: each attack sets one bit, with the first position mapped to the lowest bit, so A and C give binary 101, which is decimal 5. The attFreq index is thus determined by: $2^0 + 2^2 = 5$.
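The two data structures and the binary index can be sketched as follows. This is a minimal illustration, assuming five attacks labelled A to E and zero-based list positions (so A sets bit 0, B sets bit 1, and so on); the names `att_name`, `att_freq` and `combination_index` are hypothetical stand-ins for attName, attFreq and the index calculation.

```python
# Hypothetical attack labels; list positions are zero-based, so A=0, B=1, C=2, ...
att_name = ["A", "B", "C", "D", "E"]

# One counter per possible combination of the n attacks: 2**n slots.
att_freq = [0] * (2 ** len(att_name))


def combination_index(detected):
    """Return the attFreq index for a set of detected attack names."""
    loc = 0
    for i, name in enumerate(att_name):
        if name in detected:
            loc += 2 ** i  # set bit i for the attack at zero-based index i
    return loc


print(len(att_freq))                  # 32
print(combination_index({"A", "C"}))  # 5
```

Because every attack owns exactly one bit, each combination maps to a distinct index, and index 0 is reserved for records in which no attack was detected.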

2. Map stage (second stage): in this stage the algorithm (for example, the artificial intelligence algorithm) starts by scanning the JSON variables stored in HDFS. Attacks are detected by comparing the JSON variables with a set of dedicated regular expressions (the attack detection model), which are signatures identifying the different attack patterns. For each attack detected in the log, the corresponding ID is looked up in attName, and that ID is used in the formula below to determine the corresponding attFreq index, named 'Loc', where i is the index of the attack in attName.

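$\mathrm{Loc} = \sum_{i \in D} 2^{i}$, where $D$ is the set of attName indexes, counted from zero, of the attacks detected in the record.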

The algorithm below describes the overall process of the Map stage, where i is the index at which the current attack is stored in the array attName. The output of the Map stage is a key-value pair consisting of the attFreq index and a frequency (this value is the input of the Reduce stage):

Begin
    loc ← 0
    For each i in attName
        If i is detected in log record
            loc ← loc + 2^i
        End if
    End for
    Output [loc, 1]
End
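The Map stage above could be sketched in Python as follows. This is an illustration only: the attack names and the three regular expressions are hypothetical and deliberately simplistic (the patent does not list its detection patterns), and the record is assumed to carry the request string in a `request` field.

```python
import json
import re

# Hypothetical attack names and toy signatures, one pattern per attack.
# Real detection rules would be far more thorough.
ATT_NAME = ["XSS", "SQL_INJECTION", "PATH_TRAVERSAL"]
ATT_PATTERNS = [
    re.compile(r"<script", re.IGNORECASE),
    re.compile(r"union\s+select", re.IGNORECASE),
    re.compile(r"\.\./"),
]


def map_record(json_line):
    """Map step: return the key-value pair (loc, 1) for one JSON log record."""
    request = json.loads(json_line).get("request", "")
    loc = 0
    for i, pattern in enumerate(ATT_PATTERNS):
        if pattern.search(request):
            loc += 2 ** i  # same update as "loc <- loc + 2^i" in the pseudocode
    return loc, 1


record = json.dumps({"request": "GET /../../etc/passwd?x=<SCRIPT> HTTP/1.1"})
print(map_record(record))  # (5, 1)
```

As in the pseudocode, a record in which nothing is detected still emits a pair, with key 0.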

3. Reduce stage (third stage): in this stage the Hadoop worker nodes redistribute the data based on the output of the Map stage. The Reduce method then sums, in parallel, the data output by each Map task. The array attFreq stores the result of the Reduce method; it can be sorted by frequency, with the array indexes ordered from highest to lowest frequency.

Finally, the most frequent attack or combination of attacks is found and its index is converted to binary. For example, if index 10 of the array attFreq is the most frequently occurring entry found in the log, the combination of attacks is obtained by converting the index to binary, 1010, which shows that the combination represented by index 10 is B and D.
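The Reduce stage and the final decoding step can be sketched on a single machine as follows. The Map output in `pairs` is made up for illustration, the four labels A to D are assumed to occupy zero-based positions 0 to 3, and `reduce_counts` and `decode_index` are hypothetical helper names; in a real Hadoop job the summation would run per key on the worker nodes.

```python
from collections import Counter

ATT_NAME = ["A", "B", "C", "D"]  # hypothetical attack labels, zero-based positions


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted by the Map stage for each attFreq index."""
    totals = Counter()
    for loc, count in pairs:
        totals[loc] += count
    return totals


def decode_index(loc):
    """Translate an attFreq index back into the attack names whose bits are set."""
    return [name for i, name in enumerate(ATT_NAME) if loc & (1 << i)]


pairs = [(10, 1), (5, 1), (10, 1), (1, 1), (10, 1)]  # made-up Map output
totals = reduce_counts(pairs)
top_loc, top_count = totals.most_common(1)[0]
print(top_loc, top_count, decode_index(top_loc))  # 10 3 ['B', 'D']
```

`Counter.most_common` performs the ordering from highest to lowest frequency, and the bit test in `decode_index` is the inverse of the index calculation used in the Map stage.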

This patent provides an RF (random forest) algorithm to determine the associations between attacks. Consider the rule A=>B. Its support, Supp{A,B}, expresses the dependence between the two attacks A and B: it means that A and B occur together within the same transaction or the same time slice. P(A∩B) denotes the probability that the two attacks A and B occur together, so the following holds:

Supp{A,B}= P(A∩B)

Conf{A=>B} denotes the confidence of the rule for the two attacks A and B. It is a measure of accuracy: the probability that attack B occurs given that attack A has been detected.

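$\mathrm{Conf}\{A \Rightarrow B\} = \dfrac{\mathrm{Supp}\{A,B\}}{\mathrm{Supp}\{A\}} = \dfrac{P(A \cap B)}{P(A)}$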

Fig. 4 is an embodiment that uses the association rules determined between attacks to predict imminent attacks. A "transaction" is the index of a JSON variable in which one or more attacks were detected, and "frequency" is the number of times the attack occurred. For example, in transaction 3, A and B are the attacks detected in the same JSON variable.

In the present embodiment, the support Supp{X} needs to be calculated:

$\mathrm{Supp}\{X\} = \dfrac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$

where n is the number of transactions, $x_i$ is 1 if attack X occurs in the i-th transaction and 0 otherwise, and $f_i$ is the frequency with which the attacks of the i-th transaction record occur.
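A small sketch of the support and confidence calculations follows. The transaction table is invented for illustration and is not the data of Fig. 4; the sketch also assumes that support is normalised by the total frequency over all transactions and that confidence is the usual ratio Supp{A,B} / Supp{A}.

```python
# Made-up transactions: (set of attacks detected together, frequency of that combination).
transactions = [
    ({"A"}, 40),
    ({"B"}, 30),
    ({"A", "B"}, 20),
    ({"B", "C"}, 10),
]


def support(items):
    """Frequency-weighted share of transactions that contain every attack in `items`."""
    total = sum(freq for _, freq in transactions)
    hits = sum(freq for attacks, freq in transactions if items <= attacks)
    return hits / total


def confidence(antecedent, consequent):
    """Conf{A => B} = Supp{A, B} / Supp{A} (assumed standard definition)."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"B"}))            # 0.6
print(confidence({"B"}, {"C"}))  # share of B's frequency that also involves C
```

The subset test `items <= attacks` plays the role of the 0/1 indicator in the support formula, and the frequency weights the contribution of each transaction.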

In the present embodiment, the calculation gives Supp{B,C} = 21/1183 and Conf{B=>C} = 0.21. Suppose B is Injection Flaws and C is Insecure Direct Object Reference; then, once an Injection Flaws attack has been detected, an Insecure Direct Object Reference attack will follow with a probability of 21%.
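As a consistency check on these two figures, the usual definition of confidence as the ratio of Supp{B,C} to Supp{B} can be rearranged to give the support of B alone. The value of Supp{B} below is inferred under that assumption; it is not stated in the text.

```latex
\mathrm{Supp}\{B\}
  = \frac{\mathrm{Supp}\{B,C\}}{\mathrm{Conf}\{B \Rightarrow C\}}
  = \frac{21/1183}{0.21}
  = \frac{100}{1183}
  \approx 0.0845
```

Read this way, B would account for a frequency of 100 out of a total of 1183, of which 21 also involve C.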

The foregoing is merely a preferred embodiment of the present invention and is not intended to limit its scope of practice; all equivalent changes and modifications made in accordance with the present invention are covered by the scope of its claims.

Claims

1. A security attack prediction method based on artificial intelligence and MapReduce, comprising establishing, by the RF method, an artificial-intelligence-based security event correlation model, wherein the model is built online, in real time, on the Hadoop/Spark big data framework.

2. The security attack prediction method based on artificial intelligence and MapReduce as claimed in claim 1, wherein the method preprocesses the original logs with Python, converting them into JSON format.

3. The security attack prediction method based on artificial intelligence and MapReduce as claimed in claim 2, wherein the method adopts the MapReduce programming model and, in the Map stage, takes the JSON files as input and detects security attacks from the logs.

4. The security attack prediction method based on artificial intelligence and MapReduce as claimed in claim 3, wherein the method counts, in the Reduce stage, each alarm and the number of times it occurs.

5. The security attack prediction method based on artificial intelligence and MapReduce as claimed in claim 4, wherein the method predicts, by means of the RF model, security attacks that may occur.