CN107579944A - Security attack prediction method based on artificial intelligence and MapReduce - Google Patents

Security attack prediction method based on artificial intelligence and MapReduce

Info

Publication number
CN107579944A
Authority
CN
China
Prior art keywords
artificial intelligence
mapreduce
security attack
security
attack
Prior art date
Legal status
Granted
Application number
CN201610518915.0A
Other languages
Chinese (zh)
Other versions
CN107579944B (en)
Inventor
李木金 (Li Mujin)
凌飞 (Ling Fei)
Current Assignee
Nanjing Liancheng Science And Technology Development Ltd By Share Ltd
Original Assignee
Nanjing Liancheng Science And Technology Development Ltd By Share Ltd
Priority date
2016-07-05
Filing date
2016-07-05
Publication date
2018-01-12
Application filed by Nanjing Liancheng Science And Technology Development Ltd By Share Ltd
Priority to CN201610518915.0A
Publication of CN107579944A
Application granted
Publication of CN107579944B
Current legal status: Active
Anticipated expiration


Landscapes

  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a security attack prediction method based on artificial intelligence and MapReduce. The method builds a security event correlation model based on artificial intelligence technology using the random forest (RF) method, and the model is built online, in real time, on the Hadoop/Spark big data framework. By analyzing the logs generated by an enterprise IT system, the invention can predict security attacks that may occur, helping the enterprise avoid the risk of security attacks in time (or in advance), ensuring the normal operation of the enterprise and reducing operating costs.

Description

Security attack prediction method based on artificial intelligence and MapReduce
Technical field
The present invention relates to the application fields of artificial intelligence, big data and information security, and in particular to a method for predicting security attack events.
Background art
The English abbreviations used in the present invention are as follows:
RF: Random Forest
CLF: Common Log Format
JSON: JavaScript Object Notation
SOC: Security Operation Center
IDS: Intrusion Detection System
SNMP: Simple Network Management Protocol
HDFS: Hadoop Distributed File System
Production safety is always the prerequisite for all work to proceed in an orderly way, and it is a one-vote-veto indicator in the performance assessment of leading cadres at all levels. The network and information security operation and maintenance (O&M) system is an important part of enterprise safety operations. Keeping networks and information systems running efficiently and stably is the basis for all of the enterprise's business management activities and normal operation.
Enterprise IT systems currently deploy a variety of business systems and security measures, which effectively raise productivity and reduce operating costs; they have become an important support for efficient enterprise operation and an indispensable link in production. On the one hand, if a security incident or failure in any business system is not discovered, handled and recovered from in time, it will directly disrupt all services carried on that system and the enterprise's normal operating order; for systems that serve the enterprise's users, it will directly lead to customer complaints, lower satisfaction and damage to the corporate image. The security assurance of enterprise networks is therefore becoming increasingly important. On the other hand, cyber-attack techniques are becoming increasingly advanced and widespread: enterprise network systems face the risk of attack at any time and frequently suffer intrusions and damage of varying severity, which seriously disrupts normal network operation. Increasingly serious security threats force enterprises to strengthen the security protection of their network systems, to pursue multi-level, three-dimensional security defense systems, and to build security O&M service centers that track and predict security attacks in real time and take timely control actions to keep the enterprise's business systems running normally.
However, as enterprise IT systems keep expanding, the logs generated by the devices, databases, middleware, operating systems, web servers and other components that carry out security O&M service tasks have grown enormously in both type and volume, making log storage, log analysis and issue tracking increasingly difficult. The volume of logs in enterprise IT systems is growing exponentially, which forces security O&M service providers to adopt big data frameworks such as Hadoop/Spark for log storage, log processing and log analysis.
Existing security management and analysis tools cannot handle the task of enterprise security O&M services. A new approach is therefore urgently needed to analyze and manage massive volumes of log information. A log is typically a flat file that contains at least a timestamp, an event ID and an event description. Growth in volume is one of the three major attributes of big data; the other two are velocity and variety. Velocity refers to the speed at which data is generated, and variety refers to highly heterogeneous data sources.
Therefore, how to use information technology to improve enterprise operating efficiency and to optimize enterprise information systems so that they can provide professional, cost-effective information security O&M services for all kinds of enterprises has become an important problem to be solved, particularly in the design of information security O&M management.
Summary of the invention
After analyzing the defects and shortcomings of the various enterprise information security operation management platforms described above, the present invention proposes a security attack prediction method and system based on artificial intelligence and MapReduce.
The core idea of the present invention is to build a prediction method and system for security attacks. The method and system build, from logs, a security event correlation model based on artificial intelligence technology, and the model is built online, in real time, on the Hadoop/Spark big data framework.
Further, the method and system include the RF method, which builds a security attack correlation model by analyzing logs.
Further, the method and system preprocess the logs stored in Hadoop HDFS with a Python program, converting them into JSON format.
Further, the method and system use the MapReduce programming model: in the Map stage, JSON is taken as input and security attacks are detected in the logs.
Further, in the Reduce stage, the method and system count each alarm and the number of times it occurs.
Further, the method and system predict security attacks that are likely to occur by means of the RF model.
By analyzing the logs generated by enterprise IT systems, the present invention can predict security attacks that may occur, helping enterprises avoid the risk of security attacks in time (or in advance), ensuring normal enterprise operation and reducing operating costs.
Brief description of the drawings
Fig. 1 is a schematic diagram of converting the original log format into JSON according to the present invention;
Fig. 2 shows the main stages of the artificial intelligence analysis technique of the present invention;
Fig. 3 shows the main stages of the big data framework of the present invention;
Fig. 4 shows the model for predicting attacks that may occur according to the present invention.
Embodiment
The present invention is further described below with reference to the drawings and examples:
The method provided by this patent consists of two parts. The first part extracts structured log information from unstructured log files and preprocesses the log files into JSON format, in which the JSON variables correspond to the variables in the logs. The second part stores the JSON variables in Hadoop HDFS and uses them as the input data of the MapReduce framework. Attacks are then detected in the logs, for example A: Cross Site Scripting (XSS), B: Injection Flaws, C: Insecure Direct Object Reference, D: Information Leakage and Improper Error Handling, and so on.
The method provided by this patent starts from unstructured log files that follow a specification (the Common Log Format (CLF) specification). The unstructured log data is parsed so that it can be stored and processed further. Extracting data from logs has become a rather demanding technical task, because it must handle log data in many heterogeneous formats. To extract the log data appropriately, this patent chooses the Python programming language, whose flexibility makes processing and analysis tasks efficient and relatively easy. The Python program uses pyparsing, a useful library that allows a parser to be constructed directly in Python code.
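The patent does not reproduce its pyparsing grammar. The sketch below illustrates the same normalization step under stated assumptions: it converts one CLF line into a JSON object using only Python's standard `re` and `json` modules, deliberately substituting a regular expression for pyparsing so that it runs without third-party packages. The function name `clf_to_json` and the JSON field names (`host`, `ident`, `authuser`, `date`, `request`, `status`, `bytes`, plus the derived `method`, `path`, `protocol`) are hypothetical choices based on the CLF layout, not the patent's actual schema.

```python
import json
import re
from typing import Optional

# Common Log Format (CLF): host ident authuser [date] "request" status bytes
# Regex-based stand-in for the pyparsing grammar mentioned in the text.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def clf_to_json(line: str) -> Optional[str]:
    """Convert one CLF log line into a JSON object string.

    Returns None when the line does not match the CLF layout, so callers
    can skip or separately record malformed lines.
    """
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Split the request line ("GET /path HTTP/1.0") into its three parts.
    parts = record["request"].split(" ")
    if len(parts) == 3:
        record["method"], record["path"], record["protocol"] = parts
    record["status"] = int(record["status"])
    # CLF writes "-" when no response body bytes were sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return json.dumps(record)
```

For example, `clf_to_json('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326')` yields a JSON object whose `path` is `/apache_pb.gif` and whose `status` is the integer 200; lines that do not fit the CLF layout return `None`.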
In this patent, the result of the log normalization stage is a JSON (JavaScript Object Notation) file containing variables that correspond to the log fields, as shown in Fig. 1. JSON is a lightweight data-interchange format that is easy for computers to parse and use. Compared with other structured data-interchange formats (such as XML), JSON offers a clear performance gain, with parsing up to 100 times faster. The RF-based method is an artificial intelligence technique for event correlation, used to find and detect correlated attacks in the logs. The artificial intelligence technique provided by this patent is based on two data structures indexed by binary encoding and on an algorithm that analyzes the frequency of security attacks. Its task can be decomposed into:
1. Find in the logs the number/frequency of occurrences of every attack;
2. Use the frequency codes to generate associations with other attacks.
Fig. 2 shows the three main stages of applying the artificial intelligence technique. For clarity of discussion, the binary-indexed data structures and the algorithm are implemented with the MapReduce big data framework.
To analyze the frequency of the security attacks detected in the logs, this patent provides an artificial intelligence technique based on big data. The method processes the JSON data and creates two data structures: one stores the name of each security attack (attName), and the other stores the number/frequency of occurrences of each detected attack and of each combination of attacks (attFreq).
Fig. 3 shows the three stages of the big data framework: the preprocessing stage, the Map stage and the Reduce stage:
1. Preprocessing stage (first stage): in this stage, the two data structures attName and attFreq are created. The size of the attFreq array depends on the number n of attacks that have been detected. For example, if n = 5, the size of attFreq is $2^n = 2^5 = 32$, one entry for each possible combination of the 5 attacks.
Assume that attacks A, B and C are stored at positions 1, 2 and 3 of the array attName. If attacks A and C are both found in the log, the index of this combination in attFreq is 5, determined by binary conversion: A and C together correspond to the binary number 101, which is the binary representation of decimal 5. The attFreq index is therefore determined as $2^{1-1} + 2^{3-1} = 2^0 + 2^2 = 5$.
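The index arithmetic of the preprocessing stage can be checked with a few lines of Python. This is a minimal sketch, assuming 1-based attName positions as in the example above (position i maps to bit i-1) and assuming, for illustration only, that the hypothetical list `ATT_NAME` holds the attacks labeled A to D earlier in this description; the helper names `att_freq_size`, `combo_index` and `decode_index` are likewise hypothetical.

```python
from typing import List, Sequence

# Hypothetical attName contents: positions 1-4 hold attacks A-D.
ATT_NAME = [
    "Cross Site Scripting (XSS)",                       # A, position 1
    "Injection Flaws",                                  # B, position 2
    "Insecure Direct Object Reference",                 # C, position 3
    "Information Leakage and Improper Error Handling",  # D, position 4
]


def att_freq_size(n: int) -> int:
    """attFreq needs one slot for every combination (subset) of n attacks."""
    return 2 ** n


def combo_index(positions: Sequence[int]) -> int:
    """Encode 1-based attName positions as an attFreq index: sum of 2^(i-1)."""
    return sum(2 ** (i - 1) for i in set(positions))


def decode_index(loc: int, att_name: Sequence[str]) -> List[str]:
    """Decode an attFreq index back into attack names (bit i-1 -> position i)."""
    return [name for bit, name in enumerate(att_name) if loc & (1 << bit)]
```

Under these assumptions, `att_freq_size(5)` is 32, `combo_index([1, 3])` (attacks A and C) is 5, and `decode_index(10, ATT_NAME)` returns the names of B and D, matching the binary value 1010.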
2. Map stage (second stage): in this stage, the algorithm (for example, the artificial intelligence algorithm) starts executing by scanning the JSON variables stored in HDFS. Attacks are detected by comparing the JSON variables against a series of specific regular expressions (the attack detection models), which encode the characteristic features of different attack patterns. For each attack detected in a log, the corresponding ID is looked up in attName, and that ID is used to determine the corresponding attFreq index, named Loc, with the formula below, where i is the position of the attack in attName.
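$$Loc = \sum_{i \in D} 2^{\,i-1}$$ where $D$ is the set of attName positions of the attacks detected in the current log record.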
The following algorithm describes the overall process of the Map stage, where i is the position at which the current attack is stored in the array attName. The output of the Map stage is a key-value pair consisting of the attFreq index and a frequency count; these pairs are the input of the Reduce stage:
Begin
loc←0
For each position i in attName
If the attack at position i is detected in the log record
loc ← loc + 2^(i-1)
End if
End for
Output [loc, 1]
End
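A single-record version of the Map step can be sketched in Python. This is an illustration only, not the patent's implementation: the attack detection models in the hypothetical `DETECTION_MODELS` table are toy regular expressions invented for this example, they cover only attacks A to C, and they are far too weak for real detection. Unlike the pseudocode, the hypothetical `map_record` emits nothing for records in which no attack is detected, so index 0 is never counted. In an actual Hadoop job this logic would sit inside a mapper (for example, a Hadoop Streaming script reading JSON lines from standard input); that wiring is omitted here.

```python
import json
import re
from typing import Dict, List, Tuple

# Hypothetical attack detection models keyed by 1-based attName position.
# These toy regular expressions only illustrate the mechanism; they are NOT
# the patent's detection rules and would miss or misfire on real traffic.
DETECTION_MODELS: Dict[int, Tuple[str, re.Pattern]] = {
    1: ("Cross Site Scripting (XSS)",
        re.compile(r"<script\b|javascript:", re.IGNORECASE)),
    2: ("Injection Flaws",
        re.compile(r"\bunion\b.+\bselect\b|'\s*or\s+'?1'?\s*=\s*'?1", re.IGNORECASE)),
    3: ("Insecure Direct Object Reference",
        re.compile(r"[?&](?:user_?id|account_?id|order_?id)=\d+", re.IGNORECASE)),
}


def map_record(json_line: str) -> List[Tuple[int, int]]:
    """Map step for one JSON log record: return [(loc, 1)], or [] if nothing matched."""
    record = json.loads(json_line)
    # Scan every field value, since an attack may appear in any log field.
    text = " ".join(str(value) for value in record.values())
    loc = 0
    for i, (_name, pattern) in DETECTION_MODELS.items():
        if pattern.search(text):
            loc += 2 ** (i - 1)  # position i sets bit i-1, as in attFreq
    return [(loc, 1)] if loc else []
```

With these toy models, a record whose request contains both `<script>` and `&user_id=42` maps to `[(5, 1)]`, i.e. the A-and-C combination.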
3. Reduce stage (third stage): in this stage, the Hadoop worker nodes redistribute the data according to the output of the Map stage. The Reduce method then performs the addition in parallel over the data output by each Map task. The array attFreq stores the result of the Reduce method; its frequencies can be sorted so that the array indices are ranked from highest to lowest frequency.
Finally, the attack or attack combination with the highest frequency is found, and its index is converted to binary. For example, if index 10 of attFreq is the most frequent entry found in the logs, converting it to binary gives 1010, which shows that the combination represented by index 10 consists of attacks B and D.
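The Reduce step and the final decoding can be simulated in a single process. In this sketch, an in-memory `collections.Counter` stands in both for attFreq and for Hadoop's shuffle-and-sum, an assumption made for brevity; `reduce_counts` and `top_combinations` are hypothetical names, and the pairs in the usage example below are invented rather than taken from any real log.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def reduce_counts(pairs: Iterable[Tuple[int, int]]) -> Counter:
    """Reduce step: sum the counts emitted by the Map step for each attFreq index."""
    att_freq: Counter = Counter()
    for loc, count in pairs:
        att_freq[loc] += count
    return att_freq


def top_combinations(
    att_freq: Counter, att_name: Sequence[str], k: int = 1
) -> List[Tuple[int, str, List[str], int]]:
    """Rank attFreq indices from highest to lowest frequency and decode each one.

    Each result is (index, binary string, attack names, frequency); bit i-1
    of the index corresponds to attName position i.
    """
    ranked = []
    for loc, freq in att_freq.most_common(k):
        names = [name for bit, name in enumerate(att_name) if loc & (1 << bit)]
        ranked.append((loc, format(loc, "b"), names, freq))
    return ranked
```

For instance, reducing the invented pairs `[(10, 1), (10, 1), (5, 1), (10, 1), (2, 1)]` and calling `top_combinations(counts, ["A", "B", "C", "D"])` gives `[(10, "1010", ["B", "D"], 3)]`, the same decoding as the index-10 example above.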
This patent uses an RF algorithm, i.e. a random forest algorithm, to determine the associations between attacks. Consider the rule A => B. Its support represents the dependence between the two attacks A and B: A and B occur together within the same transaction or the same time slice. With P(A ∩ B) denoting the probability that attacks A and B occur together, the following holds:
$$Supp\{A,B\} = P(A \cap B)$$
Conf{A => B} denotes the confidence of the rule relating attacks A and B. It is a measure of accuracy: the probability that attack B occurs given that attack A has been detected.
$$Conf\{A \Rightarrow B\} = P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{Supp\{A,B\}}{Supp\{A\}}$$
Fig. 4 shows an embodiment that uses the association rules determined between attacks to predict imminent attacks. A "transaction" is the index of a JSON variable in which one or more attacks were detected, and "frequency" is the number of times the attack occurs. For example, in transaction 3, A and B are attacks detected in the same JSON variable.
In this embodiment, the support Supp{X} is calculated as:
$$Supp\{X\} = \frac{\sum_{i=1}^{n} \delta_i(X)\, f_i}{\sum_{i=1}^{n} f_i}$$
where n is the number of transactions; $\delta_i(X)$ is 1 if attack X occurred in the i-th transaction and 0 otherwise; and $f_i$ is the frequency with which the attacks recorded in the i-th transaction occur.
In this embodiment, the calculation gives Supp{B,C} = 21/1183 and Conf{B => C} = 0.21. Suppose B is Injection Flaws and C is Insecure Direct Object Reference: once an Injection Flaws attack has been detected, an Insecure Direct Object Reference attack follows with a probability of 21%.
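The support and confidence formulas can be evaluated exactly with Python's `fractions.Fraction`. This sketch computes only these association statistics as defined above; it does not train a random forest. Each transaction is modeled, as an assumption, as a pair of (set of attacks detected in one JSON variable, frequency $f_i$), with support weighted by frequency as in the $Supp\{X\}$ formula, and `support` and `confidence` are hypothetical helper names. Because the transaction table of Fig. 4 is not reproduced in the text, the usage example uses invented toy transactions, so its numbers are unrelated to the 21/1183 and 0.21 reported above.

```python
from fractions import Fraction
from typing import FrozenSet, List, Tuple

# One transaction: (attacks detected in one JSON variable, frequency f_i).
Transaction = Tuple[FrozenSet[str], int]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> Fraction:
    """Supp{X} = sum_i delta_i(X) * f_i / sum_i f_i.

    delta_i(X) is 1 when every attack in X occurs in transaction i, else 0.
    """
    total = sum(freq for _, freq in transactions)
    hits = sum(freq for attacks, freq in transactions if itemset <= attacks)
    return Fraction(hits, total)


def confidence(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    transactions: List[Transaction],
) -> Fraction:
    """Conf{A => B} = Supp{A, B} / Supp{A}."""
    joint = support(antecedent | consequent, transactions)
    return joint / support(antecedent, transactions)
```

With the toy transactions `[({A}, 40), ({B}, 30), ({A, B}, 10), ({B, C}, 20)]`, Supp{B,C} = 20/100 = 1/5 and Supp{B} = 60/100 = 3/5, so Conf{B => C} = 1/3. Both helpers raise `ZeroDivisionError` if the total frequency, or the support of the antecedent, is zero.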
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of implementation; all equivalent changes and modifications made according to the present invention fall within the scope of the claims of the present invention.

Claims (5)

1. A security attack prediction method based on artificial intelligence and MapReduce, comprising building a security event correlation model based on artificial intelligence technology by the RF method, wherein the model is built online, in real time, on the Hadoop/Spark big data framework.
2. The security attack prediction method based on artificial intelligence and MapReduce according to claim 1, wherein the method preprocesses the original logs with Python, converting them into JSON format.
3. The security attack prediction method based on artificial intelligence and MapReduce according to claim 2, wherein the method adopts the MapReduce programming model and, in the Map stage, takes the JSON files as input and detects security attacks in the logs.
4. The security attack prediction method based on artificial intelligence and MapReduce according to claim 3, wherein, in the Reduce stage, the method counts each alarm and the number of times it occurs.
5. The security attack prediction method based on artificial intelligence and MapReduce according to claim 4, wherein the method predicts security attacks that are likely to occur by means of the RF model.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610518915.0A CN107579944B (en) 2016-07-05 2016-07-05 Artificial intelligence and MapReduce-based security attack prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610518915.0A CN107579944B (en) 2016-07-05 2016-07-05 Artificial intelligence and MapReduce-based security attack prediction method

Publications (2)

Publication Number Publication Date
CN107579944A CN107579944A (en) 2018-01-12
CN107579944B CN107579944B (en) 2020-08-11

Family

ID=61049851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610518915.0A Active CN107579944B (en) 2016-07-05 2016-07-05 Artificial intelligence and MapReduce-based security attack prediction method

Country Status (1)

Country Link
CN (1) CN107579944B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107623655A (en) * 2016-07-13 2018-01-23 南京联成科技发展股份有限公司 The system for detecting attack in real time based on artificial intelligence and MapReduce
CN110611636A (en) * 2018-06-14 2019-12-24 蓝盾信息安全技术股份有限公司 Major data algorithm-based defect host detection technology
CN111752566A (en) * 2019-03-28 2020-10-09 上海视九信息科技有限公司 Method and device for analyzing function expression in compiled language environment, storage medium and terminal
CN113297296A (en) * 2021-05-31 2021-08-24 西南大学 JSON processing method for multi-style type data

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915347A (en) * 2012-09-26 2013-02-06 中国信息安全测评中心 Distributed data stream clustering method and system
CN104268254A (en) * 2014-10-09 2015-01-07 浪潮电子信息产业股份有限公司 Security situation analysis and statistics method
CN104363222A (en) * 2014-11-11 2015-02-18 浪潮电子信息产业股份有限公司 Hadoop-based network security event analysis method
CN104394211A (en) * 2014-11-21 2015-03-04 浪潮电子信息产业股份有限公司 Hadoop-based user behavior analysis system design and implementation method
CN104850780A (en) * 2015-04-27 2015-08-19 北京北信源软件股份有限公司 Discrimination method for advanced persistent threat attack
CN105138661A (en) * 2015-09-02 2015-12-09 西北大学 Hadoop-based k-means clustering analysis system and method of network security log
CN105207826A (en) * 2015-10-26 2015-12-30 南京联成科技发展有限公司 Security attack alarm positioning system based on Spark big data platform of Tachyon
CN105391742A (en) * 2015-12-18 2016-03-09 桂林电子科技大学 Hadoop-based distributed intrusion detection system
CN105677615A (en) * 2016-01-04 2016-06-15 北京邮电大学 Distributed machine learning method based on weka interface


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107623655A (en) * 2016-07-13 2018-01-23 南京联成科技发展股份有限公司 The system for detecting attack in real time based on artificial intelligence and MapReduce
CN110611636A (en) * 2018-06-14 2019-12-24 蓝盾信息安全技术股份有限公司 Major data algorithm-based defect host detection technology
CN110611636B (en) * 2018-06-14 2021-12-14 蓝盾信息安全技术股份有限公司 Major data algorithm-based defect host detection method
CN111752566A (en) * 2019-03-28 2020-10-09 上海视九信息科技有限公司 Method and device for analyzing function expression in compiled language environment, storage medium and terminal
CN111752566B (en) * 2019-03-28 2024-03-22 上海视九信息科技有限公司 Analysis method and device for function expression in compiled language environment, storage medium and terminal
CN113297296A (en) * 2021-05-31 2021-08-24 西南大学 JSON processing method for multi-style type data
CN113297296B (en) * 2021-05-31 2022-08-16 西南大学 JSON processing method for multi-style type data

Also Published As

Publication number Publication date
CN107579944B (en) 2020-08-11

Similar Documents

Publication Publication Date Title
US11973777B2 (en) Knowledge graph for real time industrial control system security event monitoring and management
Khan et al. HML-IDS: A hybrid-multilevel anomaly prediction approach for intrusion detection in SCADA systems
EP3205072B1 (en) Differential dependency tracking for attack forensics
CN109902297B (en) Threat information generation method and device
CN115996146B (en) Numerical control system security situation sensing and analyzing system, method, equipment and terminal
CN107579944A (en) Based on artificial intelligence and MapReduce security attack Forecasting Methodologies
Ahmad et al. Role of machine learning and data mining in internet security: standing state with future directions
Al-Ghuwairi et al. Intrusion detection in cloud computing based on time series anomalies utilizing machine learning
US10262133B1 (en) System and method for contextually analyzing potential cyber security threats
Chen et al. A security, privacy and trust methodology for IIoT
Xue et al. Prediction of computer network security situation based on association rules mining
Chen et al. A management knowledge graph approach for critical infrastructure protection: Ontology design, information extraction and relation prediction
CN116074092B (en) Attack scene reconstruction system based on heterogram attention network
CN117118857A (en) Knowledge graph-based network security threat management system and method
Qu et al. Instruction detection in scada/modbus network based on machine learning
KR101608221B1 (en) System and method of sensing cyber threat using database access pattern
CN116186712A (en) Terminal open source software safety detection and early warning method, system, equipment and terminal
Yu et al. An approach to failure prediction in cluster by self-updating cause-and-effect graph
Peng et al. Research on abnormal detection technology of real-time interaction process in new energy network
Naukudkar et al. Enhancing performance of security log analysis using correlation-prediction technique
Xie et al. A pvalue-guided anomaly detection approach combining multiple heterogeneous log parser algorithms on IIoT systems
CN107623655A (en) The system for detecting attack in real time based on artificial intelligence and MapReduce
CN117857182B (en) Processing method and device for server abnormal access
Deng et al. Analysis and prediction of network connection behavior anomaly based on knowledge graph features
Beattie Detecting temporal anomalies in time series data utilizing the matrix profile

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Security attack prediction method based on artificial intelligence and MapReduce

Effective date of registration: 20210524

Granted publication date: 20200811

Pledgee: Bank of Jiangsu Limited by Share Ltd. Nanjing Jiangning branch

Pledgor: NANJING LIANCHENG TECHNOLOGY DEVELOPMENT Co.,Ltd.

Registration number: Y2021980003928

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20220329

Granted publication date: 20200811

Pledgee: Bank of Jiangsu Limited by Share Ltd. Nanjing Jiangning branch

Pledgor: NANJING LIANCHENG TECHNOLOGY DEVELOPMENT CO.,LTD.

Registration number: Y2021980003928
