CN110719253A - Web honeypot system based on intelligence question-answering - Google Patents


Info

Publication number
CN110719253A
Authority
CN
China
Prior art keywords
attack
model
web
attacker
answering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910807155.9A
Other languages
Chinese (zh)
Inventor
黄诚
方勇
龙啸
高健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-29
Filing date
2019-08-29
Publication date
2020-01-21

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1491 Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention relates to an adaptive Web honeypot system based on intelligent question-answering technology, used to capture deep, complete attack chains. The system consists of three main algorithmic models: an attention-based LSTM model parses the attacker's message and extracts its key attack vector; a SeqGAN-based sensitive information forgery model dynamically generates deceptive feedback according to the attack vector; and a Web application learning model based on external observation adaptively learns the characteristics of the application. The honeypot system uses context semantics in place of traditional honeypot deception and can produce generative, intelligent question-answering responses.

Description

Web honeypot system based on intelligence question-answering
Technical Field
The invention relates to an adaptive Web honeypot system based on intelligent question-answering technology, used to capture deep, complete attack chains. The system consists of three main algorithmic models: an attention-based LSTM model parses the attacker's message and extracts its key attack vector; a SeqGAN-based sensitive information forgery model dynamically generates deceptive feedback according to the attack vector; and a Web application learning model based on external observation adaptively learns the characteristics of the application. The honeypot system uses context semantics in place of traditional honeypot deception and can produce generative, intelligent question-answering responses.
Background
Cyberspace has become an important factor affecting national security, international relations, and the political landscape. The Internet, together with the cyberspace formed by the applications, services, and data it carries, is changing the way people live. As it permeates every part of society, the risk of information leakage grows for individuals, enterprises, and even governments and nations.
To counter security threats in cyberspace, industry and academia have introduced a range of security products. Traditional products such as firewalls, intrusion detection and prevention systems, network behavior management systems, security scanners, and vulnerability auditing tools cannot fundamentally reverse the imbalance between attack and defense: they can only discover and repair security problems in systems and networks, and cannot help the defender obtain complete information about an attacker's attack chain.
Honeypots and the deception techniques derived from them are precisely the tools for changing this asymmetry between attack and defense. For this reason, the mechanism design and practical application of honeypots have long been a focus of security researchers. Existing research and engineering practice nevertheless have many shortcomings. High-interaction honeypots can attract deeper attacks but carry high deployment risk and require complex configuration; low-interaction honeypots can be deployed quickly but respond to attack requests only in simple ways. Current deception research focuses mainly on deception at the protocol-stack level, and research on deception at the strategy level is lacking.
To address these problems, the invention transfers research from the field of intelligent question answering to construct a dynamic Web deception feedback system based on an intelligent question-answering mechanism. It resolves the tension between honeypot interactivity and security, improves the deceptiveness and concealment of Web honeypots, and makes full use of the active defense and intelligence-gathering capabilities of network deception technology.
Disclosure of Invention
The invention is a dynamic Web deception feedback system based on an intelligent question-answering mechanism. An attention-based LSTM module analyzes the attacker's request; a SeqGAN-based sensitive information generation module produces trapping information from the analyzed request; the generated information is embedded into a webpage template produced by the external observation module; and the resulting complete webpage is used to deceive the attacker.
The content of the invention includes the following aspects:
1) establishing a scanner cluster using vulnerability scanners from the open source community;
2) setting up an HTTP proxy to capture, in a man-in-the-middle manner, all messages exchanged between the scanners and the vulnerable pages;
3) collecting Web vulnerability reports from actual penetration tests;
4) combining the attack interaction messages from the experimental environment with the Web vulnerability reports from actual penetration tests to form the original question-answer corpus;
5) preprocessing the original question-answer corpus, including deduplication, special symbol cleaning, word segmentation, and word embedding;
6) applying special handling to obfuscated special symbols in the original question-answer corpus;
7) segmenting the original question-answer corpus into words based on the Web protocol;
8) embedding the segmented corpus as word vectors using the Word2Vec algorithm from the Gensim package;
9) obtaining the final question-answer corpus through the steps above;
10) applying semantic analysis based on the Web protocol, which draws on Web semantics and represents the semantics in the protocol through natural language encoding;
11) extracting the represented semantic features to describe malicious attacks;
12) extracting the key attack vector from the attack request message using an attention-based LSTM model;
13) using a SeqGAN-based model to recognize the semantics of the attack request, infer the effect the attacker wants to achieve, and forge sensitive feedback information;
14) in the webpage template generation model, which is based on Web crawling, using a crawler to interact with the website to be protected and recording the real website resources and response information;
15) constructing website templates and a route mapping from the correspondence between website paths and webpages, using the relevant strategies;
16) forming the dynamic honeypot generation system from the external observation learning mechanism, the key attack vector extraction model, and the adversarial generation model.
To this end, the invention adopts the following technical scheme: a Web honeypot system based on intelligent question answering comprising five parts: data collection and preprocessing, external observation and simulation of the Web application, attack anomaly determination, sensitive information generation, and attack response generation. The system provides the following functions:
1) establishing a scanner cluster from the major open source scanners, collecting various Web vulnerability reports, and collecting the original question-answer corpus;
2) removing duplicates and special symbols from the original question-answer corpus by a combination of automatic and manual means;
3) performing cleaning, word segmentation, and related operations;
4) embedding the processed corpus using the Word2Vec algorithm from the Gensim package to generate word vectors;
5) learning the route structure and visual page features of the Web application from the outside, using a Web crawler and a parser;
6) receiving the attack message, extracting the attack vector, and generating deceptive feedback information;
7) generating question-answer style responses to the attacker based on the semantic information in the Web protocol;
8) returning to the attacker a feedback webpage composed of the sensitive information and the webpage template, according to the template generation strategy.
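The preprocessing functions listed above (deduplication, special symbol cleaning, and word segmentation along Web protocol boundaries) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation: the function names `segment_request` and `deduplicate` and the token pattern are assumptions introduced here, and the embedding step with Gensim is not shown.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical token pattern: a run of letters, a run of digits,
# or any single non-space symbol (quotes, '=', '<', ...).
_TOKEN = re.compile(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]")


def segment_request(method: str, url: str) -> list[str]:
    """Split an HTTP request line into tokens along Web protocol boundaries."""
    parts = urlsplit(url)
    tokens = [method.upper()]
    # Each non-empty path segment becomes one token.
    tokens += [segment for segment in parts.path.split("/") if segment]
    # parse_qsl percent-decodes names and values; each parameter contributes
    # its name, then the tokens of its decoded value.
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        tokens.append(name)
        tokens += _TOKEN.findall(value)
    return tokens


def deduplicate(requests: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop repeated (method, url) pairs while keeping first-seen order."""
    return list(dict.fromkeys(requests))
```

Splitting on protocol structure first, and only then on symbols inside parameter values, keeps payload characters such as quotes and operators as separate tokens that a later embedding step could learn from.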
The objectives of the invention are as follows.
1) To provide a honeypot model based on intelligent question answering that responds effectively and dynamically to the attack behavior of a malicious attacker;
2) To collect a high-quality intelligent question-answer corpus;
3) To extract the key anomalous vectors in an attack message using an attention-based LSTM algorithm;
4) To learn, on the basis of SeqGAN and from a semantic perspective, the mapping from attack sequences to feedback sequences;
5) To build a corresponding honeypot system prototype that improves the reliability and success rate of the Web honeypot system in dynamic feedback, adaptive configuration, visual deception, and attack trapping.
Drawings
FIG. 1 is a schematic diagram of the overall framework of the model of the present invention.
FIG. 2 is a functional design diagram of the data collection of the present invention.
FIG. 3 is a functional design diagram of the data preprocessing of the present invention.
FIG. 4 is a functional design diagram of the attack anomaly determination module of the present invention.
FIG. 5 is an algorithm design diagram of the sensitive information generation module of the present invention.
FIG. 6 is a functional design diagram of the attack response module of the present invention.
Detailed Description
The honeypot system consists of the following five modules. The technical scheme of the embodiments of this application is described clearly and completely below with reference to the accompanying drawings.
Honeypots and the network deception techniques derived from them are tools for changing the asymmetry between attack and defense. The semantics-based intelligent question-answering Web honeypot uses context semantics in place of traditional recognition of a single attack vector, so it can respond to an attacker's malicious behavior with feedback that is more accurate and semantically consistent, improving the reliability and success rate of the Web honeypot system in dynamic feedback, adaptive configuration, visual deception, and attack trapping. The specific technical scheme is as follows.
FIG. 1 is a schematic diagram of the overall framework of the model of the invention and details the design of the honeypot based on intelligent question answering. As shown in FIG. 1, the system comprises the following modules.
(1) Data collection and preprocessing module
The data collection and preprocessing module is responsible for collecting the corpus and embedding it as vectors for the models. The question-answer corpus has two sources: attack interaction messages from an experimental environment, and Web vulnerability reports from actual penetration tests. To collect experimental data, an HTTP proxy intercepts the scanning traffic between a scanner and a vulnerable page and records each request message with its corresponding response message; the attack vector in the request and the sensitive information in the response then form one question-answer pair. In addition, question-answer pairs are extracted manually from Web vulnerability analysis reports.
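The pairing of request and response described in this module can be illustrated with a short sketch. It assumes the proxy has already recorded each exchange as text; the `Exchange` class, `build_corpus`, and the two toy extractors are hypothetical names introduced for illustration, and real extraction rules would be far richer than the single patterns used here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    """One request/response pair recorded by the intercepting HTTP proxy."""
    request: str
    response: str


def build_corpus(exchanges, extract_vector, extract_sensitive):
    """Turn recorded exchanges into (question, answer) pairs.

    Both extractors are caller-supplied and return None when they find nothing;
    exchanges missing either half are skipped.
    """
    corpus = []
    for exchange in exchanges:
        question = extract_vector(exchange.request)
        answer = extract_sensitive(exchange.response)
        if question is not None and answer is not None:
            corpus.append((question, answer))
    return corpus


def toy_vector(request: str):
    """Toy rule: treat the query string of the request target as the attack vector."""
    fields = request.split()
    target = fields[1] if len(fields) > 1 else ""
    _, separator, query = target.partition("?")
    return query if separator else None


def toy_sensitive(response: str):
    """Toy rule: treat a leaked SQL error message as the sensitive information."""
    match = re.search(r"SQL syntax[^\n]*", response)
    return match.group(0) if match else None
```

For example, `build_corpus(recorded, toy_vector, toy_sensitive)` keeps only those exchanges in which both an attack vector and a piece of sensitive output were found.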
(2) Web application external observation simulation module
The Web application external observation module is mainly responsible for adaptive generation of the honeypot. It learns the route structure and visual page features of the Web application from the outside, so that the honeypot framework can generate deceptive content automatically and dynamically. The module consists mainly of a Web crawler and a parser. The crawler first accesses the Web application to be protected from the outside, extracts by recursive traversal all hyperlinks under the current domain name, and stores the hyperlinks together with the application messages they map to. The parser then analyzes the stored hyperlinks and messages, distinguishes static from dynamic resources according to the relationship between hyperlinks and webpages, and finally generates an access routing table and the corresponding webpage templates.
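The crawl-then-parse flow of this module might look like the sketch below. It is an illustration under stated assumptions rather than the patented design: `fetch` is a caller-supplied function (so the example needs no network access), the traversal uses an explicit stack instead of recursion, and the static/dynamic split is a simple file-suffix heuristic chosen here for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Assumed list of suffixes treated as static resources.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


class _LinkParser(HTMLParser):
    """Collect the values of href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch):
    """Collect every same-domain URL reachable from start_url and its body.

    fetch(url) returns the response text, or None if the URL cannot be read.
    """
    domain = urlsplit(start_url).netloc
    pages, pending = {}, [start_url]
    while pending:
        url = pending.pop()
        if url in pages:
            continue
        body = fetch(url)
        if body is None:
            continue
        pages[url] = body
        parser = _LinkParser()
        parser.feed(body)
        for link in parser.links:
            absolute = urljoin(url, link).split("#")[0]
            if urlsplit(absolute).netloc == domain and absolute not in pages:
                pending.append(absolute)
    return pages


def build_routing_table(pages):
    """Map each crawled path to 'static' or 'dynamic' by file suffix."""
    table = {}
    for url in pages:
        path = urlsplit(url).path or "/"
        table[path] = "static" if path.lower().endswith(STATIC_SUFFIXES) else "dynamic"
    return table
```

Passing `fetch` in as a parameter keeps the traversal logic testable against an in-memory site; a deployment would substitute an HTTP client and would also need to store the page bodies as templates.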
(3) Attack anomaly determination module
The attack anomaly determination module extracts the specific anomalous information in an attacker's Web request, using an LSTM algorithm improved with an attention mechanism to locate the anomalous attack points. The module has two parts: training of the attention LSTM model, and determination and extraction of anomalous points. Model training is semi-supervised: normal Web application request packets serve as training samples, the LSTM encodes the samples, and the model parameters are tuned at the same time.
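How attention weights single out key tokens can be shown in isolation. The sketch below covers only the final selection step: in the system described here the per-token scores would come from the trained LSTM, whereas in this example they are hand-written placeholder numbers, and the function names and `top_k` parameter are assumptions.

```python
import math


def attention_weights(scores):
    """Softmax over per-token scores, shifted by the maximum for numerical stability."""
    peak = max(scores)
    exponentials = [math.exp(score - peak) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def key_tokens(tokens, scores, top_k=3):
    """Return the top_k tokens by attention weight, kept in their original order."""
    weights = attention_weights(scores)
    ranked = sorted(range(len(tokens)), key=lambda i: weights[i], reverse=True)[:top_k]
    return [tokens[i] for i in sorted(ranked)]
```

Because softmax is monotonic, ranking by weight gives the same order as ranking by raw score; the normalized weights are still useful when a threshold on attention mass is wanted instead of a fixed `top_k`.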
(4) Sensitive information generation module
The sensitive information generation module generates question-answer style responses to the attacker based on the semantic information in the Web protocol. It trains a generative model on the collected corpus of attack vectors and sensitive information, and uses an improved generative adversarial network to learn the latent relationship between attack sequences and feedback sequences. The module mainly uses an improved SeqGAN algorithm for adversarial generation, with word embedding implemented by word2vec. The encoded and embedded data consist mainly of attack word sequences and sensitive feedback word sequences. The generator's autoencoder model transforms an input attack sequence, and the distribution of the transformed sequence is compared with that of the sensitive feedback word sequences, so that during training the initial noise moves steadily closer to the distribution of the sensitive feedback sequences, while the discriminator distinguishes real data from generated data. Once the loss function falls below a given threshold, the generator is considered able to produce corresponding sensitive feedback for previously unseen attack inputs.
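The text does not give the training objectives. For orientation, the standard SeqGAN formulation is reproduced below as a reference point; it is not the improved, Seq2Seq-fused variant this document describes, and conditioning on the input attack sequence is not shown. Here $G_\theta$ is the generator, $D_\phi$ the discriminator, $Y_{1:T}$ a token sequence of length $T$, $s_0$ the initial state, and $Q$ the action-value function estimated from the discriminator's output.

```latex
% Discriminator: separate real feedback sequences from generated ones
\min_{\phi}\; -\,\mathbb{E}_{Y \sim p_{\text{data}}}\big[\log D_{\phi}(Y)\big]
            \;-\; \mathbb{E}_{Y \sim G_{\theta}}\big[\log\big(1 - D_{\phi}(Y)\big)\big]

% Generator: maximize the expected end-of-sequence reward
J(\theta) \;=\; \sum_{y_1 \in \mathcal{Y}} G_{\theta}(y_1 \mid s_0)\; Q_{D_{\phi}}^{G_{\theta}}(s_0, y_1)

% Reward for a completed sequence is the discriminator's estimate that it is real
Q_{D_{\phi}}^{G_{\theta}}\big(s = Y_{1:T-1},\, a = y_T\big) \;=\; D_{\phi}(Y_{1:T})
```

Under this formulation the generator is updated by policy gradient, because the sampling of discrete tokens blocks direct backpropagation from the discriminator.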
(5) Attack response generation module
The attack response generation module aggregates the functions of the other modules and implements the complete logic from receiving an attack vector to returning a feedback webpage. It interfaces with the other three modules. Its routing function receives the path requested by the attacker and matches it by similarity against the routing table of the external observation module: a request that misses every route receives a resource-not-found response, and a request that hits a route is passed to the anomaly determination module for further processing. Its internal resource function receives the webpage template for the hit route and the generated fake sensitive information, replaces the placeholders in the template according to the template generation strategy, and returns to the attacker a feedback webpage composed of the sensitive information and the template, thereby deceiving the attacker.
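The routing and template-filling logic of this module can be sketched with the Python standard library. The example is an assumption-laden illustration: `respond`, the `$sensitive` placeholder name, and the 0.8 similarity cutoff are choices made here, and `forge` is a stand-in for the combined anomaly determination and sensitive information generation modules.

```python
import difflib
from string import Template


def respond(path, routing_table, forge, cutoff=0.8):
    """Return (status, body) for the path an attacker requested.

    routing_table maps known paths to page templates containing a $sensitive
    placeholder; forge(path) returns the fake sensitive text to embed.
    """
    # Similarity matching of the requested path against the known routes.
    hits = difflib.get_close_matches(path, list(routing_table), n=1, cutoff=cutoff)
    if not hits:
        # Missed route: report that the resource does not exist.
        return 404, "Not Found"
    template = Template(routing_table[hits[0]])
    # safe_substitute leaves any other '$' text in the page untouched.
    return 200, template.safe_substitute(sensitive=forge(path))
```

Matching by similarity rather than exact equality lets slightly mutated paths still reach a plausible page, at the cost of choosing a cutoff; too low a value would make unrelated probes appear to succeed.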
The Web honeypot system based on intelligent question answering has been described in detail above.

Claims (5)

1. A Web honeypot system based on intelligent question answering, characterized by comprising the following steps: step one: extracting templates of the website application to be protected and generating a mapping between routes and templates; step two: extracting the attacker's request using an attention-mechanism LSTM model; step three: forging, with a SeqGAN model, the sensitive information that the attacker expects to obtain through the attack; step four: combining the three models into a honeypot system that performs adaptive deception defense for the website application.
2. The external observation model of claim 1, characterized in that: it uses a template generation strategy based on character-level webpage differences and a Ratcliff-Obershelp algorithm based on segmented hashing. Under this strategy, text variables and random hashes can be detected entirely from character differences between webpages; time variables are consistent across multiple webpages and therefore require dedicated regular expressions for matching; reflected variables require comparing the differences between webpages with the values shared between the webpage and its resource path; and when no detection strategy hits, the current variable is marked as an unknown variable.
3. The attention-mechanism LSTM attack vector extraction model of claim 1, characterized in that: the attack vector in an attack message can be extracted using an LSTM algorithm model improved with an attention mechanism.
4. The SeqGAN sensitive information generation model of claim 1, characterized in that: it is an improved SeqGAN model fused with Seq2Seq.
5. The three main modules of claim 1, which together form a honeypot system capable of dynamically simulating any Web application and generating a corresponding honeypot, while performing dynamic deception according to the context semantics of the attacker.
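The character-level difference strategy named in claim 2 can be illustrated with Python's `difflib.SequenceMatcher`, whose matching algorithm is documented as similar to Ratcliff/Obershelp. This is only a sketch of the diffing idea: the segmented hashing, the regular expressions for time variables, and the reflected-variable checks are not implemented, and `diff_template` and the `{{var}}` placeholder are hypothetical names.

```python
import difflib


def diff_template(page_a, page_b, placeholder="{{var}}"):
    """Keep the text two fetches of a page share; mark each differing run as a variable."""
    matcher = difflib.SequenceMatcher(None, page_a, page_b, autojunk=False)
    parts = []
    for tag, a_start, a_end, _b_start, _b_end in matcher.get_opcodes():
        if tag == "equal":
            # Text present in both fetches is treated as fixed template text.
            parts.append(page_a[a_start:a_end])
        else:
            # Replaced, inserted, or deleted runs become a placeholder.
            parts.append(placeholder)
    return "".join(parts)
```

Fetching the same route twice and diffing the results exposes values such as random tokens, which then become the slots that forged sensitive information can fill.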
CN201910807155.9A 2019-08-29 2019-08-29 Web honeypot system based on intelligence question-answering Pending CN110719253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910807155.9A CN110719253A (en) 2019-08-29 2019-08-29 Web honeypot system based on intelligence question-answering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910807155.9A CN110719253A (en) 2019-08-29 2019-08-29 Web honeypot system based on intelligence question-answering

Publications (1)

Publication Number Publication Date
CN110719253A true CN110719253A (en) 2020-01-21

Family

ID=69209525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910807155.9A Pending CN110719253A (en) 2019-08-29 2019-08-29 Web honeypot system based on intelligence question-answering

Country Status (1)

Country Link
CN (1) CN110719253A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200121