WO2020073494A1 - Webpage backdoor detection method, equipment, storage medium and device - Google Patents

Info

Publication number
WO2020073494A1
Authority
WO
WIPO (PCT)
Prior art keywords
script
feature
detection
preset
webpage
Prior art date
Application number
PCT/CN2018/122828
Inventor
李坤
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020073494A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures

Definitions

  • This application relates to the technical field of monitoring, in particular to a webpage backdoor detection method, equipment, storage medium and device.
  • A webpage backdoor is a command execution environment that exists in the form of a web file, such as an Active Server Pages (ASP), Hypertext Preprocessor (PHP), Java Server Pages (JSP), or Common Gateway Interface (CGI) file.
  • Webpage backdoors usually contain fairly obvious static features.
  • In the prior art, webpage scripts are checked against these static features to decide whether a script is a webpage backdoor, which often produces many false positives. How to improve the accuracy of webpage backdoor detection is therefore a technical problem to be solved urgently.
  • The main purpose of the present application is to provide a webpage backdoor detection method, equipment, storage medium and device, aiming to solve the technical problem of the high false alarm rate of webpage backdoor detection in the prior art.
  • the present application provides a webpage backdoor detection method, which includes the following steps:
  • the target script feature is detected through a preset detection model to obtain a target detection result.
  • The present application also provides a webpage backdoor detection device. The device includes a memory, a processor, and readable instructions for webpage backdoor detection that are stored in the memory and can run on the processor. The readable instructions are configured to implement the steps of the webpage backdoor detection method described above.
  • The present application also proposes a storage medium that stores readable instructions for webpage backdoor detection; when the readable instructions are executed by a processor, the steps of the webpage backdoor detection method described above are implemented.
  • the present application also provides a webpage backdoor detection device, the webpage backdoor detection device includes:
  • a matching module, used to obtain the network script to be detected and match it against a preset webpage backdoor rule;
  • an extraction module, used to extract features from the network script to be detected through a preset extraction model to obtain target script features if the matching fails; and
  • a detection module, configured to detect the target script features through a preset detection model to obtain a target detection result.
  • In the present application, the network script to be detected is obtained and matched against a preset webpage backdoor rule; this rule-based matching detects webpage backdoors with obvious characteristics. If the matching fails, features are extracted from the network script to be detected through a preset extraction model to obtain target script features, and the target script features are detected through a preset detection model to obtain a target detection result. By combining rule-based detection with machine-learning-based model detection, webpage backdoors that rule matching fails to detect can be further detected by the model.
  • The preset detection model has undergone learning on a large number of samples and an evaluation of its detection accuracy, so it detects well, which improves the accuracy with which the system determines whether a network script is a webpage backdoor.
  • FIG. 1 is a schematic structural diagram of a webpage backdoor detection device of a hardware operating environment involved in an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a first embodiment of a webpage backdoor detection method of the application
  • FIG. 3 is a schematic flowchart of a second embodiment of a webpage backdoor detection method of the application.
  • FIG. 4 is a schematic flowchart of a third embodiment of a webpage backdoor detection method of the application.
  • FIG. 5 is a structural block diagram of a first embodiment of a webpage backdoor detection device of the present application.
  • FIG. 1 is a schematic structural diagram of a webpage backdoor detection device of a hardware operating environment according to an embodiment of the present application.
  • The webpage backdoor detection device may include a processor 1001 (for example, a central processing unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection communication between these components.
  • The user interface 1003 may include a display and, optionally, a standard wired interface and a wireless interface.
  • In this application, the wired interface of the user interface 1003 may be a USB interface.
  • The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (Wi-Fi) interface).
  • The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as disk storage.
  • the memory 1005 may optionally be a storage device independent of the foregoing processor 1001.
  • The structure shown in FIG. 1 does not limit the webpage backdoor detection device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
  • The memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and readable instructions for webpage backdoor detection.
  • the network interface 1004 is mainly used to connect to a background server and perform data communication with the background server;
  • the user interface 1003 is mainly used to connect user equipment;
  • The webpage backdoor detection device calls, through the processor 1001, the readable instructions for webpage backdoor detection stored in the memory 1005, and executes the webpage backdoor detection method provided by the embodiments of the present application.
  • FIG. 2 is a schematic flowchart of the first embodiment of the webpage backdoor detection method of the present application; the first embodiment is proposed with reference to it.
  • the webpage backdoor detection method includes the following steps:
  • Step S10: Obtain the network script to be detected, and match it against the preset webpage backdoor rule.
  • the execution subject of this embodiment is the webpage backdoor detection device, where the webpage backdoor detection device may be an electronic device such as a personal computer or a server.
  • the preset webpage backdoor (webshell) rule may be a malicious string library, for example, including: "group-specific Malaysia
  • Features are extracted from the network script to be detected along multiple dimensions: the keywords and high-risk functions used in the script, the file modification time, file permissions, file owner, and associations with other files. The resulting script features are matched against the preset webshell rule base to obtain a matching result. If the match succeeds, the network script to be detected is a webshell. If the match fails, the rules did not identify the script as a webshell; it may be a normal network script, or it may be a webshell that the rule matching missed.
  • step S10 includes:
  • the gateway obtains the network script to be detected from an agent server (Agent).
  • There are usually multiple network scripts to be detected, although there may be only one.
  • Analyzing the network script to be detected usually means splitting it into character strings and extracting features of multiple preset dimensions from all of those strings. The preset dimensions include keywords, high-risk functions, file modification time, file permissions, file owner, and association with other files. Normal network scripts do not contain the features listed in the preset webpage backdoor rules, so matching the features of the preset dimensions against the preset webpage backdoor rules identifies whether the network script to be detected is a webpage backdoor or a normal network script.
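  • The rule-matching step above can be sketched roughly as follows. This is a minimal illustration, not the patent's rule base: the entries in `MALICIOUS_STRINGS`, the `HIGH_RISK_ON_INPUT` pattern, and both function names are assumptions chosen for the example, and only two of the preset dimensions (keywords and high-risk functions) are covered. The file-metadata dimensions (modification time, permissions, owner) are omitted.

```python
import re

# Hypothetical rule base: illustrative entries only, not the patent's actual rules.
MALICIOUS_STRINGS = ("c99shell", "r57shell")

# Hypothetical high-risk pattern: a dangerous function applied directly to request input.
HIGH_RISK_ON_INPUT = re.compile(
    r"\b(eval|assert|system|exec|passthru|shell_exec)\s*\(\s*\$_(post|get|request|cookie)\b"
)


def extract_rule_features(script_text):
    """Extract two of the preset dimensions from the script's character strings."""
    text = script_text.lower()
    return {
        "keywords": [s for s in MALICIOUS_STRINGS if s in text],
        "high_risk_calls": [m.group(1) for m in HIGH_RISK_ON_INPUT.finditer(text)],
    }


def matches_backdoor_rules(script_text):
    """Return True when any extracted feature hits the rule base (match succeeds)."""
    features = extract_rule_features(script_text)
    return bool(features["keywords"] or features["high_risk_calls"])
```

  • A script for which `matches_backdoor_rules` returns False is not declared safe; under the method described here it would move on to model-based detection.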
  • Step S20: If the matching fails, feature extraction is performed on the network script to be detected through a preset extraction model to obtain target script features.
  • If the matching fails, the rules did not identify the network script to be detected as a webshell; it may be a normal network script, or it may be a webshell that the rule matching missed.
  • feature extraction may be performed through the preset extraction model, and the preset extraction model includes a convolutional neural network model and the like.
  • a basic extraction model may be established in advance, a sample network script and corresponding features are acquired to train the basic extraction model, and the preset extraction model is obtained. Feature extraction is performed through the preset extraction model to obtain a suitable feature of the target script.
  • Step S30: Detect the target script features through a preset detection model to obtain a target detection result.
  • the preset detection model includes a neural network model, which is trained by a large number of training samples to ensure the accuracy of detection of the target script feature by the preset detection model.
  • the target detection result may be that the target script feature is a feature corresponding to a webpage backdoor, that is, the network script to be detected corresponding to the target script feature is a webpage backdoor; the target detection result may also be the target script feature It is a feature corresponding to a normal network script, that is, the network script to be detected corresponding to the target script feature is a normal network script.
  • To obtain the preset detection model, a basic prediction model is first established, and a large number of sample network scripts and their corresponding sample detection results are obtained from the database. The sample network scripts include a large number of normal network scripts and a large number of webpage backdoors. The sample network scripts can be subjected to data cleaning, and the cleaned sample network scripts can then be passed through the preset extraction model to obtain the first script features corresponding to the sample network scripts. The basic prediction model may then be trained on a large number of the first script features and the corresponding sample detection results to obtain the preset detection model.
  • Data cleaning includes removing irrelevant and duplicate data from the sample network scripts, smoothing noisy data, and handling missing values and outliers.
  • The method further includes: establishing a basic prediction model; obtaining sample network scripts and the corresponding sample detection results; performing feature extraction on the sample network scripts through the preset extraction model to obtain first script features; and training the basic prediction model according to the first script features and the corresponding sample detection results to obtain the preset detection model.
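  • The training procedure above can be illustrated with a small sketch. The text says the preset detection model includes a neural network; the code below swaps in a plain logistic regression written with the standard library only, purely so the example stays short and runnable. The function names, the two-dimensional feature vectors, and the hyperparameters are all assumptions made for illustration.

```python
import math


def train_detection_model(first_script_features, sample_results, epochs=500, lr=0.5):
    """Train a logistic-regression stand-in for the 'basic prediction model'.

    first_script_features: list of numeric feature vectors (hypothetical).
    sample_results: list of labels, 1 for webpage backdoor, 0 for normal script.
    """
    weights = [0.0] * len(first_script_features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(first_script_features, sample_results):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "backdoor"
            err = p - y                      # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_is_backdoor(model, feature_vector):
    """Return True when the trained model scores the features as a backdoor."""
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, feature_vector)) > 0.0
```

  • In practice the feature vectors would come from the preset extraction model rather than being written by hand, and the stand-in classifier would be replaced by whatever network the implementer chooses.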
  • In this embodiment, the network script to be detected is obtained and matched against a preset webpage backdoor rule; this rule-based matching detects webpage backdoors with obvious characteristics. If the matching fails, features are extracted from the network script to be detected through a preset extraction model to obtain target script features, and the target script features are detected through a preset detection model to obtain a target detection result. By combining rule-based detection with machine-learning-based model detection, webpage backdoors that rule matching fails to detect can be further detected by the machine-learning-based model. The preset detection model has undergone learning on a large number of samples and an evaluation of its detection accuracy, so it detects well, which improves the accuracy with which the system determines whether a network script is a webpage backdoor.
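  • The overall flow of steps S10 to S30 can be sketched as a two-stage function. The three callables passed in (`rule_match`, `extract_features`, `model_predict`) are hypothetical placeholders for the preset webpage backdoor rules, the preset extraction model, and the preset detection model; the result strings are likewise only illustrative.

```python
def detect_webpage_backdoor(script_text, rule_match, extract_features, model_predict):
    """Two-stage detection: rule matching first, model detection only on a miss."""
    # Step S10: rule-based matching catches backdoors with obvious characteristics.
    if rule_match(script_text):
        return "webpage backdoor (rule match)"
    # Step S20: the match failed, so extract the target script features.
    target_features = extract_features(script_text)
    # Step S30: let the detection model decide.
    if model_predict(target_features):
        return "webpage backdoor (model detection)"
    return "normal network script"
```

  • Keeping the stages as injected callables means the rule base and the model can be evaluated and retrained independently, which matches the way the later embodiments update the model without touching the rules.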
  • FIG. 3 is a schematic flowchart of a second embodiment of the webpage backdoor detection method of the present application. Based on the first embodiment shown in FIG. 2 above, a second embodiment of the webpage backdoor detection method of the present application is proposed.
  • In the second embodiment, after step S30, the method further includes:
  • Step S40: If the target detection result is that the target script feature is a feature corresponding to a webpage backdoor, train the preset detection model according to the target script feature and the corresponding target detection result.
  • If the target detection result is that the target script feature corresponds to a webpage backdoor, the network script to be detected is a webpage backdoor. The network script to be detected and its target detection result are stored in a database, where they can serve as sample data for online or offline training. Training the preset detection model on the target script feature and the corresponding target detection result increases the amount of training the model receives and thereby improves its detection accuracy.
  • step S20 includes:
  • Step S201: If the matching fails, perform data cleaning on the network script to be detected to obtain the target network script.
  • The network script to be detected is first sent to a preset high-throughput distributed publish-subscribe messaging system (Kafka). This system serves as a message queue that can both cache and distribute data, and it usually copies the network script to be detected before distributing it.
  • The messaging system first copies the network script to be detected into Hadoop, a distributed system infrastructure developed by the Apache Foundation. Hadoop uses the network script to be detected for offline learning and for backtracking events.
  • The messaging system also makes a copy of the network script to be detected for online learning and sends it to the webpage backdoor detection device. Both online learning and offline learning pass the network script to be detected through data cleaning and feature extraction before training the preset detection model.
  • The webpage backdoor detection device receives the network script to be detected sent by the messaging system and performs online learning.
  • The network script to be detected must be cleaned first.
  • Data cleaning is mainly responsible for filtering out data that does not conform to the rules, desensitizing sensitive data, and formatting the data to facilitate feature extraction. For example, irrelevant data and duplicate data in the network script to be detected are deleted, noisy data is smoothed, and missing values and abnormal values are handled.
  • A cleaning rule is established, and data in the network script to be detected that does not conform to the format is filtered out by the cleaning rule.
  • After cleaning, a target network script is obtained, and feature extraction is then performed on the target network script to obtain the target script features.
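  • A data-cleaning pass of the kind described above might look like the sketch below. The specific cleaning rules are assumptions: the text does not say which data counts as sensitive or malformed, so this example treats empty or non-string entries as missing values, normalizes line endings as its formatting step, masks quoted password assignments as its desensitization step, and drops exact duplicates. `clean_scripts` and `PASSWORD_ASSIGNMENT` are hypothetical names.

```python
import re

# Hypothetical desensitization rule: mask the value of quoted password assignments.
PASSWORD_ASSIGNMENT = re.compile(r"(password\s*=\s*)(['\"]).*?\2", re.IGNORECASE)


def clean_scripts(raw_scripts):
    """Filter, desensitize, format and de-duplicate scripts before feature extraction."""
    cleaned = []
    seen = set()
    for raw in raw_scripts:
        # Missing or empty entries do not conform to the cleaning rule.
        if not isinstance(raw, str) or not raw.strip():
            continue
        # Formatting: normalize line endings and trim surrounding whitespace.
        text = raw.replace("\r\n", "\n").strip()
        # Desensitization: hide literal password values.
        text = PASSWORD_ASSIGNMENT.sub(r"\1\2***\2", text)
        # Duplicate data is dropped.
        if text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```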
  • Step S202: Perform feature extraction on the target network script through the preset extraction model to obtain the target script features.
  • If the matching fails, the rules did not identify the network script to be detected as a webshell; it may be a normal network script, or it may be a webshell that the rule matching missed.
  • Feature extraction may be performed on the target network script through the preset extraction model. Because the target network script has undergone data cleaning, excessive duplicate and irrelevant data is not processed, so more suitable target script features are extracted and the efficiency and quality of feature extraction improve.
  • In this embodiment, if the matching fails, data cleaning is performed on the network script to be detected to obtain a target network script, and feature extraction is performed on the target network script through the preset extraction model to obtain the target script features. Data cleaning filters out duplicate and irrelevant data in the network script to be detected, thereby improving the efficiency and quality of feature extraction.
  • FIG. 4 is a schematic flowchart of a third embodiment of the webpage backdoor detection method of the present application. Based on the second embodiment shown in FIG. 3, a third embodiment of the webpage backdoor detection method of the present application is proposed.
  • In the third embodiment, before step S30, the method further includes:
  • Step S203: Acquire a first number of sample webpage backdoors, and extract features from the sample webpage backdoors through the preset extraction model to obtain second script features.
  • To ensure that detection is reliable, the accuracy rate or the recall rate of the preset detection model may be calculated first. The preset detection model is used for detection when the accuracy rate exceeds a preset threshold (such as 80%) or, alternatively, when the recall rate is less than a preset recall threshold (such as 20%).
  • The accuracy rate or recall rate can be calculated by acquiring the first number of sample webpage backdoors from the database and detecting them with the preset detection model. Since the preset detection model operates on the features corresponding to a network script, the features of the sample webpage backdoors must first be extracted through the preset extraction model to obtain the second script features.
  • Step S204: Detect the second script features through the preset detection model to obtain an evaluation detection result, where the evaluation detection result includes a first detection result, namely that the second script feature is a feature corresponding to a webpage backdoor.
  • The second script features are features of the sample webpage backdoors. They are passed through the preset detection model to check whether each is detected as a feature corresponding to a webpage backdoor. If every second script feature is successfully detected, the accuracy rate of the preset detection model is 100%.
  • The evaluation detection result includes a first detection result and a second detection result: the first detection result is that the second script feature is a feature corresponding to a webpage backdoor, and the second detection result is that the second script feature is not a feature corresponding to a webpage backdoor.
  • Step S205: Count a second number of first detection results, and calculate the accuracy rate of the preset detection model according to the first number and the second number.
  • The evaluation detection result may be analyzed to count the second number, that is, the number of first detection results in which the second script feature is a feature corresponding to a webpage backdoor. The second number is the number of second script features that the preset detection model detects correctly, in other words the number of sample webpage backdoors it detects correctly. Dividing the second number by the first number gives the accuracy rate of the preset detection model.
  • Subtracting the second number from the first number gives a difference, and dividing the difference by the first number gives the recall rate of the preset detection model. As defined here, the recall rate is the fraction of sample webpage backdoors that the model fails to detect, so a lower value indicates a better model.
  • Step S206: When the accuracy rate exceeds the preset threshold, execute step S30.
  • When the accuracy rate exceeds the preset threshold, the target script feature corresponding to the network script to be detected may be detected through the preset detection model to determine whether it is a feature corresponding to a webpage backdoor.
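  • Steps S203 to S206 reduce to a small amount of arithmetic, sketched below. The function and parameter names are assumptions; `model_predict` stands in for the preset detection model and is expected to return True when it classifies a feature as a webpage backdoor. The two rates follow the definitions given in this description (second number divided by first number, and the difference divided by first number), and the 0.8 default mirrors the 80% example threshold.

```python
def evaluate_detection_model(model_predict, second_script_features):
    """Compute the accuracy rate and recall rate as this description defines them.

    Every entry in second_script_features comes from a known sample webpage backdoor.
    """
    first_number = len(second_script_features)
    if first_number == 0:
        raise ValueError("at least one sample webpage backdoor is required")
    # Second number: how many samples the model reports as webpage backdoors.
    second_number = sum(1 for f in second_script_features if model_predict(f))
    accuracy_rate = second_number / first_number
    recall_rate = (first_number - second_number) / first_number
    return accuracy_rate, recall_rate


def model_is_trusted(accuracy_rate, preset_threshold=0.8):
    """Step S206: proceed to step S30 only when the accuracy rate exceeds the threshold."""
    return accuracy_rate > preset_threshold
```

  • Because the two rates always sum to one under these definitions, checking either threshold alone carries the same information; which one to gate on is left to the implementer.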
  • The evaluation detection result also includes the second detection result, namely that the second script feature is not a feature corresponding to a webpage backdoor. After step S205, the method further includes:
  • The second script features corresponding to the second detection result, that is, the second script features that the preset detection model failed to detect, are acquired and treated as misdetected webpage backdoor features. The true detection result of each misdetected webpage backdoor feature is set to indicate that it is a feature corresponding to a webpage backdoor. The preset detection model is then trained on the misdetected webpage backdoor features and their true detection results, so that in subsequent detection it can identify them as features corresponding to webpage backdoors, which improves its detection accuracy.
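  • Collecting the misdetected features for retraining can be sketched as below. The names `collect_misdetected_features` and `BACKDOOR_LABEL`, and the choice of 1 as the label value, are assumptions for illustration; how the returned pairs are fed back into training depends on the model actually used.

```python
BACKDOOR_LABEL = 1  # hypothetical encoding of "feature corresponding to a webpage backdoor"


def collect_misdetected_features(model_predict, second_script_features):
    """Pair each missed sample-backdoor feature with its true detection result."""
    misdetected = []
    for feature in second_script_features:
        # A second detection result: the model did not flag a known backdoor.
        if not model_predict(feature):
            misdetected.append((feature, BACKDOOR_LABEL))
    return misdetected
```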
  • Data cleaning can be performed on the sample webpage backdoors, and the cleaned sample webpage backdoors can then be passed through the preset extraction model to obtain the second script features.
  • Data cleaning includes deleting irrelevant and duplicate data from the sample webpage backdoors, smoothing noisy data, and handling missing values and outliers.
  • In this embodiment, a first number of sample webpage backdoors are acquired, and features are extracted from them through the preset extraction model to obtain second script features. The second script features are detected through the preset detection model to obtain an evaluation detection result, which includes a first detection result, namely that the second script feature is a feature corresponding to a webpage backdoor. A second number of first detection results is counted, and the accuracy rate of the preset detection model is calculated from the first number and the second number. When the accuracy rate exceeds the preset threshold, the step of detecting the target script feature through the preset detection model to obtain a target detection result is executed. An accuracy rate above the preset threshold means the preset detection model is accurate enough to be trusted, which ensures the accuracy of its detection of the target script features of the network script to be detected.
  • The steps of the above embodiments may be completed by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk or an optical disk.
  • An embodiment of the present application further proposes a storage medium that stores readable instructions for webpage backdoor detection; when the readable instructions are executed by a processor, the steps of the webpage backdoor detection method described above are implemented.
  • an embodiment of the present application further provides a webpage backdoor detection device, and the webpage backdoor detection device includes:
  • a matching module 10, used to obtain the network script to be detected and match it against a preset webpage backdoor rule;
  • an extraction module 20, configured to extract features from the network script to be detected through a preset extraction model to obtain target script features if the matching fails; and
  • a detection module 30, configured to detect the target script features through a preset detection model to obtain a target detection result.
  • the execution subject of this embodiment is the webpage backdoor detection device, where the webpage backdoor detection device may be an electronic device such as a personal computer or a server.
  • the preset webpage backdoor (webshell) rule may be a malicious string library, for example, including: "group-specific Malaysia
  • Features are extracted from the network script to be detected along multiple dimensions: the keywords and high-risk functions used in the script, the file modification time, file permissions, file owner, and associations with other files. The resulting script features are matched against the preset webshell rule base to obtain a matching result. If the match succeeds, the network script to be detected is a webshell. If the match fails, the rules did not identify the script as a webshell; it may be a normal network script, or it may be a webshell that the rule matching missed.
  • the extraction module 20 is also used to obtain the network script to be detected through the gateway, analyze the network script to be detected, and extract features of multiple preset dimensions;
  • The matching module 10 is also used to match the features of the preset dimensions against the preset webpage backdoor rule.
  • the gateway obtains the network script to be detected from an agent server (Agent).
  • There are usually multiple network scripts to be detected, although there may be only one.
  • Analyzing the network script to be detected usually means splitting it into character strings and extracting features of multiple preset dimensions from all of those strings. The preset dimensions include keywords, high-risk functions, file modification time, file permissions, file owner, and association with other files. Normal network scripts do not contain the features listed in the preset webpage backdoor rules, so matching the features of the preset dimensions against the preset webpage backdoor rules identifies whether the network script to be detected is a webpage backdoor or a normal network script.
  • If the matching fails, the rules did not identify the network script to be detected as a webshell; it may be a normal network script, or it may be a webshell that the rule matching missed.
  • feature extraction may be performed through the preset extraction model, and the preset extraction model includes a convolutional neural network model and the like.
  • a basic extraction model may be established in advance, a sample network script and corresponding features are acquired to train the basic extraction model, and the preset extraction model is obtained. Feature extraction is performed through the preset extraction model to obtain a suitable feature of the target script.
  • the preset detection model includes a neural network model, which is trained by a large number of training samples to ensure the accuracy of detection of the target script feature by the preset detection model.
  • the target detection result may be that the target script feature is a feature corresponding to a webpage backdoor, that is, the network script to be detected corresponding to the target script feature is a webpage backdoor; the target detection result may also be the target script feature It is a feature corresponding to a normal network script, that is, the network script to be detected corresponding to the target script feature is a normal network script.
  • To obtain the preset detection model, a basic prediction model is first established, and a large number of sample network scripts and their corresponding sample detection results are obtained from the database. The sample network scripts include a large number of normal network scripts and a large number of webpage backdoors. The sample network scripts can be subjected to data cleaning, and the cleaned sample network scripts can then be passed through the preset extraction model to obtain the first script features corresponding to the sample network scripts. The basic prediction model may then be trained on a large number of the first script features and the corresponding sample detection results to obtain the preset detection model.
  • Data cleaning includes removing irrelevant and duplicate data from the sample network scripts, smoothing noisy data, and handling missing values and outliers.
  • It also includes: an establishment module for establishing a basic prediction model; an acquisition module for acquiring sample network scripts and corresponding sample detection results; the extraction module 20, further configured to perform feature extraction on the sample network scripts through the preset extraction model to obtain a first script feature; and a training module for training the basic prediction model according to the first script feature and the corresponding sample detection result to obtain the preset detection model.
  • The network script to be detected is obtained and matched against the preset webpage backdoor rules; this rule-based matching detects webpage backdoors with obvious features.
  • If the match fails, feature extraction is performed on the network script to be detected through the preset extraction model to obtain the target script feature, and the target script feature is detected through the preset detection model to obtain the target detection result, so that rule detection is combined with machine-learning-based model detection.
  • Webpage backdoors that rule matching fails to detect can be further detected by the machine-learning-based model.
  • The preset detection model has learned from a large number of samples and has had its detection accuracy evaluated, so it detects well, which improves the accuracy with which the system determines whether a network script is a webpage backdoor.
  • The webpage backdoor detection apparatus further includes: a training module, configured to train the preset detection model according to the target script feature and the corresponding target detection result if the target detection result is that the target script feature is a feature corresponding to a webpage backdoor.
  • The webpage backdoor detection apparatus further includes: a data cleaning module, configured to perform data cleaning on the network script to be detected if the match fails, to obtain a target network script;
  • the extraction module 20 is further configured to perform feature extraction on the target network script through the preset extraction model to obtain the target script feature.
  • The extraction module 20 is further configured to obtain the network script to be detected through a gateway, analyze the network script to be detected, and extract features of multiple preset dimensions;
  • the matching module 10 is further configured to match the features of the preset dimensions against the preset webpage backdoor rules.
  • The webpage backdoor detection apparatus further includes: an establishment module for establishing a basic prediction model;
  • an acquisition module for acquiring sample network scripts and corresponding sample detection results;
  • the extraction module 20 is further configured to perform feature extraction on the sample network scripts through the preset extraction model to obtain a first script feature;
  • the training module is configured to train the basic prediction model according to the first script feature and the corresponding sample detection result to obtain the preset detection model.
  • The acquisition module is further configured to acquire a first number of sample webpage backdoors and perform feature extraction on them through the preset extraction model to obtain second script features;
  • the detection module 30 is further configured to detect the second script features through the preset detection model to obtain evaluation detection results, where the evaluation detection results include a first detection result indicating that the second script feature is a feature corresponding to a webpage backdoor;
  • the webpage backdoor detection apparatus further includes: a calculation module for counting a second number of first detection results and calculating an accuracy rate of the preset detection model according to the first number and the second number;
  • the detection module 30 is further configured to perform the step of detecting the target script feature through the preset detection model to obtain the target detection result when the accuracy rate exceeds a preset threshold.
  • The evaluation detection results include a second detection result indicating that the second script feature is not a feature corresponding to a webpage backdoor;
  • the acquisition module is further configured to take the second script feature corresponding to the second detection result as a misdetected webpage backdoor feature;
  • the training module is further configured to set the true detection result of the misdetected webpage backdoor feature to be that the feature corresponds to a webpage backdoor, and to train the preset detection model according to the misdetected webpage backdoor feature and its true detection result.
  • The sequence numbers of the above embodiments of the present application are for description only and do not indicate that one embodiment is better than another.
  • Several of these devices may be embodied by the same hardware item.
  • The use of the words first, second, and third does not indicate any order; these words can be interpreted as names.
  • The methods of the embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present application, in essence or in the part that contributes to the existing technology, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium (such as Read Only Memory image (ROM), Random Access Memory (RAM), a magnetic disk, or an optical disk) and includes several instructions that enable a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to perform the methods described in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

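A webpage backdoor detection method, device, storage medium, and apparatus. The method includes: obtaining a network script to be detected and matching it against preset webpage backdoor rules (S10); if the match fails, performing feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature (S20); and detecting the target script feature through a preset detection model to obtain a target detection result (S30). By combining rule detection with machine-learning-based model detection, the method improves the accuracy with which the system determines whether the network script to be detected is a webpage backdoor.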

Description

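Webpage backdoor detection method, device, storage medium, and apparatus
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 11, 2018, with application number 201811188296.9 and the invention title "Webpage backdoor detection method, device, storage medium, and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of monitoring technology, and in particular to a webpage backdoor detection method, device, storage medium, and apparatus.
Background
At present, a webpage backdoor (webshell) is a command execution environment that exists in the form of a web page file such as Active Server Pages (ASP), Hypertext Preprocessor (PHP), Java Server Pages (JSP), or Common Gateway Interface (CGI). After intruding into a website, a hacker usually mixes an ASP or PHP backdoor file with the normal web page files in the WEB directory of the website server, and can then access the ASP or PHP backdoor through a browser to obtain a command execution environment and thereby control the website server.
Webpage backdoors usually contain fairly obvious static features. At present, web scripts are checked against these static features to determine whether they are webpage backdoors, which often produces many false positives. How to improve the accuracy of webpage backdoor detection is therefore a technical problem that urgently needs to be solved.
The above content is only intended to aid understanding of the technical solution of this application and is not an admission that it constitutes prior art.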
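Summary of the Invention
The main purpose of this application is to provide a webpage backdoor detection method, device, storage medium, and apparatus, aiming to solve the technical problem of the high false-positive rate of webpage backdoor detection in the prior art.
To achieve the above purpose, this application provides a webpage backdoor detection method, which includes the following steps:
obtaining a network script to be detected, and matching the network script to be detected against preset webpage backdoor rules;
if the match fails, performing feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature;
detecting the target script feature through a preset detection model to obtain a target detection result.
In addition, to achieve the above purpose, this application further provides a webpage backdoor detection device, which includes a memory, a processor, and webpage backdoor detection readable instructions stored in the memory and executable on the processor, the webpage backdoor detection readable instructions being configured to implement the steps of the webpage backdoor detection method described above.
In addition, to achieve the above purpose, this application further provides a storage medium storing webpage backdoor detection readable instructions which, when executed by a processor, implement the steps of the webpage backdoor detection method described above.
In addition, to achieve the above purpose, this application further provides a webpage backdoor detection apparatus, which includes:
a matching module, configured to obtain a network script to be detected and match the network script to be detected against preset webpage backdoor rules;
an extraction module, configured to, if the match fails, perform feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature;
a detection module, configured to detect the target script feature through a preset detection model to obtain a target detection result.
In this application, the network script to be detected is obtained and matched against the preset webpage backdoor rules; this rule-based matching detects webpage backdoors with obvious features. If the match fails, feature extraction is performed on the network script to be detected through the preset extraction model to obtain the target script feature, and the target script feature is detected through the preset detection model to obtain the target detection result. By combining rule detection with machine-learning-based model detection, webpage backdoors that rule matching fails to detect can be further detected by the machine-learning-based model. The preset detection model has learned from a large number of samples and has had its detection accuracy evaluated, so it detects well, which improves the accuracy with which the system determines whether a network script is a webpage backdoor.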
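Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the webpage backdoor detection device in the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of the first embodiment of the webpage backdoor detection method of this application;
FIG. 3 is a schematic flowchart of the second embodiment of the webpage backdoor detection method of this application;
FIG. 4 is a schematic flowchart of the third embodiment of the webpage backdoor detection method of this application;
FIG. 5 is a structural block diagram of the first embodiment of the webpage backdoor detection apparatus of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application will be further described with reference to the embodiments and the accompanying drawings.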
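Detailed Description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the webpage backdoor detection device in the hardware operating environment involved in the embodiments of this application.
As shown in FIG. 1, the webpage backdoor detection device may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection and communication among these components. The user interface 1003 may include a display; optionally, the user interface 1003 may also include standard wired and wireless interfaces, and in this application the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a wireless fidelity (WIreless-FIdelity, WI-FI) interface). The memory 1005 may be high-speed random access memory (RAM) or stable non-volatile memory (NVM), such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the webpage backdoor detection device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, regarded as a computer storage medium, may include an operating system, a network communication module, a user interface module, and webpage backdoor detection readable instructions.
In the webpage backdoor detection device shown in FIG. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to user equipment; and the webpage backdoor detection device uses the processor 1001 to call the webpage backdoor detection readable instructions stored in the memory 1005 and executes the webpage backdoor detection method provided by the embodiments of this application.
Based on the above hardware structure, the embodiments of the webpage backdoor detection method of this application are proposed.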
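Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the webpage backdoor detection method of this application, on the basis of which the first embodiment of the method is proposed.
In the first embodiment, the webpage backdoor detection method includes the following steps:
Step S10: obtain a network script to be detected, and match the network script to be detected against preset webpage backdoor rules.
It should be understood that the execution subject of this embodiment is the webpage backdoor detection device, which may be an electronic device such as a personal computer or a server. The preset webpage backdoor (webshell) rules may be a library of malicious strings, for example including malicious strings such as "组专用大马|提权|木马|PHP\s?反弹提权cmd执行" and "WScript.Shell, Shell.Application, Eval(), Excute(), Set Server, Run(), Exec(), and ShellExcute()". Performing feature extraction on the network script to be detected means extracting features in multiple dimensions, such as the keywords used in the script, high-risk functions, file modification time, file permissions, file owner, and associations with other files, to obtain the script features. The obtained script features are matched against the preset webshell rule library to obtain a matching result. If the matching result is a successful match, the network script to be detected is a webshell. If the match fails, the rules did not identify the script as a webshell: it may be a normal network script, or it may be a webshell that the rule matching missed.
In this embodiment, step S10 includes:
obtaining the network script to be detected through a gateway, analyzing the network script to be detected, and extracting features of multiple preset dimensions;
matching the features of the preset dimensions against the preset webpage backdoor rules.
It should be noted that the gateway (Gateway) obtains the network scripts to be detected from a proxy server (Agent); there are usually multiple scripts, though there may be only one. Analyzing a network script to be detected usually means splitting it into strings and extracting features of multiple preset dimensions from all of the resulting strings. The preset dimensions include keywords, high-risk functions, file modification time, file permissions, file owner, associations with other files, and so on. Since a normal network script does not contain the features in the preset webpage backdoor rules, matching the features of the preset dimensions against those rules identifies whether the network script to be detected is a webpage backdoor or a normal network script.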
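The rule matching of step S10 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: it treats each malicious string quoted above as a regular expression, and the names `WEBSHELL_RULES`, `match_rules`, and `is_rule_hit`, the regex escaping, and the case-insensitive matching are all assumptions made for the example. Only the keyword and high-risk-function dimensions are covered; file metadata such as owner or modification time is not.

```python
import re

# Signatures copied from the example rule library quoted in the text.
# Treating them as regular expressions is an assumption of this sketch.
WEBSHELL_RULES = [
    r"组专用大马|提权|木马|PHP\s?反弹提权cmd执行",
    r"WScript\.Shell",
    r"Shell\.Application",
    r"Eval\(",
    r"Excute\(",
    r"Set\s+Server",
    r"Run\(",
    r"Exec\(",
    r"ShellExcute\(",
]
# Case-insensitive matching is an assumption, not stated in the source.
COMPILED_RULES = [re.compile(p, re.IGNORECASE) for p in WEBSHELL_RULES]


def match_rules(script_text):
    """Return the rule patterns found in the script (empty list = match failed)."""
    return [rule.pattern for rule in COMPILED_RULES if rule.search(script_text)]


def is_rule_hit(script_text):
    """True when at least one preset webshell rule matches the script."""
    return bool(match_rules(script_text))
```

A script for which `is_rule_hit` returns `False` is not necessarily benign; in the method described here it is passed on to the model-based stage (steps S20 and S30).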
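Step S20: if the match fails, perform feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature.
It can be understood that if the match fails, the rules did not identify the network script to be detected as a webshell: it may be a normal network script, or it may be a webshell that the rule matching missed. To further determine whether the script to be detected is a webshell, feature extraction may be performed through the preset extraction model, which may be, for example, a convolutional neural network model. A basic extraction model may be established in advance and trained on sample network scripts and their corresponding features to obtain the preset extraction model. Feature extraction through the preset extraction model then yields a suitable target script feature.
Step S30: detect the target script feature through a preset detection model to obtain a target detection result.
In a specific implementation, the preset detection model includes a neural network model trained on a large and varied set of training samples, so that it detects the target script feature accurately. The target detection result may be that the target script feature is a feature corresponding to a webpage backdoor, that is, the network script to be detected is a webpage backdoor; or it may be that the target script feature is a feature corresponding to a normal network script, that is, the network script to be detected is a normal network script.
It should be understood that, to build the preset detection model, a basic prediction model is first established, and a large number of sample network scripts and corresponding sample detection results are obtained from a database. The sample network scripts include a large number of normal web scripts and a large number of webpage backdoors. The sample network scripts may be subjected to data cleaning, and feature extraction is performed on the cleaned sample network scripts through the preset extraction model to obtain the first script features corresponding to the sample network scripts. The basic prediction model can then be trained on the large number of first script features and the corresponding sample detection results to obtain the preset detection model. The data cleaning includes removing irrelevant data and duplicate data from the sample network scripts, smoothing noisy data, and handling missing values and outliers. In this embodiment, before step S30, the method further includes: establishing a basic prediction model; obtaining sample network scripts and corresponding sample detection results; performing feature extraction on the sample network scripts through the preset extraction model to obtain first script features; and training the basic prediction model according to the first script features and the corresponding sample detection results to obtain the preset detection model.
In this embodiment, the network script to be detected is obtained and matched against the preset webpage backdoor rules; this rule-based matching detects webpage backdoors with obvious features. If the match fails, feature extraction is performed on the network script to be detected through the preset extraction model to obtain the target script feature, and the target script feature is detected through the preset detection model to obtain the target detection result. By combining rule detection with machine-learning-based model detection, webpage backdoors that rule matching fails to detect can be further detected by the machine-learning-based model. The preset detection model has learned from a large number of samples and has had its detection accuracy evaluated, so it detects well, which improves the accuracy with which the system determines whether a network script is a webpage backdoor.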
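The control flow of steps S10 to S30 can be sketched as below. The sketch only shows how the two stages are chained; the rule matcher, the extraction model, and the detection model are passed in as callables, and the toy stand-ins (`toy_rule_match`, `toy_extract`, `toy_predict`) are hypothetical placeholders, not the convolutional or neural network models the text refers to.

```python
from typing import Callable, Sequence


def detect_webshell(
    script_text: str,
    rule_match: Callable[[str], bool],
    extract_features: Callable[[str], Sequence[float]],
    model_predict: Callable[[Sequence[float]], bool],
) -> dict:
    """Run rule matching first and fall back to the model when it fails."""
    # S10: rule-based matching catches backdoors with obvious features.
    if rule_match(script_text):
        return {"is_webshell": True, "stage": "rules"}
    # S20: the match failed, so extract the target script feature.
    features = extract_features(script_text)
    # S30: let the detection model decide.
    return {"is_webshell": bool(model_predict(features)), "stage": "model"}


# Hypothetical stand-ins used only to make the sketch runnable.
def toy_rule_match(text: str) -> bool:
    return "WScript.Shell" in text


def toy_extract(text: str) -> list:
    return [text.count("$_POST"), text.count("base64_decode")]


def toy_predict(features: Sequence[float]) -> bool:
    return sum(features) >= 2
```

Keeping the stages as separate callables mirrors the matching, extraction, and detection modules described for the apparatus, and lets each be replaced without touching the flow.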
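Referring to FIG. 3, FIG. 3 is a schematic flowchart of the second embodiment of the webpage backdoor detection method of this application. The second embodiment is proposed on the basis of the first embodiment shown in FIG. 2.
In the second embodiment, after step S30, the method further includes:
Step S40: if the target detection result is that the target script feature is a feature corresponding to a webpage backdoor, train the preset detection model according to the target script feature and the corresponding target detection result.
It can be understood that if the target detection result is that the target script feature is a feature corresponding to a webpage backdoor, the network script to be detected is a webpage backdoor. The network script to be detected and the corresponding target detection result may be stored in a database for later use as sample data for online or offline training. The preset detection model may also be trained according to the target script feature and the corresponding target detection result, which increases the amount of training the model receives and thereby improves its detection accuracy.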
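In this embodiment, step S20 includes:
Step S201: if the match fails, perform data cleaning on the network script to be detected to obtain a target network script.
It should be understood that if matching the script to be detected against the preset webpage backdoor rules fails, the rules did not identify it as a webshell. The network script to be detected then needs to be sent to the preset high-throughput distributed publish-subscribe messaging system (Kafka), which serves as a message queue: it can buffer data and split the data flow, and it usually replicates and distributes the network script to be detected. The messaging system first stores one copy of the script to be detected in Hadoop, a distributed system infrastructure developed by the Apache Foundation; Hadoop uses the script for offline learning and when tracing back events. The messaging system also sends another copy of the script to the webpage backdoor detection device for online learning. Both online and offline learning pass the network script to be detected through data cleaning and feature extraction before training the preset detection model.
It should be noted that the webpage backdoor detection device receives the network script to be detected sent by the preset high-throughput distributed publish-subscribe messaging system and performs online learning. Data cleaning includes removing irrelevant data and duplicate data from the network script to be detected, smoothing noisy data, and handling missing values and outliers in it. Cleaning rules can be formed by specifying which fields are allowed and which are not, and these rules filter out data in the script to be detected that does not conform to the format. After data cleaning, the target network script is obtained, and feature extraction is then performed on the target network script to obtain the target script feature.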
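A field-based cleaning rule of the kind described above might look like the following sketch. It assumes, purely for illustration, that each script arrives as a dictionary record; the field names in `ALLOWED_FIELDS` and `REQUIRED_FIELDS` are hypothetical. The sketch covers dropping disallowed fields, discarding records with missing values, and removing duplicates; noise smoothing and outlier handling are not shown.

```python
# Hypothetical field names; the source does not list the actual fields.
ALLOWED_FIELDS = {"path", "content", "mtime", "owner", "mode"}
REQUIRED_FIELDS = {"path", "content"}


def clean_records(records):
    """Apply allow-list cleaning rules and drop incomplete or duplicate records."""
    cleaned = []
    seen_contents = set()
    for record in records:
        # Irrelevant data: keep only the fields the cleaning rule allows.
        row = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
        # Missing values: skip records lacking a required field or holding an empty one.
        if any(not row.get(field) for field in REQUIRED_FIELDS):
            continue
        # Duplicate data: skip scripts whose content was already seen.
        if row["content"] in seen_contents:
            continue
        seen_contents.add(row["content"])
        cleaned.append(row)
    return cleaned
```

Discarding incomplete records is only one way to handle missing values; filling in defaults would be an equally valid reading of the text.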
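Step S202: perform feature extraction on the target network script through the preset extraction model to obtain the target script feature.
In a specific implementation, if the match fails, the rules did not identify the network script to be detected as a webshell: it may be a normal network script, or it may be a webshell that the rule matching missed. To further determine whether the script to be detected is a webshell, feature extraction may be performed on the target network script through the preset extraction model. Because the target network script has been cleaned, the extraction avoids processing large amounts of duplicate and irrelevant data and yields a more suitable target script feature, which improves the efficiency and quality of feature extraction.
In this embodiment, if the match fails, data cleaning is performed on the network script to be detected to obtain the target network script, and feature extraction is performed on the target network script through the preset extraction model to obtain the target script feature. Data cleaning filters out duplicate and irrelevant data in the network script to be detected, which improves the efficiency and quality of feature extraction.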
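Referring to FIG. 4, FIG. 4 is a schematic flowchart of the third embodiment of the webpage backdoor detection method of this application. The third embodiment is proposed on the basis of the second embodiment shown in FIG. 3.
In the third embodiment, before step S30, the method further includes:
Step S203: obtain a first number of sample webpage backdoors, and perform feature extraction on the sample webpage backdoors through the preset extraction model to obtain second script features.
It should be understood that, to ensure the accuracy with which the preset detection model detects the network script to be detected, the accuracy rate or recall rate of the preset detection model needs to be calculated before the model is used for detection. The preset detection model is used for detection when the accuracy rate exceeds a preset threshold (for example, 80%), or when the recall rate is below a preset recall threshold (for example, 20%). The first number of sample webpage backdoors can be obtained from a database, and the accuracy rate or recall rate is calculated from the results of the preset detection model on these samples. Because the preset detection model operates on the features of network scripts, feature extraction must first be performed on the sample webpage backdoors through the preset extraction model to obtain the second script features.
Step S204: detect the second script features through the preset detection model to obtain evaluation detection results, where the evaluation detection results include a first detection result indicating that the second script feature is a feature corresponding to a webpage backdoor.
It should be noted that the second script features are the features corresponding to the sample webpage backdoors. The second script features are passed through the preset detection model to determine whether each is a feature of a webpage backdoor. If the preset detection model successfully detects all of the second script features, its accuracy rate is 100%. The evaluation detection results include first detection results and second detection results: a first detection result indicates that the second script feature is a feature corresponding to a webpage backdoor, and a second detection result indicates that it is not.
Step S205: count a second number of first detection results, and calculate the accuracy rate of the preset detection model according to the first number and the second number.
In a specific implementation, the preset detection model usually cannot detect all of the first number of sample webpage backdoors. The evaluation detection results can therefore be analyzed to count the second number, that is, the number of first detection results, in which the second script feature is judged to be a feature corresponding to a webpage backdoor. The second number is the number of second script features that the preset detection model detects correctly, that is, the number of sample webpage backdoors it detects correctly. Dividing the second number by the first number gives the accuracy rate of the preset detection model. Subtracting the second number from the first number gives a difference, and dividing the difference by the first number gives the recall rate of the preset detection model.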
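The two ratios defined in step S205 can be computed as in the sketch below. The function name `evaluate_on_backdoor_samples` and the boolean-list input are assumptions for illustration. The return values keep the source's names; note that because every sample is a known backdoor, the "accuracy rate" as defined here corresponds to what is more commonly called recall or true-positive rate, and the "recall rate" as defined here corresponds to the fraction of backdoors missed.

```python
def evaluate_on_backdoor_samples(predictions):
    """Compute the accuracy rate and recall rate as the text defines them.

    predictions holds one boolean per sample webpage backdoor:
    True is a first detection result (flagged as a backdoor),
    False is a second detection result (not flagged).
    """
    first_number = len(predictions)
    if first_number == 0:
        raise ValueError("at least one sample webpage backdoor is required")
    # Second number: how many samples produced a first detection result.
    second_number = sum(1 for flagged in predictions if flagged)
    accuracy_rate = second_number / first_number
    # The source calls this ratio the recall rate: (first - second) / first.
    recall_rate = (first_number - second_number) / first_number
    return accuracy_rate, recall_rate
```

For example, if 8 of 10 sample backdoors are flagged, the accuracy rate is 8/10 and the recall rate in the source's sense is 2/10.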
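Step S206: when the accuracy rate exceeds the preset threshold, perform step S30.
It can be understood that when the accuracy rate exceeds the preset threshold, or the recall rate is below the preset recall threshold, the preset detection model detects accurately enough to be trusted. The target script feature corresponding to the network script to be detected can then be detected through the preset detection model to determine whether it is a feature corresponding to a webpage backdoor.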
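The gate in step S206 can be sketched as follows, using the example thresholds given in the text (80% and 20%). The names `model_is_trusted` and `detect_if_trusted` are hypothetical, and returning `None` when the model is not trusted is an assumption: the text does not say what happens in that case.

```python
ACCURACY_THRESHOLD = 0.80  # example value from the text
RECALL_THRESHOLD = 0.20    # example value from the text


def model_is_trusted(accuracy_rate=None, recall_rate=None):
    """True when either condition described in the text holds."""
    # "Exceeds" is read as strictly greater than the threshold.
    if accuracy_rate is not None and accuracy_rate > ACCURACY_THRESHOLD:
        return True
    # The recall rate (in the source's sense) must be below its threshold.
    if recall_rate is not None and recall_rate < RECALL_THRESHOLD:
        return True
    return False


def detect_if_trusted(features, model_predict, accuracy_rate):
    """Run step S30 only when the model passed the accuracy check."""
    if not model_is_trusted(accuracy_rate=accuracy_rate):
        return None  # assumption: the untrusted case is not specified
    return model_predict(features)
```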
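In this embodiment, the evaluation detection results include a second detection result indicating that the second script feature is not a feature corresponding to a webpage backdoor; after step S205, the method further includes:
taking the second script feature corresponding to the second detection result as a misdetected webpage backdoor feature;
setting the true detection result of the misdetected webpage backdoor feature to be that the feature corresponds to a webpage backdoor, and training the preset detection model according to the misdetected webpage backdoor feature and the corresponding true detection result.
It should be understood that the evaluation detection results include second detection results, which indicate that the second script feature is not a feature corresponding to a webpage backdoor. The second script features corresponding to the second detection results, that is, the second script features the preset detection model failed to detect, can be taken as misdetected webpage backdoor features, and their true detection result is set to be that they correspond to a webpage backdoor. The preset detection model is trained according to the misdetected webpage backdoor features and the corresponding true detection results so that, in subsequent detection, it recognizes them as features corresponding to a webpage backdoor, which improves its detection accuracy. Before feature extraction is performed on the sample webpage backdoors, data cleaning may first be performed on them, and the cleaned sample webpage backdoors are then passed through the preset extraction model to obtain the second script features. Data cleaning includes removing irrelevant data and duplicate data from the sample webpage backdoors, smoothing noisy data, and handling missing values and outliers. Cleaning rules can be formed by specifying which fields are allowed and which are not, and these rules filter out data in the sample webpage backdoors that does not conform to the format.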
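The relabel-and-retrain step can be illustrated with the sketch below. The helper `collect_misdetected` pairs each missed second script feature with its true label (1, meaning webpage backdoor). `ToyPerceptron` is a hypothetical stand-in for the preset detection model, chosen only because it can be updated incrementally in a few lines; the source does not specify the model's training procedure beyond saying it is a neural network.

```python
def collect_misdetected(second_script_features, predictions):
    """Pair every feature the model missed with its true label (1 = backdoor)."""
    return [
        (feature, 1)
        for feature, flagged in zip(second_script_features, predictions)
        if not flagged
    ]


class ToyPerceptron:
    """Hypothetical stand-in for the preset detection model."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def train(self, samples, epochs=10):
        # Classic perceptron update: adjust only on misclassified samples.
        for _ in range(epochs):
            for x, label in samples:
                error = label - self.predict(x)
                if error:
                    step = self.learning_rate * error
                    self.weights = [w + step * xi for w, xi in zip(self.weights, x)]
                    self.bias += step
```

Training only on relabeled backdoor features, as done here for brevity, would push a real model toward flagging everything; in practice the misdetected features would presumably be mixed with normal samples.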
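In this embodiment, a first number of sample webpage backdoors is obtained, and feature extraction is performed on them through the preset extraction model to obtain second script features. The second script features are detected through the preset detection model to obtain evaluation detection results, which include first detection results indicating that the second script feature is a feature corresponding to a webpage backdoor. A second number of first detection results is counted, and the accuracy rate of the preset detection model is calculated according to the first number and the second number. When the accuracy rate exceeds the preset threshold, the step of detecting the target script feature through the preset detection model to obtain the target detection result is performed. An accuracy rate above the preset threshold indicates that the preset detection model detects accurately enough to be trusted, which ensures the accuracy of its detection of the target script feature of the script to be detected.
It should be noted that a person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
In addition, an embodiment of this application further provides a storage medium storing webpage backdoor detection readable instructions which, when executed by a processor, implement the steps of the webpage backdoor detection method described above.
In addition, referring to FIG. 5, an embodiment of this application further provides a webpage backdoor detection apparatus, which includes:
a matching module 10, configured to obtain a network script to be detected and match the network script to be detected against preset webpage backdoor rules;
an extraction module 20, configured to, if the match fails, perform feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature;
a detection module 30, configured to detect the target script feature through a preset detection model to obtain a target detection result.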
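It should be understood that the execution subject of this embodiment is the webpage backdoor detection device, which may be an electronic device such as a personal computer or a server. The preset webpage backdoor (webshell) rules may be a library of malicious strings, for example including malicious strings such as "组专用大马|提权|木马|PHP\s?反弹提权cmd执行" and "WScript.Shell, Shell.Application, Eval(), Excute(), Set Server, Run(), Exec(), ShellExcute()". Performing feature extraction on the network script to be detected means extracting features in multiple dimensions, such as the keywords used in the script, high-risk functions, file modification time, file permissions, file owner, and associations with other files, to obtain the script features. The obtained script features are matched against the preset webshell rule library to obtain a matching result. If the matching result is a successful match, the network script to be detected is a webshell. If the match fails, the rules did not identify the script as a webshell: it may be a normal network script, or it may be a webshell that the rule matching missed.
In this embodiment, the extraction module 20 is further configured to obtain the network script to be detected through a gateway, analyze the network script to be detected, and extract features of multiple preset dimensions;
the matching module 10 is further configured to match the features of the preset dimensions against the preset webpage backdoor rules.
It should be noted that the gateway (Gateway) obtains the network scripts to be detected from a proxy server (Agent); there are usually multiple scripts, though there may be only one. Analyzing a network script to be detected usually means splitting it into strings and extracting features of multiple preset dimensions from all of the resulting strings. The preset dimensions include keywords, high-risk functions, file modification time, file permissions, file owner, associations with other files, and so on. Since a normal network script does not contain the features in the preset webpage backdoor rules, matching the features of the preset dimensions against those rules identifies whether the network script to be detected is a webpage backdoor or a normal network script.
It can be understood that if the match fails, the rules did not identify the network script to be detected as a webshell: it may be a normal network script, or it may be a webshell that the rule matching missed. To further determine whether the script to be detected is a webshell, feature extraction may be performed through the preset extraction model, which may be, for example, a convolutional neural network model. A basic extraction model may be established in advance and trained on sample network scripts and their corresponding features to obtain the preset extraction model. Feature extraction through the preset extraction model then yields a suitable target script feature.
In a specific implementation, the preset detection model includes a neural network model trained on a large and varied set of training samples, so that it detects the target script feature accurately. The target detection result may be that the target script feature is a feature corresponding to a webpage backdoor, that is, the network script to be detected is a webpage backdoor; or it may be that the target script feature is a feature corresponding to a normal network script, that is, the network script to be detected is a normal network script.
It should be understood that, to build the preset detection model, a basic prediction model is first established, and a large number of sample network scripts and corresponding sample detection results are obtained from a database. The sample network scripts include a large number of normal web scripts and a large number of webpage backdoors. The sample network scripts may be subjected to data cleaning, and feature extraction is performed on the cleaned sample network scripts through the preset extraction model to obtain the first script features corresponding to the sample network scripts. The basic prediction model can then be trained on the large number of first script features and the corresponding sample detection results to obtain the preset detection model. The data cleaning includes removing irrelevant data and duplicate data from the sample network scripts, smoothing noisy data, and handling missing values and outliers. This embodiment further includes: an establishment module for establishing a basic prediction model; an acquisition module for acquiring sample network scripts and corresponding sample detection results; the extraction module 20, further configured to perform feature extraction on the sample network scripts through the preset extraction model to obtain first script features; and a training module for training the basic prediction model according to the first script features and the corresponding sample detection results to obtain the preset detection model.
In this embodiment, the network script to be detected is obtained and matched against the preset webpage backdoor rules; this rule-based matching detects webpage backdoors with obvious features. If the match fails, feature extraction is performed on the network script to be detected through the preset extraction model to obtain the target script feature, and the target script feature is detected through the preset detection model to obtain the target detection result. By combining rule detection with machine-learning-based model detection, webpage backdoors that rule matching fails to detect can be further detected by the machine-learning-based model. The preset detection model has learned from a large number of samples and has had its detection accuracy evaluated, so it detects well, which improves the accuracy with which the system determines whether a network script is a webpage backdoor.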
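In an embodiment, the webpage backdoor detection apparatus further includes: a training module, configured to train the preset detection model according to the target script feature and the corresponding target detection result if the target detection result is that the target script feature is a feature corresponding to a webpage backdoor.
In an embodiment, the webpage backdoor detection apparatus further includes: a data cleaning module, configured to perform data cleaning on the network script to be detected if the match fails, to obtain a target network script;
the extraction module 20 is further configured to perform feature extraction on the target network script through the preset extraction model to obtain the target script feature.
In an embodiment, the extraction module 20 is further configured to obtain the network script to be detected through a gateway, analyze the network script to be detected, and extract features of multiple preset dimensions;
the matching module 10 is further configured to match the features of the preset dimensions against the preset webpage backdoor rules.
In an embodiment, the webpage backdoor detection apparatus further includes: an establishment module for establishing a basic prediction model;
an acquisition module for acquiring sample network scripts and corresponding sample detection results;
the extraction module 20 is further configured to perform feature extraction on the sample network scripts through the preset extraction model to obtain first script features;
a training module for training the basic prediction model according to the first script features and the corresponding sample detection results to obtain the preset detection model.
In an embodiment, the acquisition module is further configured to acquire a first number of sample webpage backdoors and perform feature extraction on them through the preset extraction model to obtain second script features;
the detection module 30 is further configured to detect the second script features through the preset detection model to obtain evaluation detection results, where the evaluation detection results include a first detection result indicating that the second script feature is a feature corresponding to a webpage backdoor;
the webpage backdoor detection apparatus further includes: a calculation module for counting a second number of first detection results and calculating the accuracy rate of the preset detection model according to the first number and the second number;
the detection module 30 is further configured to perform the step of detecting the target script feature through the preset detection model to obtain the target detection result when the accuracy rate exceeds a preset threshold.
In an embodiment, the evaluation detection results include a second detection result indicating that the second script feature is not a feature corresponding to a webpage backdoor;
the acquisition module is further configured to take the second script feature corresponding to the second detection result as a misdetected webpage backdoor feature;
the training module is further configured to set the true detection result of the misdetected webpage backdoor feature to be that the feature corresponds to a webpage backdoor, and to train the preset detection model according to the misdetected webpage backdoor feature and the corresponding true detection result.
For other embodiments or specific implementations of the webpage backdoor detection apparatus of this application, refer to the method embodiments above; they are not repeated here.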
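It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes the element.
The sequence numbers of the above embodiments of this application are for description only and do not indicate that one embodiment is better than another. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by the same hardware item. The use of the words first, second, and third does not indicate any order; these words can be interpreted as names.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as Read Only Memory image (ROM), Random Access Memory (RAM), a magnetic disk, or an optical disk) and includes several instructions that enable a terminal device (which may be a mobile phone, computer, server, air conditioner, or network device, etc.) to perform the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.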

Claims (20)

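  1. A webpage backdoor detection method, characterized in that the webpage backdoor detection method comprises the following steps:
    obtaining a network script to be detected, and matching the network script to be detected against preset webpage backdoor rules;
    if the match fails, performing feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature;
    detecting the target script feature through a preset detection model to obtain a target detection result.
  2. The webpage backdoor detection method according to claim 1, characterized in that, after the detecting the target script feature through a preset detection model to obtain a target detection result, the webpage backdoor detection method further comprises:
    if the target detection result is that the target script feature is a feature corresponding to a webpage backdoor, training the preset detection model according to the target script feature and the corresponding target detection result.
  3. The webpage backdoor detection method according to claim 2, characterized in that the performing, if the match fails, feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature comprises:
    if the match fails, performing data cleaning on the network script to be detected to obtain a target network script;
    performing feature extraction on the target network script through the preset extraction model to obtain the target script feature.
  4. The webpage backdoor detection method according to claim 1, characterized in that the obtaining a network script to be detected and matching the network script to be detected against preset webpage backdoor rules comprises:
    obtaining the network script to be detected through a gateway, analyzing the network script to be detected, and extracting features of multiple preset dimensions;
    matching the features of the preset dimensions against the preset webpage backdoor rules.
  5. The webpage backdoor detection method according to claim 1, characterized in that, before the detecting the target script feature through a preset detection model to obtain a target detection result, the webpage backdoor detection method further comprises:
    establishing a basic prediction model;
    obtaining sample network scripts and corresponding sample detection results;
    performing feature extraction on the sample network scripts through the preset extraction model to obtain first script features;
    training the basic prediction model according to the first script features and the corresponding sample detection results to obtain the preset detection model.
  6. The webpage backdoor detection method according to claim 1, characterized in that, before the detecting the target script feature through a preset detection model to obtain a target detection result, the webpage backdoor detection method further comprises:
    obtaining a first number of sample webpage backdoors, and performing feature extraction on the sample webpage backdoors through the preset extraction model to obtain second script features;
    detecting the second script features through the preset detection model to obtain evaluation detection results, the evaluation detection results including a first detection result indicating that the second script feature is a feature corresponding to a webpage backdoor;
    counting a second number of first detection results, and calculating an accuracy rate of the preset detection model according to the first number and the second number;
    when the accuracy rate exceeds a preset threshold, performing the step of detecting the target script feature through the preset detection model to obtain the target detection result.
  7. The webpage backdoor detection method according to claim 6, characterized in that the evaluation detection results include a second detection result indicating that the second script feature is not a feature corresponding to a webpage backdoor;
    after the counting a second number of first detection results and calculating an accuracy rate of the preset detection model according to the first number and the second number, the webpage backdoor detection method further comprises:
    taking the second script feature corresponding to the second detection result as a misdetected webpage backdoor feature;
    setting the true detection result of the misdetected webpage backdoor feature to be that the misdetected webpage backdoor feature is a feature corresponding to a webpage backdoor, and training the preset detection model according to the misdetected webpage backdoor feature and the corresponding true detection result.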
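  8. A webpage backdoor detection device, characterized in that the webpage backdoor detection device comprises: a memory, a processor, and webpage backdoor detection readable instructions stored in the memory and executable on the processor, wherein the webpage backdoor detection readable instructions, when executed by the processor, implement the following steps:
    obtaining a network script to be detected, and matching the network script to be detected against preset webpage backdoor rules;
    if the match fails, performing feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature;
    detecting the target script feature through a preset detection model to obtain a target detection result.
  9. The webpage backdoor detection device according to claim 8, characterized in that the webpage backdoor detection readable instructions, when executed by the processor, further implement the following step:
    if the target detection result is that the target script feature is a feature corresponding to a webpage backdoor, training the preset detection model according to the target script feature and the corresponding target detection result.
  10. The webpage backdoor detection device according to claim 9, characterized in that the webpage backdoor detection readable instructions, when executed by the processor, further implement the following steps:
    if the match fails, performing data cleaning on the network script to be detected to obtain a target network script;
    performing feature extraction on the target network script through the preset extraction model to obtain the target script feature.
  11. The webpage backdoor detection device according to claim 8, characterized in that the webpage backdoor detection readable instructions, when executed by the processor, further implement the following steps:
    obtaining the network script to be detected through a gateway, analyzing the network script to be detected, and extracting features of multiple preset dimensions;
    matching the features of the preset dimensions against the preset webpage backdoor rules.
  12. The webpage backdoor detection device according to claim 8, characterized in that the webpage backdoor detection readable instructions, when executed by the processor, further implement the following steps:
    obtaining a first number of sample webpage backdoors, and performing feature extraction on the sample webpage backdoors through the preset extraction model to obtain second script features;
    detecting the second script features through the preset detection model to obtain evaluation detection results, the evaluation detection results including a first detection result indicating that the second script feature is a feature corresponding to a webpage backdoor;
    counting a second number of first detection results, and calculating an accuracy rate of the preset detection model according to the first number and the second number;
    when the accuracy rate exceeds a preset threshold, performing the step of detecting the target script feature through the preset detection model to obtain the target detection result.
  13. The webpage backdoor detection device according to claim 12, characterized in that the evaluation detection results include a second detection result indicating that the second script feature is not a feature corresponding to a webpage backdoor;
    the webpage backdoor detection readable instructions, when executed by the processor, further implement the following steps:
    taking the second script feature corresponding to the second detection result as a misdetected webpage backdoor feature;
    setting the true detection result of the misdetected webpage backdoor feature to be that the misdetected webpage backdoor feature is a feature corresponding to a webpage backdoor, and training the preset detection model according to the misdetected webpage backdoor feature and the corresponding true detection result.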
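  14. A storage medium, characterized in that the storage medium stores webpage backdoor detection readable instructions which, when executed by a processor, implement the following steps:
    obtaining a network script to be detected, and matching the network script to be detected against preset webpage backdoor rules;
    if the match fails, performing feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature;
    detecting the target script feature through a preset detection model to obtain a target detection result.
  15. The storage medium according to claim 14, characterized in that the webpage backdoor detection readable instructions, when executed by a processor, further implement the following step:
    if the target detection result is that the target script feature is a feature corresponding to a webpage backdoor, training the preset detection model according to the target script feature and the corresponding target detection result.
  16. The storage medium according to claim 15, characterized in that the webpage backdoor detection readable instructions, when executed by a processor, further implement the following steps:
    if the match fails, performing data cleaning on the network script to be detected to obtain a target network script;
    performing feature extraction on the target network script through the preset extraction model to obtain the target script feature.
  17. The storage medium according to claim 14, characterized in that the webpage backdoor detection readable instructions, when executed by a processor, further implement the following steps:
    obtaining the network script to be detected through a gateway, analyzing the network script to be detected, and extracting features of multiple preset dimensions;
    matching the features of the preset dimensions against the preset webpage backdoor rules.
  18. The storage medium according to claim 14, characterized in that the webpage backdoor detection readable instructions, when executed by a processor, further implement the following steps:
    obtaining a first number of sample webpage backdoors, and performing feature extraction on the sample webpage backdoors through the preset extraction model to obtain second script features;
    detecting the second script features through the preset detection model to obtain evaluation detection results, the evaluation detection results including a first detection result indicating that the second script feature is a feature corresponding to a webpage backdoor;
    counting a second number of first detection results, and calculating an accuracy rate of the preset detection model according to the first number and the second number;
    when the accuracy rate exceeds a preset threshold, performing the step of detecting the target script feature through the preset detection model to obtain the target detection result.
  19. The storage medium according to claim 18, characterized in that the evaluation detection results include a second detection result indicating that the second script feature is not a feature corresponding to a webpage backdoor;
    the webpage backdoor detection readable instructions, when executed by a processor, further implement the following steps:
    taking the second script feature corresponding to the second detection result as a misdetected webpage backdoor feature;
    setting the true detection result of the misdetected webpage backdoor feature to be that the misdetected webpage backdoor feature is a feature corresponding to a webpage backdoor, and training the preset detection model according to the misdetected webpage backdoor feature and the corresponding true detection result.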
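  20. A webpage backdoor detection apparatus, characterized in that the webpage backdoor detection apparatus comprises:
    a matching module, configured to obtain a network script to be detected and match the network script to be detected against preset webpage backdoor rules;
    an extraction module, configured to, if the match fails, perform feature extraction on the network script to be detected through a preset extraction model to obtain a target script feature;
    a detection module, configured to detect the target script feature through a preset detection model to obtain a target detection result.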
PCT/CN2018/122828 2018-10-11 2018-12-21 Webpage backdoor detection method, device, storage medium, and apparatus WO2020073494A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811188296.9A CN109657459A (zh) 2018-10-11 2018-10-11 Webpage backdoor detection method, device, storage medium, and apparatus
CN201811188296.9 2018-10-11

Publications (1)

Publication Number Publication Date
WO2020073494A1 true WO2020073494A1 (zh) 2020-04-16

Family

ID=66110701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122828 WO2020073494A1 (zh) 2018-10-11 2018-12-21 Webpage backdoor detection method, device, storage medium, and apparatus

Country Status (2)

Country Link
CN (1) CN109657459A (zh)
WO (1) WO2020073494A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113839904A (zh) * 2020-06-08 2021-12-24 北京梆梆安全科技有限公司 Security situation awareness method and system based on intelligent connected vehicles

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
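CN110232277A (zh) * 2019-04-23 2019-09-13 平安科技(深圳)有限公司 Webpage backdoor detection method, apparatus, and computer device
CN111695117B (zh) * 2020-06-12 2023-10-03 国网浙江省电力有限公司信息通信分公司 Webshell script detection method and apparatus
CN111800405A (zh) * 2020-06-29 2020-10-20 深信服科技股份有限公司 Detection method, detection device, and storage medium
CN112182561B (zh) * 2020-09-24 2024-04-30 百度在线网络技术(北京)有限公司 Backdoor detection method and apparatus, electronic device, and medium
CN112769840B (zh) * 2021-01-15 2023-04-07 杭州安恒信息技术股份有限公司 Network attack behavior identification method based on the reinforcement learning Dyna framework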

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
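CN101069175A (zh) * 2003-10-03 2007-11-07 考维枸有限公司 Dynamic message filtering
CN107294982A (zh) * 2017-06-29 2017-10-24 深信服科技股份有限公司 Webpage backdoor detection method, apparatus, and computer-readable storage medium
CN107451476A (zh) * 2017-07-21 2017-12-08 上海携程商务有限公司 Cloud-platform-based webpage backdoor detection method, system, device, and storage medium
CN107622202A (zh) * 2017-09-20 2018-01-23 杭州安恒信息技术有限公司 Webpage backdoor detection method and apparatus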
US20180082063A1 (en) * 2016-09-16 2018-03-22 Rapid7, Inc. Web shell detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
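CN103577755A (zh) * 2013-11-01 2014-02-12 浙江工业大学 Static detection method for malicious scripts based on support vector machines
CN104618343B (zh) * 2015-01-06 2018-11-09 中国科学院信息工程研究所 Method and system for website threat detection based on real-time logs
CN106961419B (zh) * 2017-02-13 2020-04-14 深信服科技股份有限公司 WebShell detection method, apparatus, and system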

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
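CN113839904A (zh) * 2020-06-08 2021-12-24 北京梆梆安全科技有限公司 Security situation awareness method and system based on intelligent connected vehicles
CN113839904B (zh) * 2020-06-08 2023-08-22 北京梆梆安全科技有限公司 Security situation awareness method and system based on intelligent connected vehicles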

Also Published As

Publication number Publication date
CN109657459A (zh) 2019-04-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18936405

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14/07/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18936405

Country of ref document: EP

Kind code of ref document: A1