CN112016097A - Method for predicting time of network security vulnerability being utilized - Google Patents

Method for predicting time of network security vulnerability being utilized

Info

Publication number
CN112016097A
Authority
CN
China
Prior art keywords
time
network security
security vulnerability
data
vulnerability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010889524.6A
Other languages
Chinese (zh)
Other versions
CN112016097B (en)
Inventor
殷娇
游明山
雷丽
安建梅
彭玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lingcheng Technology Co ltd
Original Assignee
Chongqing University of Arts and Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Arts and Sciences
Priority to CN202010889524.6A
Publication of CN112016097A
Application granted
Publication of CN112016097B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Computer And Data Communications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of computer network security and discloses a method for predicting the time at which a network security vulnerability is exploited. The method comprises: obtaining feature data $x^{(t)}$ from the raw vulnerability-related data $d^{(t)}$ acquired at time $t$ through data preprocessing, feature extraction and feature selection; passing the feature data $x^{(t)}$ to the classifier model $f^{(t)}(\cdot)$ at time $t$ to obtain a predicted value of the time at which the vulnerability is exploited; acquiring the true exploited-time label $y^{(t)}$ of the vulnerability; computing, with a sliding-window imbalance factor algorithm, the imbalance factor and the class weight of each category to which the exploited time of the vulnerabilities in the current sliding window belongs; and retraining the classifier model $f^{(t)}(\cdot)$ on the feature data $x^{(t)}$, the exploited-time label $y^{(t)}$ and the class weight, updating its parameters to obtain the classifier model $f^{(t+1)}(\cdot)$ at time $t+1$. The technical solution can improve the performance of the exploited-time prediction model when the class distribution of the data is dynamically imbalanced.

Description

Method for predicting time of network security vulnerability being utilized
Technical Field
The invention relates to the technical field of computer network security, and in particular to a method for predicting the time at which a network security vulnerability is exploited.
Background
A network vulnerability is generally understood as a defect in the implementation of hardware, software or protocols, or in the security policy of a system, that enables an attacker to access or damage the system without authorization. Network vulnerabilities can affect a wide range of software and hardware, including the system itself and its supporting software, network client and server software, network routers and security firewalls. Different security vulnerabilities exist across different types of software and hardware, across different versions of the same device, across systems composed of different devices, and under different configurations of the same system.
Vulnerabilities are closely tied to time. From the day a system is released, its vulnerabilities are progressively exposed as users work with it, and vulnerabilities discovered earlier are fixed by patches released by the system vendor or corrected in later versions. While a new version fixes the vulnerabilities of the old version, it also introduces new vulnerabilities and errors. As time goes on, therefore, old vulnerabilities are continually repaired and new ones continually appear.
Because the manpower, funding and technical resources of software developers, network security experts and system maintainers are limited, not all of the growing number of vulnerabilities can be repaired in time; only those most likely to be attacked, and attacked earliest, can be selected for repair. Predicting when a network security vulnerability will be exploited is therefore very important for vulnerability management: it helps a decision maker identify the vulnerabilities likely to be attacked earliest, provide patches or other fixes for them first, and minimize the resulting losses.
One challenge in predicting the exploited time of a network security vulnerability is that the statistical characteristics of vulnerability data drift over time: the functional relationship $y = f(x)$ between the features $x$ of the data and the exploited-time label $y$ changes as time passes and new samples appear. The exploited time of future vulnerabilities therefore cannot be predicted by relying solely on a static prediction model trained on fixed historical data. Instead, an online learning mode should be adopted, in which the prediction model is continuously retrained and its parameters updated with newly arriving vulnerability samples.
Another challenge in predicting the exploited time of network security vulnerabilities is dynamic class imbalance. Let $x^{(t)}$ be an $n$-dimensional feature vector in the feature space $X$ acquired at time $t$, and let $y^{(t)} \in \{c^{[1]}, c^{[2]}, \ldots, c^{[N]}\}$ be the label corresponding to the feature data $x^{(t)}$, where $c^{[1]}, c^{[2]}, \ldots, c^{[N]}$ are the $N$ classes of the classification problem and $N \geq 2$. The pair $(x^{(t)}, y^{(t)})$ is called a labeled sample. Dynamic class imbalance means that the classes $c^{[1]}, c^{[2]}, \ldots, c^{[N]}$ account for different shares of the total number of samples, and that the share of each class changes over time. For the exploited-time prediction problem, the exploited-time labels include, but are not limited to, $c^{[1]}$ 'before vulnerability publication', $c^{[2]}$ 'on the day of vulnerability publication', $c^{[3]}$ 'within one month after vulnerability publication', $c^{[4]}$ 'within one year after vulnerability publication', and $c^{[5]}$ 'never exploited'. Real historical data show that the share of samples belonging to each class $c^{[k]}$ ($k = 1, 2, \ldots, N$) is imbalanced: some classes have many samples and others few, and the imbalance itself changes over time. Conventional data-driven classifier models perform well only on data with balanced classes. When the samples are imbalanced, the classes with many samples must be down-sampled, or the classes with few samples up-sampled, according to the current imbalance state, in order to balance the samples.
One existing solution to sample imbalance is given by equation (3):
[Equation (3): formula image not reproduced in this text]
Equation (3) computes the proportion of each class among all samples up to the current time $t$, that is, a global imbalance factor for each class. Because it is dominated by old data, this imbalance factor is insensitive to the imbalance state of new data; when the imbalance state of the samples changes dynamically, the method cannot reflect the latest state in time.
To reduce the influence of old data, an improved algorithm introduces a time decay coefficient $\theta$ on top of equation (3) and calculates the imbalance factor according to equation (4):
[Equation (4): formula image not reproduced in this text]
Equation (4) assumes the same time decay coefficient $\theta$ at all times. It does not fundamentally solve the problem of a dynamically changing imbalance state and still cannot capture, in real time, the sample imbalance over the most recent period.
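The limitation described for the two prior approaches can be illustrated with a small sketch. The first function follows the textual description of equation (3), the share of each class among all samples seen so far. The second is only an assumed recursive form of a time-decayed estimate with coefficient `theta`; the exact form of equation (4) is not reproduced in this text, and the function names and the sample stream are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List


def global_factors(labels: Iterable[Hashable], classes: List[Hashable]) -> Dict[Hashable, float]:
    """Share of each class among ALL labels seen so far (behaviour described for equation (3))."""
    seen = list(labels)
    return {c: sum(1 for y in seen if y == c) / len(seen) for c in classes}


def decayed_factors(labels: Iterable[Hashable], classes: List[Hashable],
                    theta: float = 0.9) -> Dict[Hashable, float]:
    """ASSUMED recursive form w = theta * w + (1 - theta) * indicator; not taken from the patent."""
    w = {c: 0.0 for c in classes}
    for y in labels:
        for c in classes:
            w[c] = theta * w[c] + (1.0 - theta) * (1.0 if y == c else 0.0)
    return w


# Hypothetical stream: 900 'Neg' samples followed by 100 'Pos' samples.
stream = ["Neg"] * 900 + ["Pos"] * 100
classes = ["Neg", "Pos"]
g = global_factors(stream, classes)              # still dominated by the old 'Neg' samples
d = decayed_factors(stream, classes, theta=0.9)  # has almost forgotten the 'Neg' samples
```

With a fixed `theta`, how quickly old samples are forgotten is set once and for all, which is the rigidity the passage above criticizes.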
Disclosure of Invention
To solve the technical problem that, in predicting the exploited time of network security vulnerabilities, the performance of the prediction model gradually deteriorates over time because of concept drift and the dynamic imbalance of sample data, the invention provides a method for predicting the time at which a network security vulnerability is exploited.
The basic scheme of the invention is as follows:
A method for predicting the time at which a network security vulnerability is exploited comprises:
Step S1, acquiring the raw data $d^{(t)}$ related to a network security vulnerability at time $t$, where $t = 1, 2, 3, \ldots$;
Step S2, sequentially performing data preprocessing, feature extraction and feature selection on the raw data $d^{(t)}$ acquired at time $t$ to obtain the feature data $x^{(t)}$;
and further comprises the following steps:
Step S3, predicting from the feature data $x^{(t)}$ with the classifier model $f^{(t)}(\cdot)$ at time $t$ to obtain the predicted exploited time of the vulnerability acquired at time $t$, for use by downstream applications [the symbol and defining formula of the predicted value are images not reproduced in this text];
Step S4, acquiring the true exploited-time label $y^{(t)}$ of the vulnerability acquired at time $t$, where $y^{(t)} \in \{c^{[1]}, c^{[2]}, \ldots, c^{[k]}, \ldots, c^{[N]}\}$, $c^{[1]}, c^{[2]}, \ldots, c^{[k]}, \ldots, c^{[N]}$ denote the $N$ categories to which the exploited time of a network security vulnerability may belong, and $N \geq 2$ is the total number of categories;
Step S5, computing, with the sliding-window imbalance factor algorithm, the imbalance factor and the class weight of each category $c^{[k]}$, $k = 1, 2, \ldots, N$, to which the exploited time of the vulnerabilities in the current sliding window belongs;
Step S6, retraining the current classifier model $f^{(t)}(\cdot)$ on the feature data $x^{(t)}$, the exploited-time label $y^{(t)}$ and the class weight of the vulnerability at time $t$, and updating its parameters to obtain the classifier model $f^{(t+1)}(\cdot)$ at time $t+1$.
The beneficial effects of the basic scheme are:
1. The classifier model is trained online on the raw data of historical network security vulnerabilities and their exploited-time labels, and predicts the exploited time of the vulnerability acquired at the current time. The prediction gives the approximate time range in which that vulnerability will be exploited, and thereby supports network security experts, system developers and maintainers in decisions on the allocation of resources such as time, funds and personnel.
2. The sliding-window imbalance factor algorithm computes the imbalance factor of each category to which the exploited time of the vulnerabilities in the most recent sliding window belongs, so the dynamic class imbalance in the exploited-time prediction problem can be tracked in real time. Based on the tracked imbalance, the algorithm further computes the class weight of each category and uses it to control the retraining and parameter updating of the classifier model, which effectively mitigates the loss of prediction performance caused by dynamic class imbalance.
3. Each time a labeled vulnerability sample is obtained, the classifier model is retrained and its parameters updated using the feature data $x^{(t)}$, the exploited-time label $y^{(t)}$ and the class weight, so that the classifier model learns online. Compared with prior-art static prediction models trained on fixed historical data, the technical solution effectively addresses the gradual deterioration of prediction performance over time caused by concept drift.
Further, in step S5, the imbalance factor is calculated by formula (1):
[Formula (1): image not reproduced in this text]
where $z$ denotes the total number of samples contained in the current sliding window, with $z \geq N$, and $c^{[k]}$ ($k = 1, 2, \ldots, N$) denotes the $k$-th category to which the exploited time of a network security vulnerability belongs. The imbalance factor represents the proportion of samples in the current sliding window that belong to category $c^{[k]}$, relative to the total number of samples $z$ in the window. When the exploited time of the vulnerability corresponding to the feature data $x^{(t)}$ belongs to category $c^{[k]}$, the indicator $[(x^{(t)}, c^{[k]})] = 1$; otherwise $[(x^{(t)}, c^{[k]})] = 0$.
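Read literally, this definition (the share of category $c^{[k]}$ among the $z$ samples of the current window, counted with the indicator) can be written as the sketch below. The symbol $w_k^{(t)}$ and the summation index $i$ are assumed names: the original formula image is not reproduced in this text, so the exact notation of formula (1) may differ.

```latex
w_k^{(t)} \;=\; \frac{1}{z} \sum_{i=t-z+1}^{t} \left[\left(x^{(i)}, c^{[k]}\right)\right],
\qquad k = 1, 2, \ldots, N, \quad z \geq N,
\qquad
\left[\left(x^{(i)}, c^{[k]}\right)\right] =
\begin{cases}
1, & \text{if sample } i \text{ belongs to } c^{[k]},\\
0, & \text{otherwise.}
\end{cases}
```

Under this reading, every sample in the window belongs to exactly one category, so the factors of the $N$ categories sum to 1.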
Beneficial effect: compared with the solutions to sample imbalance in equations (3) and (4), this method tracks in real time the imbalance factor of each class over the $z$ ($z \geq N$) samples of the most recent sliding window. The time sensitivity of the classifier model can be tuned through the window size $z$: the smaller $z$ is, the more sensitive the factor is to time and the better it reflects the real-time imbalance state of each class; the larger $z$ is, the less sensitive it is to time and the more it reflects the average imbalance state of each class over a relatively long period.
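A minimal sketch of such a sliding-window tracker is given below, assuming the imbalance factor is simply the share of each class among the most recent `z` labels. The class name `SlidingWindowImbalance` is hypothetical, and dividing by the number of labels seen so far while the window is still filling is an assumption that the text does not address.

```python
from collections import Counter, deque
from typing import Deque, Dict, Hashable, List


class SlidingWindowImbalance:
    """Tracks the share of each class among the most recent z labels."""

    def __init__(self, classes: List[Hashable], z: int) -> None:
        if len(classes) > z:
            raise ValueError("window size z must be at least the number of classes N")
        self.classes = list(classes)
        self.window: Deque[Hashable] = deque(maxlen=z)  # the oldest label is dropped automatically

    def update(self, label: Hashable) -> Dict[Hashable, float]:
        """Add the newest label and return the imbalance factor of every class."""
        self.window.append(label)
        counts = Counter(self.window)
        # Assumption: until the window has filled, divide by the number of labels seen so far.
        n = len(self.window)
        return {c: counts[c] / n for c in self.classes}


tracker = SlidingWindowImbalance(classes=["Neg", "ZeroDay", "Pos"], z=4)
for label in ["Neg", "Neg", "Neg", "Pos", "Pos", "ZeroDay"]:
    factors = tracker.update(label)
# The window now holds only the last four labels: Neg, Pos, Pos, ZeroDay.
```

Using a bounded `deque` keeps the update cost independent of how many samples have been seen in total, which suits an online setting.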
Further, in step S5, the class weight is calculated as follows:
[Class weight formula: image not reproduced in this text]
where the formula is expressed in terms of the imbalance factor corresponding to category $c^{[k]}$.
Because the current sample $(x^{(t)}, y^{(t)})$ belongs to category $c^{[k]}$, the proportion occupied by $c^{[k]}$ (its imbalance factor) must be greater than 0.
Beneficial effect: for a category with a smaller share of samples, the corresponding imbalance factor is smaller and the class weight calculated by this method is larger; likewise, for a category with a larger share of samples, the imbalance factor is larger and the class weight is smaller. Using these class weights to adjust the retraining of the classifier model strengthens the influence of data from categories with few samples and weakens that of categories with many samples, which balances the sample data and improves the performance of the classifier model.
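The class-weight formula itself is not reproduced in this text; what is stated is only that the weight grows as the imbalance factor shrinks and that the factor of the current sample's class is greater than 0. The sketch below therefore ASSUMES the simplest function with that property, the reciprocal of the imbalance factor. The function name and the window shares are hypothetical.

```python
from typing import Dict, Hashable


def class_weight(factors: Dict[Hashable, float], label: Hashable) -> float:
    """Weight for the current sample's class; ASSUMED to be the reciprocal of its imbalance factor."""
    factor = factors[label]
    if not factor > 0.0:
        # Should not happen for the current sample's class once that sample is in the window.
        raise ValueError("imbalance factor of the current class must be greater than 0")
    return 1.0 / factor


factors = {"Neg": 0.78, "ZeroDay": 0.06, "Pos": 0.16}  # hypothetical window shares
rare = class_weight(factors, "ZeroDay")  # rare class, large weight
common = class_weight(factors, "Neg")    # frequent class, small weight
```

Any other decreasing function of the imbalance factor would show the same qualitative behaviour; the reciprocal is used here only because it is the most direct one.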
Further, the value of $z$ in formula (1) can be optimized by any hyper-parameter tuning method, including random search, grid search and Bayesian optimization.
Beneficial effect: the window size $z$ adjusts the time sensitivity of the classifier model. The smaller $z$ is, the more sensitive the model is to time and the better it reflects the real-time imbalance state of each class; the larger $z$ is, the less sensitive it is to time and the more it reflects the average imbalance state of each class over a relatively long period. Determining the specific value of $z$ by hyper-parameter optimization lets users choose the value best suited to their engineering application according to their own data.
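The trade-off between a small and a large window can be made concrete with a synthetic example. The stream, the window sizes and the function name `window_share` are all hypothetical, and the windowed share is computed as a plain proportion of the last `z` labels.

```python
from collections import deque
from typing import List


def window_share(labels: List[str], target: str, z: int) -> List[float]:
    """Share of `target` among the last z labels after each arrival (fewer while the window fills)."""
    window = deque(maxlen=z)
    shares = []
    for y in labels:
        window.append(y)
        shares.append(sum(1 for v in window if v == target) / len(window))
    return shares


# Hypothetical stream whose class mix flips abruptly: 200 'Neg' labels, then 200 'Pos' labels.
stream = ["Neg"] * 200 + ["Pos"] * 200
small = window_share(stream, "Pos", z=10)
large = window_share(stream, "Pos", z=100)

step = 219  # zero-based index of the 20th 'Pos' label
# At this step the small window contains only 'Pos' labels (share 1.0),
# while the large window still contains 80 'Neg' labels (share 0.2).
```

A hyper-parameter search over `z` would pick the point on this spectrum that gives the best downstream prediction performance on the user's own data.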
Further, the data preprocessing in step S2 includes any one or more general algorithms among data deduplication, outlier detection, regularization, normalization, word segmentation and one-hot encoding.
Beneficial effect: the raw data can be preprocessed in many ways, which makes selection convenient and the method highly adaptable.
Further, the raw data comprises a combination of multi-dimensional raw data on the network security vulnerability, such as the vulnerability number, vulnerability description, vulnerability publication time and security-level score, or any single dimension of such data.
Beneficial effect: the scheme handles both single-dimensional and multi-dimensional raw data and therefore has a wide range of application.
Further, the feature extraction in step S2 includes applying a general manual feature extraction algorithm or an automatic feature extraction algorithm according to the form and content of the raw data.
Beneficial effect: features can be extracted from the raw data in many ways, which makes selection convenient.
Further, the feature selection includes principal component analysis, the correlation coefficient method and recursive feature elimination.
Beneficial effect: features can be selected in many ways, chosen conveniently according to the downstream application.
Further, the classifier models used to predict the exploited time of a network security vulnerability include, but are not limited to, fully connected neural networks, convolutional neural networks and recurrent neural networks; when $t = 1$, the parameters of the model $f^{(1)}(\cdot)$ can be initialized randomly or from a known pre-trained model.
Beneficial effect: many classifier models are available and can be chosen conveniently according to the downstream application. When no historical data has been accumulated, $f^{(1)}(\cdot)$ is initialized randomly, so the technical solution can be cold-started without historical data. When some historical data exists, initialization from a known pre-trained model lets the technical solution make effective use of that data and improves the performance of the algorithm in its initial running stage.
Further, if the exploited-time label $y^{(t)}$ of the vulnerability at time $t$ has not been obtained before the raw data $d^{(t+1)}$ of the vulnerability at time $t+1$ is acquired, steps S5 and S6 are skipped, $f^{(t+1)}(\cdot) = f^{(t)}(\cdot)$ is set, and the exploited time of the vulnerability at time $t+1$ is predicted with it.
Beneficial effect: in practice, the true exploited-time label $y^{(t)}$ can be obtained only after the vulnerability corresponding to $x^{(t)}$ has actually been exploited. When the vulnerability corresponding to $x^{(t)}$ has not yet been exploited and a new vulnerability at time $t+1$ must be predicted, setting $f^{(t+1)}(\cdot) = f^{(t)}(\cdot)$ ensures that the algorithm keeps running effectively.
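The overall control flow of steps S3 to S6, including the rule for a missing label, can be sketched as follows. `WeightedVoteModel` is a toy stand-in for the classifier, not the neural network of the embodiment; the reciprocal class weight is an assumption, since the class-weight formula is not reproduced in this text; and the stream is hypothetical.

```python
from collections import Counter, deque
from typing import Deque, Hashable, Iterable, List, Optional, Sequence, Tuple


class WeightedVoteModel:
    """Toy stand-in for f(.): predicts the class with the largest accumulated weight."""

    def __init__(self, classes: List[Hashable]) -> None:
        self.score = {c: 0.0 for c in classes}

    def predict(self, x: Sequence[float]) -> Hashable:
        return max(self.score, key=self.score.get)

    def retrain(self, x: Sequence[float], y: Hashable, weight: float) -> None:
        self.score[y] += weight


def run_online(stream: Iterable[Tuple[Sequence[float], Optional[Hashable]]],
               model: WeightedVoteModel, z: int) -> Tuple[List[Hashable], int]:
    """Run S3 to S6 over a stream of (features, label or None); return predictions and update count."""
    window: Deque[Hashable] = deque(maxlen=z)
    predictions: List[Hashable] = []
    updates = 0
    for x, y in stream:
        predictions.append(model.predict(x))       # S3: predict with the current model
        if y is None:
            continue                               # no label yet: skip S5 and S6, keep the model
        window.append(y)                           # S4: the true label has arrived
        factor = Counter(window)[y] / len(window)  # S5: share of y in the window, always above 0
        model.retrain(x, y, weight=1.0 / factor)   # S6: ASSUMED reciprocal class weight
        updates += 1
    return predictions, updates


stream = [([0.0], "Neg"), ([0.0], None), ([0.0], "Pos"), ([0.0], "Neg")]
predictions, updates = run_online(stream, WeightedVoteModel(["Neg", "Pos"]), z=3)
```

In the second step no label is available, so the model used for the third prediction is exactly the one left by the first update.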
Drawings
FIG. 1 is a flow diagram of an embodiment of a method for predicting a time at which a network security vulnerability is exploited;
FIG. 2 is a diagram of an experimental result of an embodiment of a method for predicting a time at which a network security vulnerability is exploited;
FIG. 3 is a diagram of experimental results of an embodiment of a method for predicting time at which a network security vulnerability is exploited;
FIG. 4 is a diagram of experimental results of an embodiment of a method for predicting time at which a network security vulnerability is exploited;
FIG. 5 is a graph of experimental results of an embodiment of a method for predicting time at which a network security vulnerability is exploited;
FIG. 6 is a diagram of an experimental result of an embodiment of a method for predicting a time at which a network security vulnerability is exploited.
Detailed Description
The invention is described in further detail below by way of a specific embodiment:
Examples
A method for predicting the time at which a network security vulnerability is exploited, as shown in FIG. 1, includes:
Step S1, acquiring the raw data $d^{(t)}$ related to a network security vulnerability at time $t$, where $t = 1, 2, 3, \ldots$. The raw data $d^{(t)}$ comprises a combination of multi-dimensional raw data on the vulnerability, such as the vulnerability number, vulnerability description, vulnerability publication time and security-level score, or any single dimension of such data. In this embodiment, the raw data $d^{(t)}$ comprises the vulnerability number, vulnerability description, vulnerability publication time, and the metrics and scores defined in the Common Vulnerability Scoring System (CVSS) 2.0.
Step S2, sequentially performing preprocessing, feature extraction and feature selection on the raw data $d^{(t)}$ acquired at time $t$ to obtain the feature data $x^{(t)}$. The data preprocessing includes any one or a combination of general algorithms among data deduplication, outlier detection, regularization, normalization, word segmentation and one-hot encoding. The feature extraction includes applying a general manual feature extraction algorithm or an automatic feature extraction algorithm according to the form and content of the raw data. The feature selection includes general feature selection algorithms such as principal component analysis, the correlation coefficient method and recursive feature elimination.
In this embodiment, step S2 specifically includes the following steps:
Step S201, taking the vulnerability description as the first part of the raw data, extracting semantic features with a BERT deep learning model from natural language processing so that the natural-language description is converted into high-dimensional numerical features, and then reducing the dimensionality with principal component analysis as the feature selection method to select 10-dimensional feature data for the first part;
Step S202, taking the remaining multi-dimensional raw data, other than the vulnerability description, as the second part, converting non-numerical features into numerical features by one-hot encoding, normalizing the numerical features, and then reducing the dimensionality with principal component analysis as the feature selection method to select 10-dimensional feature data for the second part;
Step S203, splicing the 10-dimensional feature data of the first part and the 10-dimensional feature data of the second part into the 20-dimensional feature data $x^{(t)}$.
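The encoding, normalization and splicing operations in these steps can be sketched on a single record as follows. This is a simplified illustration: the BERT embedding and the principal component analysis are omitted, the field names `access_vector` and `base_score` and the two-element text vector are hypothetical, and min-max scaling is only one possible normalization.

```python
from typing import Any, Dict, List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode a categorical value as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def min_max(value: float, low: float, high: float) -> float:
    """Scale a numeric value into [0, 1] given its known range."""
    return (value - low) / (high - low)


def second_part(record: Dict[str, Any]) -> List[float]:
    """Numeric features from the non-text fields of one record (field names are hypothetical)."""
    access = one_hot(str(record["access_vector"]), ["LOCAL", "ADJACENT_NETWORK", "NETWORK"])
    score = min_max(float(record["base_score"]), 0.0, 10.0)  # CVSS base scores range from 0 to 10
    return access + [score]


def splice(first: List[float], second: List[float]) -> List[float]:
    """Concatenate the two parts into one feature vector, as in step S203."""
    return first + second


record = {"access_vector": "NETWORK", "base_score": 7.5}
text_features = [0.12, -0.40]  # placeholder for the reduced text embedding of step S201
x_t = splice(text_features, second_part(record))
```

In the embodiment each part is reduced to 10 dimensions before splicing, so the real feature vector has 20 entries rather than the 6 shown here.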
Step S3, predicting from the feature data $x^{(t)}$ with the classifier model $f^{(t)}(\cdot)$ at time $t$ to obtain the predicted exploited time of the vulnerability acquired at time $t$. The prediction can support network security experts, system developers and maintainers in decisions on the allocation of resources such as time, funds and personnel. The classifier model $f^{(t)}(\cdot)$ used to predict the exploited time can be any machine learning algorithm able to handle classification problems, including but not limited to fully connected neural networks, convolutional neural networks and their variants, and recurrent neural networks and their variants; when $t = 1$, the parameters of $f^{(1)}(\cdot)$ can be initialized randomly or from a known pre-trained model. In this embodiment, the classifier model is a three-layer fully connected neural network in which the number of input-layer neurons equals the dimension of $x^{(t)}$, namely 20, and the number of output-layer neurons equals the number of categories $N$, which is 3 in this embodiment; $f^{(1)}(\cdot)$ is initialized randomly.
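A forward pass through a network of this shape can be sketched in plain Python. Only the input size (20) and output size (3) come from the embodiment; the hidden size of 16, the ReLU activation, the softmax output and the uniform initialization range are assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """Affine layer: one weighted sum plus bias per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
HIDDEN = 16  # assumption: the hidden-layer size is not given in the text
w1, b1 = init_layer(20, HIDDEN, rng)  # input layer (20 features) to hidden layer
w2, b2 = init_layer(HIDDEN, 3, rng)   # hidden layer to output layer (N = 3 classes)

x = [rng.random() for _ in range(20)]             # stand-in for one feature vector
hidden = [max(0.0, v) for v in dense(x, w1, b1)]  # ReLU activation (assumed)
probs = softmax(dense(hidden, w2, b2))
predicted_class = probs.index(max(probs))         # index 0, 1 or 2
```

The random initialization corresponds to the cold-start case in which no pre-trained model is available.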
Step S4, acquiring the true exploited-time label $y^{(t)}$ of the vulnerability acquired at time $t$; the label $y^{(t)}$ is the true category of the exploited time of the vulnerability corresponding to the feature data $x^{(t)}$. Here $y^{(t)} \in \{c^{[1]}, c^{[2]}, \ldots, c^{[k]}, \ldots, c^{[N]}\}$, and the categories $c^{[1]}, c^{[2]}, \ldots, c^{[k]}, \ldots, c^{[N]}$ can be user-defined. In this embodiment, the exploited time of a network security vulnerability is divided into 3 categories, i.e. $N = 3$: $c^{[1]}$ 'before vulnerability publication', $c^{[2]}$ 'on the day of vulnerability publication', and $c^{[3]}$ 'after vulnerability publication'.
Step S5, calculating through a sliding window non-equilibrium factor algorithm to obtain non-equilibrium factors corresponding to each category to which the utilized time of the network security vulnerability in the current sliding window belongs
Figure BDA0002656505200000072
And class weight
Figure BDA0002656505200000073
Where k is 1,2, …, N.
The imbalance factor [symbol image: Figure BDA0002656505200000074] is calculated by formula (1):
[Formula (1), image: Figure BDA0002656505200000081]
where z denotes the total number of samples contained in the current sliding window, with z ≥ N. In this embodiment, z is tuned by a hyper-parameter determination method such as grid search and is set to 50. c[k] (k = 1, 2, …, N) denotes the k-th category to which the exploited time of the network security vulnerability belongs. The imbalance factor [symbol image: Figure BDA0002656505200000082] is the ratio of the number of samples in the current sliding window that belong to category c[k] to the total number of samples z in the window. In formula (1), [(x(t), c[k])] = 1 when the exploited time of the network security vulnerability corresponding to the feature data x(t) belongs to category c[k], and [(x(t), c[k])] = 0 otherwise. Because the current sample (x(t), y(t)) itself falls in the window, the imbalance factor [symbol image: Figure BDA0002656505200000083] of the category c[k] to which it belongs must be greater than 0.
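The textual definition of the imbalance factor (the share of window samples that belong to category c[k]) can be sketched as follows. Since the formula image itself is not reproduced here, this is one reading of the description rather than the exact formula; the function name `imbalance_factors`, the use of integer labels 1..N, and dividing by the current window length while the window is still filling are assumptions.

```python
from collections import Counter, deque


def imbalance_factors(window_labels, num_classes):
    """Return {k: share of samples in the window whose label is c[k]}."""
    z = len(window_labels)
    counts = Counter(window_labels)
    return {k: counts.get(k, 0) / z for k in range(1, num_classes + 1)}


# Window length z = 5 for illustration only; the embodiment uses z = 50.
window = deque(maxlen=5)
for label in [1, 1, 3, 1, 2, 1, 1]:
    window.append(label)  # the oldest label drops out once the window is full

factors = imbalance_factors(window, num_classes=3)
```

In this toy stream the window ends up holding the labels 3, 1, 2, 1, 1, so category 1 takes three fifths of the window and categories 2 and 3 one fifth each.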
The class weight [symbol image: Figure BDA0002656505200000084] is calculated by formula (2):
[Formula (2), image: Figure BDA0002656505200000085]
where [symbol image: Figure BDA0002656505200000086] is the imbalance factor corresponding to category c[k].
Step S6: according to the feature data x(t) of the network security vulnerability at time t, the exploited-time label y(t) and the class weight [symbol image: Figure BDA0002656505200000087], retrain the current classifier model f(t)(·) and update its parameters to obtain the classifier model f(t+1)(·) for time t+1.
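One way to picture the class-weighted retraining of step S6 is a single gradient step on a weighted cross-entropy loss. The sketch below uses a linear softmax model as a stand-in for the neural network, takes the class weight as a plain input value (its formula is an image not reproduced in this text), and uses a hypothetical learning rate; none of these choices come from the source.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_update(weights, x, label_index, class_weight, lr=0.1):
    """One SGD step on class-weighted cross-entropy for a linear softmax model.

    weights: list of N rows, one per class, each of length len(x).
    label_index: 0-based index of the true class of the sample.
    Returns the updated weights as a new list (the input is not modified).
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]
    probs = softmax(logits)
    new_weights = []
    for k, row in enumerate(weights):
        target = 1.0 if k == label_index else 0.0
        grad_scale = class_weight * (probs[k] - target)  # d(loss)/d(logit k)
        new_weights.append([w_i - lr * grad_scale * x_i for w_i, x_i in zip(row, x)])
    return new_weights


f_t = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # f(t): 3 classes, 2 features
f_next = weighted_update(f_t, x=[1.0, 2.0], label_index=1, class_weight=2.0)
```

A larger class weight scales the step taken for that sample, which is how samples of a rare category can influence the model more strongly than samples of a common one.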
In the above steps, if the exploited-time label y(t) of the network security vulnerability at time t has not been acquired before the raw data d(t+1) of the network security vulnerability at time t+1 is acquired, steps S5 and S6 are skipped and f(t+1)(·) = f(t)(·) is used to predict the exploited time of the network security vulnerability at time t+1.
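The predict-then-update loop, including the rule of skipping steps S5 and S6 when a label has not arrived, can be sketched as below. The names `run_stream`, `predict` and `retrain` are hypothetical, a missing label is represented by `None`, and the toy majority-class model only serves to make the loop runnable; it is not the classifier of the embodiment.

```python
from collections import Counter


def run_stream(stream, model, predict, retrain):
    """Process (x_t, y_t) pairs in order; y_t is None when no label arrived in time."""
    predictions = []
    for x_t, y_t in stream:
        predictions.append(predict(model, x_t))  # step S3: predict with f(t)
        if y_t is None:
            continue                             # skip S5 and S6: f(t+1) = f(t)
        model = retrain(model, x_t, y_t)         # steps S5 and S6: obtain f(t+1)
    return predictions, model


def predict(model, x_t):
    """Toy model: predict the most frequent label seen so far (default 1)."""
    return model.most_common(1)[0][0] if model else 1


def retrain(model, x_t, y_t):
    """Toy update: count the new label in a copy of the model."""
    updated = model.copy()
    updated[y_t] += 1
    return updated


stream = [([0.1], 3), ([0.2], None), ([0.3], 3), ([0.4], 1)]
predictions, final_model = run_stream(stream, Counter(), predict, retrain)
```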
The specific implementation process is as follows. In this embodiment, 23302 vulnerability records from the open database NVD, with exploited times between 1988 and 2020, are used for simulation verification. The exploited time of each network security vulnerability is shown in Fig. 2, where each dot is a single vulnerability and a larger dot indicates a longer interval, in days, between the exploited time and the vulnerability publication. The dynamic class imbalance of the 3 categories of network security vulnerabilities in this embodiment is shown in Fig. 3. In the legend of Fig. 3, 'Neg' represents category c[1] = 'before vulnerability publication' (18113 samples in total, static percentage 77.73%); 'ZeroDay' represents category c[2] = 'on the day of vulnerability publication' (1312 in total, static percentage 5.63%); 'Pos' represents category c[3] = 'after vulnerability publication' (3877 in total, static percentage 16.64%).
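The static percentages quoted above follow directly from the per-category counts, as this short check shows (the variable names are illustrative):

```python
counts = {"Neg": 18113, "ZeroDay": 1312, "Pos": 3877}

total = sum(counts.values())
static_percent = {name: round(100 * n / total, 2) for name, n in counts.items()}
```

The three counts sum to the 23302 records used in the embodiment, and the rounded shares match the percentages given in the legend of Fig. 3.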
Using this technical scheme, the exploited-time prediction performance is evaluated separately for the categories 'Neg', 'ZeroDay' and 'Pos'. Figs. 4, 5 and 6 show, for the three categories, how the 4 most widely used performance indicators for classification problems (accuracy, precision, recall and F1 score) change over time. Comparison with Fig. 3 shows that the trend of prediction performance for each category is consistent with the trend of that category's dynamic imbalance, and that the classification performance of the exploited-time prediction using this technical scheme is clearly better than random guessing.
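The four indicators can be computed per category by treating that category as the positive class and the other two as negative. The source does not state how the per-category values were obtained, so this one-vs-rest treatment, the function name `one_vs_rest_metrics` and the toy labels are assumptions.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 for one category against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 3, 2, 1, 3]  # toy true labels (k of category c[k])
y_pred = [1, 3, 3, 1, 1, 1]  # toy predictions
metrics_c1 = one_vs_rest_metrics(y_true, y_pred, positive=1)
```

Evaluating these values over successive portions of the stream would give curves of the kind described for Figs. 4 to 6.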
The foregoing is merely an embodiment of the present invention; common general knowledge, such as well-known specific structures and features, is not described here in detail. It should be noted that those skilled in the art can make several changes and modifications without departing from the structure of the present invention; these also fall within the protection scope of the present invention and do not affect the effect of implementing the invention or the practicability of the patent. The scope of protection claimed by this application shall be determined by the content of the claims, and the description of the embodiments in the specification may be used to interpret the content of the claims.

Claims (10)

1. A method for predicting the time at which a network security vulnerability is exploited, comprising: step S1, acquiring raw data d(t) related to the network security vulnerability at time t, where t = 1, 2, 3, …; step S2, sequentially performing data preprocessing, feature extraction and feature selection on the raw data d(t) acquired at time t to obtain feature data x(t); characterized in that the method further comprises the following steps:
step S3, predicting on the feature data x(t) with the classifier model f(t)(·) at time t to obtain the predicted value [symbol image: Figure FDA0002656505190000011] of the exploited time of the network security vulnerability acquired at time t, for use by downstream applications, wherein [formula image: Figure FDA0002656505190000012];
step S4, acquiring the true exploited-time label y(t) corresponding to the network security vulnerability at time t, wherein y(t) ∈ {c[1], c[2], …, c[k], …, c[N]}, c[1], c[2], …, c[k], …, c[N] represent the N categories to which the exploited time of the network security vulnerability belongs, N represents the total number of categories of the exploited time of the network security vulnerability, and N ≥ 2;
step S5, calculating, through the sliding-window imbalance factor algorithm, the imbalance factor [symbol image: Figure FDA0002656505190000013] and the class weight [symbol image: Figure FDA0002656505190000014] corresponding to each category to which the exploited time of the network security vulnerabilities in the current sliding window belongs, wherein k = 1, 2, …, N;
step S6, according to the feature data x(t) of the network security vulnerability at time t, the exploited-time label y(t) and the class weight [symbol image: Figure FDA0002656505190000015], retraining the current classifier model f(t)(·) and updating its parameters to obtain the classifier model f(t+1)(·) for time t+1.
2. The method for predicting the time at which the network security vulnerability is exploited according to claim 1, wherein: in step S5, the imbalance factor [symbol image: Figure FDA0002656505190000016] is calculated by the following formula:
[Formula (1), image: Figure FDA0002656505190000017]
wherein z represents the total number of samples contained in the current sliding window, with z ≥ N; c[k] (k = 1, 2, …, N) represents the k-th category to which the exploited time of the network security vulnerability belongs; the imbalance factor [symbol image: Figure FDA0002656505190000018] represents the ratio of the number of samples in the current sliding window that belong to category c[k] to the total number of samples z in the window; and [(x(t), c[k])] = 1 when the exploited time of the network security vulnerability corresponding to the feature data x(t) belongs to category c[k], and [(x(t), c[k])] = 0 otherwise.
3. The method for predicting the time at which the network security vulnerability is exploited according to claim 2, wherein: in step S5, the class weight [symbol image: Figure FDA0002656505190000019] is calculated by the following formula:
[Formula (2), image: Figure FDA00026565051900000110]
wherein [symbol image: Figure FDA00026565051900000111] is the imbalance factor corresponding to category c[k].
4. The method for predicting the time at which the network security vulnerability is exploited according to claim 2, wherein: the value of z in formula (1) is tuned by any hyper-parameter determination method, including random search, grid search and Bayesian optimization.
5. The method for predicting the time at which the network security vulnerability is exploited according to claim 1, wherein: the data preprocessing in step S2 includes any one or a combination of general algorithms including data deduplication, outlier detection, regularization, normalization, word segmentation and one-hot encoding.
6. The method for predicting the time at which the network security vulnerability is exploited according to claim 1, wherein: the raw data d(t) comprises any one or a combination of multi-dimensional raw data of the network security vulnerability, including vulnerability number, vulnerability description information, vulnerability release time and security level score.
7. The method for predicting the time at which the network security vulnerability is exploited according to claim 1, wherein: the feature extraction in step S2 includes applying a general manual feature extraction algorithm or an automatic feature extraction algorithm according to the form and content of the raw data.
8. The method for predicting the time at which the network security vulnerability is exploited according to claim 1, wherein: the feature selection includes a principal component analysis method, a correlation coefficient method and a recursive feature elimination method.
9. The method for predicting the time at which the network security vulnerability is exploited according to claim 1, wherein: the classifier model used in predicting the exploited time of the network security vulnerability includes, but is not limited to, a fully-connected neural network, a convolutional neural network and a recurrent neural network, and when t = 1, the parameters of the model f(1)(·) can be initialized randomly or from a known pre-trained model.
10. The method for predicting the time at which the network security vulnerability is exploited according to claim 1, wherein: if the exploited-time label y(t) of the network security vulnerability at time t has not been acquired before the raw data d(t+1) of the network security vulnerability at time t+1 is acquired, steps S5 and S6 are skipped and f(t+1)(·) = f(t)(·) is used to predict the exploited time of the network security vulnerability at time t+1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010889524.6A CN112016097B (en) 2020-08-28 2020-08-28 Method for predicting network security vulnerability time to be utilized

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010889524.6A CN112016097B (en) 2020-08-28 2020-08-28 Method for predicting network security vulnerability time to be utilized

Publications (2)

Publication Number Publication Date
CN112016097A (en) 2020-12-01
CN112016097B (en) 2024-02-27

Family

ID=73503285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010889524.6A Active CN112016097B (en) 2020-08-28 2020-08-28 Method for predicting network security vulnerability time to be utilized

Country Status (1)

Country Link
CN (1) CN112016097B (en)

Also Published As

Publication number Publication date
CN112016097B (en) 2024-02-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240126

Address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Hongyue Information Technology Co.,Ltd.

Country or region after: China

Address before: 402160, Honghe Avenue, Yongchuan District, Chongqing, 319

Applicant before: CHONGQING University OF ARTS AND SCIENCES

Country or region before: China

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240711

Address after: Room 104, 1st to 5th Floor, Building 17, No. 26 Outer Ring West Road, Fengtai District, Beijing 100070

Patentee after: Beijing Lingcheng Technology Co.,Ltd.

Country or region after: China

Address before: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Patentee before: Shenzhen Hongyue Information Technology Co.,Ltd.

Country or region before: China