CN108959084B

CN108959084B - Markov method for predicting vulnerability counts based on a smoothing method and similarity

Info

Publication number: CN108959084B
Application number: CN201810701155.6A
Authority: CN
Inventors: 高岭; 张晓�; 冯通; 杨旭东; 孙骞; 王海; 郑杰; 赵子鑫
Original assignee: Northwest University
Current assignee: Northwest University
Priority date: 2018-06-29
Filing date: 2018-06-29
Publication date: 2022-03-25
Anticipated expiration: 2038-06-29
Also published as: CN108959084A

Abstract

A Markov method for predicting vulnerability counts based on a smoothing method and similarity. Taking security vulnerabilities as the research object, the method examines their historical data to form a vulnerability complete set and divides that set into a direct prediction set and an indirect prediction set. The direct prediction set is predicted with a Markov method improved by exponential smoothing; cosine similarity then links the indirect prediction set to the direct prediction set, so that the indirect prediction set can be predicted in turn. Finally, the prediction results of the two sets are combined to provide practitioners in related fields with a high-accuracy predicted value.

Description

Markov method for predicting vulnerability counts based on a smoothing method and similarity

Technical Field

The invention belongs to the technical field of computer information security, involves the exponential smoothing method, cosine similarity and the Markov algorithm, and in particular relates to a Markov method for predicting vulnerability counts based on a smoothing method and similarity.

Background

Traditional software engineering holds that, because of programmers' limited ability or insufficient experience, unreasonable software development processes and similar causes, software inevitably contains defects and hidden dangers. Those that affect the security of a computer system are called security vulnerabilities. With the rapid development of computer science, all industries are paying increasing attention to security vulnerabilities. Providing practitioners in related fields with a highly accurate method for predicting the number of security vulnerabilities is therefore significant work.

For quantity prediction, the conventional approach is based on statistical principles and takes a statistical index (such as the arithmetic mean) as the predicted value. Although simple, this approach ignores the correlation between different types of data, so it is difficult to obtain an accurate predicted value. The Markov algorithm is a modern prediction method that takes full account of the transitions between different types of data and, compared with the conventional approach, greatly improves prediction accuracy. However, how to determine the state transition matrix more scientifically remains a major problem for researchers.

Disclosure of Invention

To overcome the above deficiencies of the prior art, the invention provides a Markov method for predicting vulnerability counts based on a smoothing method and similarity, which improves the Markov algorithm by means of exponential smoothing and cosine similarity. First, a divide-and-conquer strategy is adopted: taking the average level of the vulnerability complete set as the reference, vulnerability types close to that level are placed in the direct prediction set and those far from it in the indirect prediction set. Second, the data distribution of each vulnerability type in the direct prediction set is examined, and the state transition matrix and the probability matrix of the Markov method are iteratively improved by exponential smoothing to obtain the predicted values for the direct prediction set. Finally, for each vulnerability type in the indirect prediction set, cosine similarity is used to find the most similar vulnerability type in the direct prediction set, and that type's predicted value is used as the reference for an appropriate scaling transformation, which yields the predicted vulnerability counts for the indirect prediction set.

To achieve this purpose, the invention adopts the following technical solution:

A Markov method for predicting vulnerability counts based on a smoothing method and similarity, characterized by comprising the following steps:

(1) First, examine the historical count information of the security vulnerabilities to form a vulnerability complete set, and divide the complete set into a direct prediction set and an indirect prediction set.

Examine the historical count information of the security vulnerabilities to form the vulnerability complete set, denoted U. U contains all count information of n types of vulnerabilities over m time nodes; write

$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1m} \\ u_{21} & u_{22} & \cdots & u_{2m} \\ \vdots & \vdots & & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nm} \end{bmatrix}$$

where u_ij is the number of vulnerabilities of the i-th type at the j-th time node, 1 ≤ i ≤ n, 1 ≤ j ≤ m. Examine the average count level of the vulnerability complete set, that is, compute the arithmetic mean of all data in U and round it down:

$$\bar{u} = \left\lfloor \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ij} \right\rfloor$$

Denote the i-th type of vulnerability as U_i. Examine the average count level of each type of vulnerability, that is, compute the arithmetic mean of each row of U and round it down:

$$\bar{u}_i = \left\lfloor \frac{1}{m} \sum_{j=1}^{m} u_{ij} \right\rfloor$$

where 1 ≤ i ≤ n.

Take $\bar{u}$ as the reference and 1 as the initial step length. p is the parameter that determines the range of the direct prediction set; in general 0 < p < 1, and to ensure the accuracy of the prediction result, 0.5 ≤ p < 1 is recommended.

The algorithm for dividing the vulnerability complete set into a direct prediction set and an indirect prediction set is as follows:

Input: vulnerability complete set U and parameter p

Output: direct prediction set S and indirect prediction set $\bar{S}$

Let sum = 0,
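The body of the partition algorithm is not reproduced in the text, so the following Python sketch is only one plausible reading of it. It assumes that the neighborhood around the floored overall mean is widened by one unit per iteration until at least a fraction p of the n vulnerability types have their floored row mean inside it. The function name `partition_vulnerabilities` and this stopping rule are assumptions, not the patented procedure.

```python
from math import floor


def partition_vulnerabilities(U, p):
    """Split the rows of U into direct and indirect prediction sets.

    U is a list of n rows (one per vulnerability type) with m counts each.
    ASSUMPTION: the step grows from 1 until the interval
    [mean - step, mean + step] holds at least a fraction p of the n types.
    Returns (direct row indices, indirect row indices, final step).
    """
    n, m = len(U), len(U[0])
    # Floored arithmetic mean of all data in U (the reference value).
    overall = floor(sum(sum(row) for row in U) / (n * m))
    # Floored arithmetic mean of each row of U.
    row_means = [floor(sum(row) / m) for row in U]

    step = 1  # initial step length
    while True:
        direct = [i for i, mu in enumerate(row_means)
                  if abs(mu - overall) <= step]
        if len(direct) >= p * n:
            break
        step += 1

    # The indirect set is the complement of the direct set in U.
    indirect = [i for i in range(n) if i not in direct]
    return direct, indirect, step


if __name__ == "__main__":
    U = [[10, 12, 11], [9, 10, 11], [50, 55, 60], [1, 2, 3]]
    print(partition_vulnerabilities(U, 0.5))
```

With p closer to 1 the interval grows wider and more types are predicted directly; the indirect set then shrinks accordingly.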

(2) Predict the number of each type of vulnerability in the direct prediction set.

Predicting the number of each type of vulnerability in the direct prediction set mainly comprises the following steps:

1) Obtain the actual state transition matrices

Let the direct prediction set S contain w elements, that is, S contains all count information of w types of vulnerabilities over m time nodes; write

$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1m} \\ s_{21} & s_{22} & \cdots & s_{2m} \\ \vdots & \vdots & & \vdots \\ s_{w1} & s_{w2} & \cdots & s_{wm} \end{bmatrix}$$

where s_ij is the number of vulnerabilities of the i-th type at the j-th time node, 1 ≤ i ≤ w, 1 ≤ j ≤ m.

Let Q_t denote the actual state transition matrix from the (t-1)-th time node to the t-th time node; write

$$Q_t = \begin{bmatrix} q_{11t} & q_{12t} & \cdots & q_{1wt} \\ q_{21t} & q_{22t} & \cdots & q_{2wt} \\ \vdots & \vdots & & \vdots \\ q_{w1t} & q_{w2t} & \cdots & q_{wwt} \end{bmatrix}$$

where q_ijt is the actual probability that the i-th type of vulnerability transfers to the j-th type from the (t-1)-th time node to the t-th time node, 1 < t ≤ m, 1 ≤ i ≤ w, 1 ≤ j ≤ w.

Here d_ij is the number of times that the count of the i-th type decreased while the count of the j-th type increased, 1 ≤ i ≤ w, 1 ≤ j ≤ w, j ≠ i; f_ij is the ratio of the number of times that the i-th type decreased while the j-th type increased to the number of times that the i-th type decreased, that is,

$$f_{ij} = \frac{d_{ij}}{\sum_{k=1,\,k \neq i}^{w} d_{ik}}$$

where 1 ≤ i ≤ w, 1 ≤ j ≤ w, j ≠ i; q_it denotes the i-th row of Q_t, 1 ≤ i ≤ w.

The algorithm for computing q_it, the i-th row of the actual state transition matrix Q_t from the (t-1)-th time node to the t-th time node, is as follows:

Input: direct prediction set S, parameter f_ij

Output: q_it

Taking i = 1, 2, …, w in turn yields the complete actual state transition matrix Q_t; taking t = 2, 3, …, m in turn yields all actual state transition matrices.
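A minimal Python sketch of this step follows. The two cases it implements (q_iit = 1 when the count does not fall, and q_ijt = f_ij·(1 - q_iit) when it falls) are the ones used in the row-sum proof of this document. The value of q_iit in the falling case is not reproduced in the text, so the sketch assumes it is the retained share s_it / s_i(t-1). The function names and the fallback for a row with no recorded transfers are likewise assumptions.

```python
def transfer_ratios(S):
    """f[i][j] = d_ij / sum_k d_ik, where d_ij counts the time steps in
    which type i fell while type j rose (j != i)."""
    w, m = len(S), len(S[0])
    d = [[0] * w for _ in range(w)]
    for t in range(1, m):
        for i in range(w):
            if S[i][t] < S[i][t - 1]:
                for j in range(w):
                    if j != i and S[j][t] > S[j][t - 1]:
                        d[i][j] += 1
    f = []
    for i in range(w):
        total = sum(d[i])
        f.append([d[i][j] / total if total else 0.0 for j in range(w)])
    return f


def actual_transition(S, f, t):
    """Actual state transition matrix for the step from column t-1 to
    column t of S (0-based column indices)."""
    w = len(S)
    Q = [[0.0] * w for _ in range(w)]
    for i in range(w):
        prev, cur = S[i][t - 1], S[i][t]
        # Hypothetical fallback: with no recorded transfers out of type i,
        # keep all probability mass on the diagonal so the row sums to 1.
        if cur >= prev or sum(f[i]) == 0:
            Q[i][i] = 1.0
        else:
            # ASSUMPTION: the diagonal entry is the retained share.
            Q[i][i] = cur / prev
            for j in range(w):
                if j != i:
                    Q[i][j] = f[i][j] * (1.0 - Q[i][i])
    return Q


if __name__ == "__main__":
    S = [[10, 8, 9], [5, 7, 6], [4, 4, 6]]
    f = transfer_ratios(S)
    for row in actual_transition(S, f, 1):
        print(row)
```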

2) Obtain the predicted state transition matrices

Let Q'_t denote the predicted state transition matrix from the (t-1)-th time node to the t-th time node; write

$$Q'_t = \begin{bmatrix} q'_{11t} & q'_{12t} & \cdots & q'_{1wt} \\ q'_{21t} & q'_{22t} & \cdots & q'_{2wt} \\ \vdots & \vdots & & \vdots \\ q'_{w1t} & q'_{w2t} & \cdots & q'_{wwt} \end{bmatrix}$$

where q'_ijt is the predicted probability that the i-th type of vulnerability transfers to the j-th type from the (t-1)-th time node to the t-th time node, 1 < t ≤ m, 1 ≤ i ≤ w, 1 ≤ j ≤ w.

The predicted state transition matrix Q'_t is determined as follows:

A. When t = 2, Q'_t = Q_t;

B. When 2 < t ≤ m, each element q'_ijt of Q'_t (1 ≤ i ≤ w, 1 ≤ j ≤ w) is obtained by exponential smoothing, that is:

$$q'_{ijt} = \alpha q_{ijt} + (1-\alpha)\, q'_{ij(t-1)}$$

where 0 < α < 1.

The algorithm for determining the predicted state transition matrix Q'_t from the (t-1)-th time node to the t-th time node takes as input the actual state transition matrix Q_t and the parameter α, and outputs the predicted state transition matrix Q'_t.

Taking t = 2, 3, …, m in turn yields all predicted state transition matrices.
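The recursion in cases A and B can be sketched in a few lines of Python. The function name `smooth_matrices` and the list-of-lists layout are illustrative choices, not part of the patent; the update rule itself is the one stated above.

```python
def smooth_matrices(actual, alpha):
    """Exponentially smooth a sequence of transition matrices.

    actual holds Q_2, Q_3, ..., Q_m in order; the result holds
    Q'_2, Q'_3, ..., Q'_m with Q'_2 = Q_2 and, for later t,
    q'_ijt = alpha * q_ijt + (1 - alpha) * q'_ij(t-1).
    """
    predicted = [[row[:] for row in actual[0]]]  # case A: Q'_2 = Q_2
    for Q in actual[1:]:                         # case B: 2 < t <= m
        prev = predicted[-1]
        predicted.append([
            [alpha * q + (1 - alpha) * qp for q, qp in zip(row, prev_row)]
            for row, prev_row in zip(Q, prev)
        ])
    return predicted


if __name__ == "__main__":
    Q2 = [[1.0, 0.0], [0.2, 0.8]]
    Q3 = [[0.5, 0.5], [0.0, 1.0]]
    for matrix in smooth_matrices([Q2, Q3], alpha=0.4):
        print(matrix)
```

Because each smoothed row is a convex combination of two rows that sum to 1, every row of every Q'_t also sums to 1.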

3) Obtain the actual probability matrices

Let P_t denote the actual probability matrix at the t-th time node; write P_t = [p_1t p_2t … p_wt], where p_it is the ratio of the number of vulnerabilities of the i-th type in the direct prediction set to the number of all vulnerabilities in the direct prediction set at the t-th time node, that is,

$$p_{it} = \frac{s_{it}}{\sum_{k=1}^{w} s_{kt}}$$

where 1 ≤ i ≤ w, 1 ≤ t ≤ m. Taking t = 1, 2, …, m in turn yields all actual probability matrices.

4) Obtain the predicted probability matrices

Let P'_t denote the predicted probability matrix at the t-th time node; write P'_t = [p'_1t p'_2t … p'_wt], where p'_it is the predicted value of the ratio of the number of vulnerabilities of the i-th type in the direct prediction set to the number of all vulnerabilities in the direct prediction set at the t-th time node, 1 ≤ i ≤ w, 1 ≤ t ≤ m.

The predicted probability matrix is determined as follows:

A. When t = 1, P'_t = P_t;

B. When 1 < t ≤ m, each element p'_it of P'_t (1 ≤ i ≤ w) is obtained by exponential smoothing, that is:

$$p'_{it} = \alpha p_{it} + (1-\alpha)\, p'_{i(t-1)}$$

where 0 < α < 1.

The algorithm for determining the predicted probability matrix P'_t of the t-th time node takes as input the actual probability matrix P_t and the parameter α, and outputs the predicted probability matrix P'_t.

Taking t = 1, 2, …, m in turn yields all predicted probability matrices.
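Steps 3) and 4) can be sketched together in Python: the first function computes the proportion vectors P_t from the columns of S, and the second smooths them. The function names are illustrative, and the sketch assumes every time node has a non-zero total count.

```python
def actual_probabilities(S):
    """Return [P_1, ..., P_m], where P_t[i] is the share of type i among
    all vulnerabilities of the direct prediction set at time node t."""
    w, m = len(S), len(S[0])
    P = []
    for t in range(m):
        total = sum(S[i][t] for i in range(w))  # assumed non-zero
        P.append([S[i][t] / total for i in range(w)])
    return P


def smooth_vectors(P, alpha):
    """Return [P'_1, ..., P'_m] with P'_1 = P_1 and
    p'_it = alpha * p_it + (1 - alpha) * p'_i(t-1) for t > 1."""
    predicted = [P[0][:]]
    for vec in P[1:]:
        prev = predicted[-1]
        predicted.append([alpha * p + (1 - alpha) * pp
                          for p, pp in zip(vec, prev)])
    return predicted


if __name__ == "__main__":
    S = [[1, 3], [3, 1]]
    P = actual_probabilities(S)
    print(P)
    print(smooth_vectors(P, alpha=0.5))
```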

5) Obtain the predicted number of each type of vulnerability in the direct prediction set

Let C denote the actual total number of vulnerabilities of the direct prediction set at each time node, C = [c_1 c_2 … c_m c_m+1], where c_i is the total number of vulnerabilities at the i-th time node, that is,

$$c_i = \sum_{k=1}^{w} s_{ki}$$

where 1 ≤ i ≤ m.

Let C' denote the predicted total number of vulnerabilities of the direct prediction set at each time node, C' = [c'_1 c'_2 … c'_m c'_m+1]. Each c'_i, 1 ≤ i ≤ m + 1, is determined as follows:

A. When i = 1, c'_i = c_i;

B. When 1 < i ≤ m + 1, c'_i is obtained by exponential smoothing, where 0 < α < 1.

Then c'_m+1 is the predicted total number of vulnerabilities of the direct prediction set at the (m+1)-th time node.

The predicted state transition matrix Q'_m is obtained from 2), and the predicted probability matrix P'_m is obtained from 4).

According to the Markov algorithm, the matrix of the proportions that each type of vulnerability in the direct prediction set takes of all vulnerabilities in the direct prediction set at the (m+1)-th time node is:

$$P_{m+1} = P'_m \cdot Q'_m$$

By the nature of matrix multiplication, P_m+1 is a row vector of w elements; denote its i-th element by p_i(m+1), where 1 ≤ i ≤ w.

Let R denote the count prediction matrix of the vulnerability types in the direct prediction set at the (m+1)-th time node, R = [r_1 r_2 … r_w], where r_i, obtained from the predicted total c'_m+1 and the predicted proportion p_i(m+1), is the predicted number of vulnerabilities of the i-th type at the (m+1)-th time node, 1 ≤ i ≤ w.

The matrix R is the count prediction result for the vulnerability types of the direct prediction set S at the (m+1)-th time node.
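The following Python sketch puts step 5) together. The matrix product P_m+1 = P'_m·Q'_m is taken from the text. Two formulas are not reproduced in the text and are filled in here as labeled assumptions: the smoothing recursion for the totals is taken to be c'_i = α·c_(i-1) + (1-α)·c'_(i-1), and r_i is taken to be the unrounded product c'_m+1 · p_i(m+1). The function name `predict_direct` is illustrative.

```python
def predict_direct(P_pred_m, Q_pred_m, totals, alpha):
    """Predict the counts of the direct prediction set at node m+1.

    P_pred_m: predicted probability row vector P'_m (length w)
    Q_pred_m: predicted state transition matrix Q'_m (w x w)
    totals:   actual totals c_1, ..., c_m
    Returns (P_{m+1}, c'_{m+1}, R).
    """
    w = len(P_pred_m)
    # Row vector times matrix: P_{m+1} = P'_m . Q'_m
    p_next = [sum(P_pred_m[i] * Q_pred_m[i][j] for i in range(w))
              for j in range(w)]

    # ASSUMPTION: c'_1 = c_1 and c'_i = alpha*c_(i-1) + (1-alpha)*c'_(i-1),
    # so the loop ends with c'_(m+1).
    c_pred = totals[0]
    for c in totals:
        c_pred = alpha * c + (1 - alpha) * c_pred

    # ASSUMPTION: r_i = c'_(m+1) * p_i(m+1), without rounding.
    R = [c_pred * p for p in p_next]
    return p_next, c_pred, R


if __name__ == "__main__":
    P_m = [0.5, 0.5]
    Q_m = [[1.0, 0.0], [0.2, 0.8]]
    print(predict_direct(P_m, Q_m, totals=[10, 20], alpha=0.5))
```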

(3) Predict the number of each type of vulnerability in the indirect prediction set.

1) Obtain the cosine similarity matrix

Let the indirect prediction set $\bar{S}$ contain v elements, that is, $\bar{S}$ contains all count information of v types of vulnerabilities over m time nodes; its entry $\bar{s}_{ij}$ is the number of vulnerabilities of the i-th type at the j-th time node, where w + v = n, 1 ≤ i ≤ v, 1 ≤ j ≤ m.

Define the change vector of the i-th type of vulnerability in $\bar{S}$ from the t-th time node to the (t+1)-th time node, where 1 ≤ i ≤ v, 1 ≤ t < m.

Define likewise the change vector of the j-th type of vulnerability in S from the t-th time node to the (t+1)-th time node, where 1 ≤ j ≤ w, 1 ≤ t < m.

These two vectors describe how the i-th type of vulnerability in $\bar{S}$ and the j-th type of vulnerability in S change between two time nodes; because a state transition has a direction, both are change vectors.

Therefore, from the t-th time node to the (t+1)-th time node, the cosine similarity between the i-th type of vulnerability in $\bar{S}$ and the j-th type of vulnerability in S is cos θ_ijt, the cosine of the angle between their two change vectors, where 1 ≤ i ≤ v, 1 ≤ j ≤ w, 1 ≤ t < m.

Let cos θ_ij denote the cosine similarity between the i-th type of vulnerability in $\bar{S}$ and the j-th type of vulnerability in S; its value is obtained from the values cos θ_ijt, where 1 ≤ i ≤ v, 1 ≤ j ≤ w, 1 ≤ t < m.

Let cos θ denote the cosine similarity matrix between the indirect prediction set $\bar{S}$ and the direct prediction set S; then

$$\cos\theta = \begin{bmatrix} \cos\theta_{11} & \cos\theta_{12} & \cdots & \cos\theta_{1w} \\ \cos\theta_{21} & \cos\theta_{22} & \cdots & \cos\theta_{2w} \\ \vdots & \vdots & & \vdots \\ \cos\theta_{v1} & \cos\theta_{v2} & \cdots & \cos\theta_{vw} \end{bmatrix}$$

where 1 ≤ i ≤ v, 1 ≤ j ≤ w.

2) Obtain the most similar vulnerabilities

Find the index j of the maximum value in the i-th row of cos θ; the j-th type of vulnerability in the direct prediction set S is then the most similar vulnerability of the i-th type of vulnerability in the indirect prediction set $\bar{S}$, where 1 ≤ i ≤ v, 1 ≤ j ≤ w.

Performing this operation for i = 1, 2, …, v in turn yields, for every type of vulnerability in the indirect prediction set $\bar{S}$, its most similar vulnerability in the direct prediction set S.
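The similarity matrix and the row-wise maximum search can be sketched in Python as below. The exact definition of the change vectors and the way the per-step values cos θ_ijt are combined into cos θ_ij are not reproduced in the text. The sketch therefore assumes that the change vector of a type between nodes t and t+1 is the pair (count at t, count at t+1) and that cos θ_ij is the mean of cos θ_ijt over t. Both choices and all function names are assumptions.

```python
from math import sqrt


def cosine(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(S_bar, S):
    """v x w matrix of similarities between indirect rows and direct rows.

    ASSUMPTION: change vector = (count at t, count at t+1);
    ASSUMPTION: cos(theta_ij) = mean over t of cos(theta_ijt).
    """
    m = len(S[0])
    cos = []
    for row_i in S_bar:
        scores = []
        for row_j in S:
            per_step = [cosine((row_i[t], row_i[t + 1]),
                               (row_j[t], row_j[t + 1]))
                        for t in range(m - 1)]
            scores.append(sum(per_step) / len(per_step))
        cos.append(scores)
    return cos


def most_similar(cos):
    """For each row i, the column index j of the largest similarity."""
    return [max(range(len(row)), key=row.__getitem__) for row in cos]


if __name__ == "__main__":
    S_bar = [[100, 200, 400]]
    S = [[10, 20, 40], [10, 10, 10]]
    cos = similarity_matrix(S_bar, S)
    print(cos, most_similar(cos))
```

In the demo data the indirect type doubles at each step, exactly like the first direct type, so that type is selected even though its absolute counts are ten times smaller.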

3) Obtain the predicted number of each type of vulnerability in the indirect prediction set

Consider the most similar vulnerability in the direct prediction set S of the i-th type of vulnerability in the indirect prediction set $\bar{S}$, namely the j-th type of vulnerability in S, where 1 ≤ i ≤ v, 1 ≤ j ≤ w. Compute the relative increment of the j-th type of vulnerability from the m-th time node to the (m+1)-th time node. The predicted number of vulnerabilities of the i-th type at the (m+1)-th time node is then obtained from this relative increment by a scaling transformation.

Let $\bar{R}$ denote the count prediction matrix of the vulnerability types in the indirect prediction set at the (m+1)-th time node, $\bar{R} = [\bar{r}_1\ \bar{r}_2\ \dots\ \bar{r}_v]$, where $\bar{r}_i$ is the predicted number of vulnerabilities of the i-th type at the (m+1)-th time node, 1 ≤ i ≤ v.

The matrix $\bar{R}$ is the count prediction result for the vulnerability types of the indirect prediction set $\bar{S}$ at the (m+1)-th time node.
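A short Python sketch of the scaling step follows. The formulas for the relative increment and for the scaled prediction are not reproduced in the text, so the sketch assumes the relative increment of the matched direct type j is (r_j - s_jm) / s_jm and that the indirect type i is predicted as its last observed count multiplied by (1 + that increment). The function name `predict_indirect` and both formulas are assumptions.

```python
def predict_indirect(S_bar, S, R, nearest):
    """Predict counts at node m+1 for every indirect vulnerability type.

    S_bar:   indirect prediction set (v rows of m counts)
    S:       direct prediction set (w rows of m counts)
    R:       predicted counts of the direct set at node m+1 (length w)
    nearest: nearest[i] = index j of the most similar direct type
    """
    predictions = []
    for i, j in enumerate(nearest):
        last_j = S[j][-1]
        # ASSUMPTION: relative increment of type j from node m to m+1.
        growth = (R[j] - last_j) / last_j if last_j else 0.0
        # ASSUMPTION: scale the last observed count of type i by it.
        predictions.append(S_bar[i][-1] * (1 + growth))
    return predictions


if __name__ == "__main__":
    S_bar = [[100, 200, 400]]
    S = [[10, 20, 40]]
    print(predict_indirect(S_bar, S, R=[50.0], nearest=[0]))
```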

(4) Obtain the prediction result for the vulnerability complete set

Combine the prediction results of the direct prediction set and the indirect prediction set; then R_U = [R_1 R_2 … R_n] is the prediction result of the vulnerability complete set U at the (m+1)-th time node, where R_i is the predicted number of vulnerabilities of the i-th type in U at the (m+1)-th time node.

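Further, each row of the state transition matrix in step 1) must sum to 1, and the actual state transition matrix obtained by the above algorithm meets this requirement, as proved below:

Known:

$$\sum_{j=1,\,j \neq i}^{w} f_{ij} = 1$$

To prove:

$$\sum_{j=1}^{w} q_{ijt} = 1$$

Proof:

(1) When s_it ≥ s_i(t-1), q_iit = 1 and q_ijt = 0 for j ≠ i;

therefore

$$\sum_{j=1}^{w} q_{ijt} = q_{iit} + \sum_{j \neq i} q_{ijt} = 1 + 0 = 1$$

(2) When s_it < s_i(t-1), q_ijt = f_ij·(1 - q_iit) for j ≠ i;

therefore

$$\sum_{j=1}^{w} q_{ijt} = q_{iit} + (1 - q_{iit}) \sum_{j \neq i} f_{ij} = q_{iit} + (1 - q_{iit}) = 1$$

From (1) and (2):

$$\sum_{j=1}^{w} q_{ijt} = 1$$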

Further, each row of the state transition matrix in step 2) must sum to 1, and the predicted state transition matrix obtained by the above algorithm meets this requirement, as proved below:

Known:

$$\sum_{j=1}^{w} q_{ijt} = 1$$

To prove:

$$\sum_{j=1}^{w} q'_{ijt} = 1$$

Proof:

(1) When t = 2, Q'_t = Q_t,

so q'_ijt = q_ijt, 1 ≤ i ≤ w, 1 ≤ j ≤ w;

therefore

$$\sum_{j=1}^{w} q'_{ijt} = \sum_{j=1}^{w} q_{ijt} = 1$$

(2) When 2 < t ≤ m, q'_ijt = αq_ijt + (1-α)q'_ij(t-1).

By mathematical induction:

1) when t = k, suppose that

$$\sum_{j=1}^{w} q'_{ijk} = 1$$

holds;

2) when t = k + 1,

$$\sum_{j=1}^{w} q'_{ij(k+1)} = \alpha \sum_{j=1}^{w} q_{ij(k+1)} + (1-\alpha) \sum_{j=1}^{w} q'_{ijk} = \alpha + (1-\alpha) = 1$$

From 1) and 2), the statement holds when 2 < t ≤ m.

In summary:

$$\sum_{j=1}^{w} q'_{ijt} = 1, \quad 1 < t \le m$$

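Further, the probability matrix in step 3) must be a row vector whose elements sum to 1, and the actual probability matrix obtained by the above method meets this requirement, as proved below:

Known:

$$p_{it} = \frac{s_{it}}{\sum_{k=1}^{w} s_{kt}}$$

To prove:

$$\sum_{i=1}^{w} p_{it} = 1$$

Proof:

$$\sum_{i=1}^{w} p_{it} = \frac{\sum_{i=1}^{w} s_{it}}{\sum_{k=1}^{w} s_{kt}} = 1$$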

Further, the probability matrix in step 4) must be a row vector whose elements sum to 1, and the predicted probability matrix obtained by the above algorithm meets this requirement, as proved below:

Known:

$$\sum_{i=1}^{w} p_{it} = 1$$

To prove:

$$\sum_{i=1}^{w} p'_{it} = 1$$

Proof:

(1) When t = 1, P'_t = P_t,

so p'_it = p_it, 1 ≤ i ≤ w;

therefore

$$\sum_{i=1}^{w} p'_{it} = \sum_{i=1}^{w} p_{it} = 1$$

(2) When 1 < t ≤ m, p'_it = αp_it + (1-α)p'_i(t-1).

By mathematical induction:

1) when t = k, suppose that

$$\sum_{i=1}^{w} p'_{ik} = 1$$

holds;

2) when t = k + 1,

$$\sum_{i=1}^{w} p'_{i(k+1)} = \alpha \sum_{i=1}^{w} p_{i(k+1)} + (1-\alpha) \sum_{i=1}^{w} p'_{ik} = \alpha + (1-\alpha) = 1$$

From 1) and 2), the statement holds when 1 < t ≤ m.

In summary:

$$\sum_{i=1}^{w} p'_{it} = 1, \quad 1 \le t \le m$$

Further, the numbers of each type of security vulnerability reported in an authoritative information security vulnerability database at a plurality of time nodes are examined to form the vulnerability complete set, which is expressed as a two-dimensional matrix.

Further, the average count level of the vulnerability complete set is examined; taking this value as the center, the step length is increased continuously until a suitable neighborhood interval is determined. The vulnerability types within this interval form the direct prediction set, and the complement of the direct prediction set in the complete set is the indirect prediction set.

The beneficial effects of the invention are as follows:

Taking security vulnerabilities as the research object, the method examines their historical data to form a vulnerability complete set and divides that set into a direct prediction set and an indirect prediction set. The direct prediction set is predicted with a Markov method improved by exponential smoothing; cosine similarity then links the indirect prediction set to the direct prediction set, so that the indirect prediction set can be predicted in turn. Finally, the prediction results of the two sets are combined to provide practitioners in related fields with a high-accuracy predicted value.

Drawings

Fig. 1 is a diagram of the algorithm for dividing the vulnerability complete set into a direct prediction set and an indirect prediction set.

Fig. 2 is a diagram of the Markov method for predicting the number of security vulnerabilities, improved with the smoothing method and similarity.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

A Markov method for predicting vulnerability counts based on a smoothing method and similarity, as shown in Figs. 1 and 2, comprises the following steps, each carried out as set out in the Disclosure of Invention:

(1) Examine the historical count information of the security vulnerabilities to form the vulnerability complete set U, in which u_ij is the number of vulnerabilities of the i-th type at the j-th time node, 1 ≤ i ≤ n, 1 ≤ j ≤ m; compute the arithmetic mean of all data in U and of each row of U, rounding down; then, with U and the parameter p as input, divide U into the direct prediction set S and the indirect prediction set.

(2) Predict the number of each type of vulnerability in the direct prediction set:

1) obtain the actual state transition matrices, with the direct prediction set S and the parameter f_ij as input and the rows q_it as output;

2) obtain the predicted state transition matrices, with Q'_t = Q_t when t = 2 and q'_ijt = αq_ijt + (1-α)q'_ij(t-1), 0 < α < 1, when 2 < t ≤ m;

3) obtain the actual probability matrices P_t = [p_1t p_2t … p_wt], where p_it is the ratio of the number of vulnerabilities of the i-th type in the direct prediction set to the number of all vulnerabilities in the direct prediction set at the t-th time node;

4) obtain the predicted probability matrices, with P'_t = P_t when t = 1 and exponential smoothing when 1 < t ≤ m;

5) obtain the predicted number of each type of vulnerability in the direct prediction set from P_m+1 = P'_m·Q'_m, which by the nature of matrix multiplication is a row vector of w elements whose i-th element is p_i(m+1), 1 ≤ i ≤ w.

(3) Predict the number of each type of vulnerability in the indirect prediction set:

1) obtain the cosine similarity matrix between the indirect prediction set and the direct prediction set S from the change vectors of their vulnerability types between the t-th and the (t+1)-th time nodes;

2) obtain the most similar vulnerabilities;

3) obtain the predicted number of each type of vulnerability in the indirect prediction set at the (m+1)-th time node.

(4) Obtain the prediction result for the vulnerability complete set.

The rows of the actual and predicted state transition matrices and the elements of the actual and predicted probability matrices each sum to 1, by the same proofs as given in the Disclosure of Invention.

Claims

1. A Markov method for predicting vulnerability counts based on a smoothing method and similarity, characterized by comprising the following steps:

(1) examining the historical count information of security vulnerabilities to form a vulnerability complete set U, and dividing U into a direct prediction set and an indirect prediction set: the arithmetic mean of all data in U is computed and rounded down; the arithmetic mean of each row of U is computed and rounded down, where 1 ≤ i ≤ n; the rounded mean of U is taken as the reference and 1 as the initial step length; p is the parameter that determines the range of the direct prediction set, in general 0 < p < 1, and, to ensure the accuracy of the prediction result, 0.5 ≤ p < 1;

Input: vulnerability complete set U and parameter p

Output: direct prediction set S and indirect prediction set $\bar{S}$

Let sum = 0;

(2) predicting the number of each type of vulnerability in the direct prediction set:

1) obtaining the actual state transition matrices

s_ij is the number of vulnerabilities of the i-th type at the j-th time node, where 1 ≤ i ≤ w, 1 ≤ j ≤ m;

let Q_t denote the actual state transition matrix from the (t-1)-th time node to the t-th time node; q_ijt is the actual probability that the i-th type of vulnerability transfers to the j-th type from the (t-1)-th time node to the t-th time node, where 1 < t ≤ m, 1 ≤ i ≤ w, 1 ≤ j ≤ w; q_it denotes the i-th row of Q_t, where 1 ≤ i ≤ w;

Input: direct prediction set S, parameter f_ij

Output: q_it

2) obtaining the predicted state transition matrices

let Q'_t denote the predicted state transition matrix from the (t-1)-th time node to the t-th time node; q'_ijt is the predicted probability that the i-th type of vulnerability transfers to the j-th type from the (t-1)-th time node to the t-th time node, where 1 < t ≤ m, 1 ≤ i ≤ w, 1 ≤ j ≤ w;

the predicted state transition matrix Q'_t is determined as follows:

A. when t = 2, Q'_t = Q_t;

B. when 2 < t ≤ m, each element q'_ijt of Q'_t, where 1 ≤ i ≤ w, 1 ≤ j ≤ w, is obtained by exponential smoothing, that is:

$$q'_{ijt} = \alpha q_{ijt} + (1-\alpha)\, q'_{ij(t-1)}$$

where 0 < α < 1;

3) obtaining the actual probability matrices

let P_t denote the actual probability matrix at the t-th time node; write P_t = [p_1t p_2t … p_wt], where p_it is the ratio of the number of vulnerabilities of the i-th type in the direct prediction set to the number of all vulnerabilities in the direct prediction set at the t-th time node;

4) obtaining the predicted probability matrices

p'_it is the predicted value of the ratio of the number of vulnerabilities of the i-th type in the direct prediction set to the number of all vulnerabilities in the direct prediction set at the t-th time node, where 1 ≤ i ≤ w, 1 ≤ t ≤ m;

the predicted probability matrix is determined as follows:

A. when t = 1, P'_t = P_t;

B. when 1 < t ≤ m, each element p'_it of P'_t, where 1 ≤ i ≤ w, is obtained by exponential smoothing, that is:

$$p'_{it} = \alpha p_{it} + (1-\alpha)\, p'_{i(t-1)}$$

where 0 < α < 1;

5) obtaining the predicted number of each type of vulnerability in the direct prediction set

let C denote the actual total number of vulnerabilities of the direct prediction set at each time node, C = [c_1 c_2 … c_m c_m+1], where c_i is the total number of vulnerabilities at the i-th time node, 1 ≤ i ≤ m;

let C' denote the predicted total number of vulnerabilities of the direct prediction set at each time node, C' = [c'_1 c'_2 … c'_m c'_m+1], where c'_i is determined as follows:

A. when i = 1, c'_i = c_i;

B. when 1 < i ≤ m + 1, c'_i is obtained by exponential smoothing, where 0 < α < 1;

$$P_{m+1} = P'_m \cdot Q'_m$$

let R denote the count prediction matrix of the vulnerability types in the direct prediction set at the (m+1)-th time node, R = [r_1 r_2 … r_w];

(3) predicting the number of each type of vulnerability in the indirect prediction set:

1) obtaining the cosine similarity matrix

let the indirect prediction set $\bar{S}$ contain v elements, that is, $\bar{S}$ contains all count information of v types of vulnerabilities over m time nodes; the changes of the i-th type of vulnerability in $\bar{S}$ and of the j-th type of vulnerability in S from the t-th time node to the (t+1)-th time node are each described by a change vector, and the cosine similarity between the i-th type of vulnerability in $\bar{S}$ and the j-th type of vulnerability in S is obtained from these change vectors, giving the cosine similarity matrix between the indirect prediction set $\bar{S}$ and the direct prediction set S;

2) obtaining the most similar vulnerabilities

for each type of vulnerability in the indirect prediction set $\bar{S}$, its most similar vulnerability in the direct prediction set S is found;

3) obtaining the predicted number of each type of vulnerability in the indirect prediction set

for each type of vulnerability in the indirect prediction set $\bar{S}$, the number at the (m+1)-th time node is predicted, giving the count prediction matrix of the indirect prediction set $\bar{S}$ at the (m+1)-th time node;

(4) obtaining the prediction result for the vulnerability complete set

the prediction results of the direct prediction set and the indirect prediction set are combined; then R_U = [R_1 R_2 … R_n] is the prediction result of the vulnerability complete set U at the (m+1)-th time node, where R_i is the predicted number of vulnerabilities of the i-th type in U at the (m+1)-th time node.

2. The Markov method for predicting vulnerability counts based on a smoothing method and similarity according to claim 1, characterized in that each row of the state transition matrix in step 1) sums to 1, and the actual state transition matrix obtained by the method satisfies this requirement, as proved below:

Known:

$$\sum_{j=1,\,j \neq i}^{w} f_{ij} = 1$$

To prove:

$$\sum_{j=1}^{w} q_{ijt} = 1$$

Proof:

(1) when s_it ≥ s_i(t-1), q_iit = 1 and q_ijt = 0 for j ≠ i;

therefore

$$\sum_{j=1}^{w} q_{ijt} = q_{iit} + \sum_{j \neq i} q_{ijt} = 1 + 0 = 1$$

(2) when s_it < s_i(t-1), q_ijt = f_ij·(1 - q_iit) for j ≠ i;

therefore

$$\sum_{j=1}^{w} q_{ijt} = q_{iit} + (1 - q_{iit}) \sum_{j \neq i} f_{ij} = 1$$

From (1) and (2):

$$\sum_{j=1}^{w} q_{ijt} = 1$$

3. The Markov method for predicting vulnerability counts based on a smoothing method and similarity according to claim 1, characterized in that each row of the state transition matrix in step 2) sums to 1, and the predicted state transition matrix obtained by the method satisfies this requirement, as proved below:

Known:

$$\sum_{j=1}^{w} q_{ijt} = 1$$

To prove:

$$\sum_{j=1}^{w} q'_{ijt} = 1$$

Proof:

(1) when t = 2, Q'_t = Q_t,

so q'_ijt = q_ijt, 1 ≤ i ≤ w, 1 ≤ j ≤ w;

therefore

$$\sum_{j=1}^{w} q'_{ijt} = \sum_{j=1}^{w} q_{ijt} = 1$$

(2) when 2 < t ≤ m, q'_ijt = αq_ijt + (1-α)q'_ij(t-1).

By mathematical induction:

1) when t = k, suppose that

$$\sum_{j=1}^{w} q'_{ijk} = 1$$

holds;

2) when t = k + 1,

$$\sum_{j=1}^{w} q'_{ij(k+1)} = \alpha \sum_{j=1}^{w} q_{ij(k+1)} + (1-\alpha) \sum_{j=1}^{w} q'_{ijk} = \alpha + (1-\alpha) = 1$$

From 1) and 2), the statement holds when 2 < t ≤ m.

In summary:

$$\sum_{j=1}^{w} q'_{ijt} = 1, \quad 1 < t \le m$$

4. The Markov method for predicting vulnerability counts based on a smoothing method and similarity according to claim 1, characterized in that the probability matrix in step 3) is a row vector whose elements sum to 1, and the actual probability matrix obtained by the method satisfies this requirement, as proved below:

Known:

$$p_{it} = \frac{s_{it}}{\sum_{k=1}^{w} s_{kt}}$$

To prove:

$$\sum_{i=1}^{w} p_{it} = 1$$

Proof:

$$\sum_{i=1}^{w} p_{it} = \frac{\sum_{i=1}^{w} s_{it}}{\sum_{k=1}^{w} s_{kt}} = 1$$

5. The Markov method for predicting vulnerability counts based on a smoothing method and similarity according to claim 1, characterized in that the probability matrix in step 4) is a row vector whose elements sum to 1, and the predicted probability matrix obtained by the method satisfies this requirement, as proved below:

Known:

$$\sum_{i=1}^{w} p_{it} = 1$$

To prove:

$$\sum_{i=1}^{w} p'_{it} = 1$$

Proof:

(1) when t = 1, P'_t = P_t,

so p'_it = p_it, 1 ≤ i ≤ w;

therefore

$$\sum_{i=1}^{w} p'_{it} = \sum_{i=1}^{w} p_{it} = 1$$

(2) when 1 < t ≤ m, p'_it = αp_it + (1-α)p'_i(t-1).

By mathematical induction:

1) when t = k, suppose that

$$\sum_{i=1}^{w} p'_{ik} = 1$$

holds;

2) when t = k + 1,

$$\sum_{i=1}^{w} p'_{i(k+1)} = \alpha \sum_{i=1}^{w} p_{i(k+1)} + (1-\alpha) \sum_{i=1}^{w} p'_{ik} = \alpha + (1-\alpha) = 1$$

From 1) and 2), the statement holds when 1 < t ≤ m.

In summary:

$$\sum_{i=1}^{w} p'_{it} = 1, \quad 1 \le t \le m$$

6. The Markov method for predicting vulnerability counts based on a smoothing method and similarity according to claim 1, characterized in that the numbers of each type of security vulnerability reported in an authoritative information security vulnerability database at a plurality of time nodes are examined to form the vulnerability complete set, which is expressed as a two-dimensional matrix.

7. The Markov method for predicting vulnerability counts based on a smoothing method and similarity according to claim 1, characterized in that the average count level of the vulnerability complete set is examined; taking this value as the center, the step length is increased continuously until a suitable neighborhood interval is determined; the interval is the direct prediction set, and the complement of the direct prediction set in the complete set is the indirect prediction set.