CN116306909A - Method for realizing model training, computer storage medium and terminal - Google Patents

Method for realizing model training, computer storage medium and terminal

Info

Publication number
CN116306909A
Authority
CN
China
Prior art keywords
data
training
entities
extracted
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310217064.6A
Other languages
Chinese (zh)
Inventor
董颖
卞超轶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING LEADSEC TECHNOLOGY CO LTD
Beijing Venustech Cybervision Co ltd
Venustech Group Inc
Original Assignee
BEIJING LEADSEC TECHNOLOGY CO LTD
Beijing Venustech Cybervision Co ltd
Venustech Group Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING LEADSEC TECHNOLOGY CO LTD, Beijing Venustech Cybervision Co ltd, Venustech Group Inc filed Critical BEIJING LEADSEC TECHNOLOGY CO LTD
Priority to CN202310217064.6A
Publication of CN116306909A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F 21/577 Assessing vulnerabilities and evaluating computer system security
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1433 Vulnerability analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 2221/03 Indexing scheme relating to G06F 21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F 2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed herein are a method for implementing model training, a computer storage medium, and a terminal, including: combining first data selected from a marked dataset and second data selected from an unmarked dataset into training data; and training, with the training data and according to a preset loss function, the first deep learning model obtained from the previous training to obtain a second deep learning model; wherein the marked dataset includes data in which entities and/or relationships between entities in a vulnerability report are labeled; the unmarked dataset includes data in which entities and/or relationships between entities in a vulnerability report are not labeled; and the loss function is determined based on cross entropies determined from the first data and the second data. According to the embodiment of the invention, training data is formed from the first data and the second data, training of the first deep learning model is performed using a loss function determined from the first data and the second data, and data drift is avoided when the second deep learning model processes vulnerability reports.

Description

Method for realizing model training, computer storage medium and terminal
Technical Field
The present invention relates to, but is not limited to, network security technologies, and in particular, to a method, a computer storage medium, and a terminal for implementing model training.
Background
In recent years, the network security situation has remained severe and high-risk security vulnerabilities occur frequently. To minimize the security risks posed by vulnerabilities, security administrators of government and enterprise organizations generally need to learn the latest vulnerability information for the software they care about, together with related threat information, by searching authoritative public vulnerability data sources as well as third-party unstructured vulnerability data sources.
Because modern software often depends on many components, a list of all components on which the software directly or indirectly depends must be acquired for each piece of software of interest, and the software and all of its dependent components must be checked periodically for vulnerabilities. Manual searching, however, is costly, easily misses important vulnerability information, and lags behind: if relevant vulnerability information is not found immediately after a vulnerability is disclosed, the optimal emergency response window is missed and immeasurable losses may result. Therefore, to extract the most accurate and comprehensive vulnerability information as early as possible, vulnerability information such as the dependency relationships between affected software and components and the affected versions needs to be extracted from disclosed vulnerability data sources by means of automated vulnerability information extraction techniques.
Research on unstructured vulnerability information extraction in the related art is usually based on deep learning models, but it still faces the following problem: over time, the information extraction performance of a deep learning model on new vulnerability reports may gradually degrade, producing a data drift problem. To keep the model performing well on new vulnerability reports, a large number of new reports would have to be labeled continuously to retrain the model, yet retraining requires a large amount of data and entails a huge labeling workload. The data drift problem therefore remains to be solved.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the invention provides a method for implementing model training, a computer storage medium and a terminal, which can solve the data drift problem.
The embodiment of the invention provides a method for realizing model training, which comprises the following steps:
combining first data selected from a marked dataset and second data selected from an unmarked dataset into training data for model training;
training, with the composed training data and according to a preset loss function, the first deep learning model obtained from the previous training to obtain a second deep learning model;
wherein the marked dataset includes: one or more pieces of data in which entities and/or relationships between entities in a vulnerability report are labeled; the unmarked dataset includes: one or more pieces of data in which entities and/or relationships between entities in a vulnerability report are not labeled; and the loss function is determined based on cross entropies determined from the first data and the second data.
In another aspect, an embodiment of the present invention further provides a computer storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the method for implementing model training described above.
In still another aspect, an embodiment of the present invention further provides a terminal, including: a memory and a processor, the memory storing a computer program; wherein
the processor is configured to execute the computer program in the memory;
the computer program, when executed by the processor, implements a method of implementing model training as described above.
The technical scheme of the present application includes: combining first data selected from a marked dataset and second data selected from an unmarked dataset into training data for model training; and training, with the composed training data and according to a preset loss function, the first deep learning model obtained from the previous training to obtain a second deep learning model; wherein the marked dataset includes: one or more pieces of data in which entities and/or relationships between entities in a vulnerability report are labeled; the unmarked dataset includes: one or more pieces of data in which entities and/or relationships between entities in a vulnerability report are not labeled; and the loss function is determined based on cross entropies determined from the first data and the second data. According to the embodiment of the invention, training data is formed from the first data and the second data, and training of the first deep learning model is performed using a loss function determined from the first data and the second data, so that data drift is avoided when the second deep learning model processes vulnerability reports.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate and do not limit the invention.
FIG. 1 is a flow chart of a method of implementing model training in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an embodiment of the present invention for extracting vulnerability information from a vulnerability report.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail hereinafter with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be arbitrarily combined with each other.
The steps illustrated in the flowchart of the figures may be performed in a computer system, such as a set of computer-executable instructions. Also, while a logical order is depicted in the flowchart, in some cases, the steps depicted or described may be performed in a different order than presented herein.
FIG. 1 is a flowchart of a method for implementing model training according to an embodiment of the present invention, as shown in FIG. 1, including:
step 101, composing training data for model training from first data selected from the marked dataset and second data selected from the unmarked dataset;
step 102, training, with the composed training data and according to a preset loss function, the first deep learning model obtained from the previous training to obtain a second deep learning model;
wherein the marked dataset includes: one or more pieces of data in which entities and/or relationships between entities in a vulnerability report are labeled; the unmarked dataset includes: one or more pieces of data in which entities and/or relationships between entities in a vulnerability report are not labeled; and the loss function is determined based on cross entropies determined from the first data and the second data.
According to the embodiment of the invention, training data is formed from the first data and the second data, and training of the first deep learning model is performed using the loss function determined from the first data and the second data, so that data drift is avoided when the second deep learning model processes vulnerability reports.
In an exemplary embodiment, the first deep learning model in the embodiment of the present invention is a deep learning model for identifying target information, where the target information includes: information of entities and/or information of relationships between entities.
It should be noted that, in the embodiment of the present invention, the terms first deep learning model and second deep learning model are only relative: once its training is completed, the second deep learning model becomes the first deep learning model obtained by the previous training. In the embodiment of the present invention, the training in steps 101 to 102 is one complete training consisting of several rounds; for example, training may be performed once a day, with each training consisting of 20 rounds.
In an exemplary embodiment, before training, with the composed training data and according to a preset loss function, the first deep learning model obtained from the previous training, the method of the embodiment of the present invention further includes:
training an initial first deep learning model from one or more pieces of non-empty third data in the marked dataset.
In an exemplary embodiment, the first data in the embodiment of the present invention includes: one or more pieces of data in the marked dataset that have not been used for training.
In one illustrative example, the second data in the embodiment of the present invention includes: one or more pieces of data in the unmarked dataset that have already been used for training and one or more pieces of data that have not been used for training.
In an exemplary embodiment, when forming training data for model training, the method of the embodiment of the present invention further includes:
one or more pieces of data in the marked dataset that have already been used to train the first deep learning model are added to the training data as second data.
In one illustrative example, the loss function in embodiments of the present invention is a function determined by:
calculating a first cross entropy between the prediction made on the first data by the second deep learning model during the i-th training and the true labels of the first data;
calculating a second cross entropy between the class probability distribution output on the second data by the second deep learning model during the i-th training and the class probability distribution output on the second data by the first deep learning model obtained from the previous ((i-1)-th) training;
and weighting and summing the calculated first cross entropy and second cross entropy according to preset weighting coefficients to obtain the loss function.
In an exemplary embodiment, when calculating the second cross entropy, the method according to the embodiment of the present invention further includes:
and normalizing the output vector of the last fully connected layer of the second deep learning model in the i-th training with a normalized exponential function (softmax) whose temperature coefficient is greater than 1.
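As a concrete illustration of the normalization step above, the following sketch applies a normalized exponential function (softmax) with a temperature coefficient greater than 1 to an output vector; the function name and the example temperature of 2.0 are illustrative assumptions, since the text only requires a temperature greater than 1:

```python
import math

def softmax_with_temperature(logits, t=2.0):
    """Temperature-scaled softmax: divide each logit by t (> 1) before
    normalizing, which softens the resulting class probability distribution.
    t=2.0 is an illustrative choice; the text only requires t > 1."""
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# A temperature above 1 flattens the distribution relative to t=1.
probs = softmax_with_temperature([2.0, 1.0, 0.1], t=2.0)
```

A flatter distribution retains the model's uncertainty between classes, which is exactly the implicit knowledge that the distillation loss described later transfers from teacher to student.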
In an exemplary embodiment, before composing the training data for model training, the method of the embodiment of the present invention further includes:
performing data drift detection according to a preset time window;
when data drift is detected, labeling the most recently obtained data and adding the labeled data to the marked dataset, to be used subsequently as first data in the training data for training the first deep learning model obtained from the previous training.
In an exemplary embodiment, the embodiment of the present invention performs data drift detection according to a preset time window, including performing the detection based on observed values of the following monitoring indicators:
the distribution of prediction confidence and/or the feature distribution of text information extracted from vulnerability reports.
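A minimal sketch of window-based drift detection on the first monitoring indicator (the distribution of prediction confidence) might look as follows; the function name, the use of the mean as the statistic, and the threshold value are all illustrative assumptions, since the text does not fix a specific decision rule:

```python
def drift_detected(reference_confidences, window_confidences, drop_threshold=0.1):
    """Flag data drift when the mean prediction confidence observed in the
    current time window falls more than drop_threshold below the mean of a
    reference period. drop_threshold=0.1 is an illustrative value."""
    ref_mean = sum(reference_confidences) / len(reference_confidences)
    win_mean = sum(window_confidences) / len(window_confidences)
    return ref_mean - win_mean > drop_threshold

# Confidence dropping from ~0.9 to ~0.7 would trigger labeling of new data.
flag = drift_detected([0.9] * 50, [0.7] * 50)
```

When the flag is raised, the newest vulnerability reports would be labeled and added to the marked dataset, as described in the step above.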
In an exemplary embodiment, when the monitoring indicators in the embodiment of the present invention include the distribution of prediction confidence, the distribution of prediction confidence includes one or any combination of the following indicators:
statistical indicators of the prediction confidence of entity categories;
statistical indicators of the prediction confidence of relationships between entities;
the proportion, among the data contained in all unlabeled datasets, of samples whose predicted category is a preset category;
statistical indicators of the prediction confidence of relationship categories between entities;
wherein the statistical indicators of the prediction confidence of entity categories include one or any combination of the following statistical values of the prediction confidence of entities: mean, median, standard deviation, maximum and minimum; here, the entities include one or any combination of the following: affected software, affected components, affected versions and affected operating systems. The statistical indicators of the prediction confidence of relationships between entities include one or any combination of the same statistical values (mean, median, standard deviation, maximum and minimum) of the prediction confidence of relationships between entities; here, the relationships between entities include one or any combination of the following: related to, depends on and runs in. The proportion of samples whose predicted category is a preset category includes: for one or more entity categories, the proportion of such entities among the data contained in all unlabeled datasets; here, the entities again include one or any combination of the following: affected software, affected components, affected versions and affected operating systems. The indicator for relationship categories between entities includes: for one or more relationship categories, the proportion of such relationships among the data contained in all unlabeled datasets; here, the relationships again include one or any combination of the following: related to, depends on and runs in.
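The statistical values named above (mean, median, standard deviation, maximum and minimum) and the category proportion can be sketched as follows; the function names are illustrative, not taken from the patent:

```python
import statistics

def confidence_indicators(confidences):
    """Mean, median, standard deviation, maximum and minimum of a list of
    prediction confidences for one entity or relationship category."""
    return {
        "mean": statistics.mean(confidences),
        "median": statistics.median(confidences),
        "stdev": statistics.stdev(confidences) if len(confidences) > 1 else 0.0,
        "max": max(confidences),
        "min": min(confidences),
    }

def category_proportion(predicted_categories, category):
    """Proportion of samples predicted as `category` among all predictions."""
    return predicted_categories.count(category) / len(predicted_categories)

ind = confidence_indicators([0.8, 0.9, 1.0])
share = category_proportion(["software", "version", "software"], "software")
```

Tracking these values per time window and comparing them against a reference period is one plausible way to observe the distribution shifts described above.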
In an exemplary embodiment, when the monitoring indicators in the embodiment of the present invention include the feature distribution of text information extracted from vulnerability reports, this feature distribution includes one or any combination of the following indicators:
statistical indicators of the text information of extracted entity categories;
the proportion, among all extracted entities of a given category, of extracted entities of that preset category that conform to preset features;
the proportion, among all extracted relationships, of extracted relationships of a preset category that conform to preset features;
wherein the statistical indicators of the text information of extracted entity categories include one or any combination of the following indicators of the extracted entities: number of digit characters, character length, proportion of uppercase letters, count of special characters, and the like; here, the entities include one or any combination of the following: affected software, affected components, affected versions and affected operating systems. The proportion of extracted entities of a preset category that conform to preset features includes: for each of one or more entity categories, the proportion, among all extracted entities of that category, of entities that exist in a Common Platform Enumeration (CPE) dictionary. The proportion of extracted relationships of a preset category that conform to preset features includes: the proportion, among all extracted entity pairs conforming to a given relationship between entities, of entity pairs that exist in the CPE dictionary; here, the entity pairs include: <affected software, affected version> pairs with the related-to relationship, <affected software, affected operating system> pairs with the runs-in relationship, and <affected software, affected component> pairs with the depends-on relationship.
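The CPE-dictionary proportion described above can be sketched as follows; the toy dictionary contents and the case-insensitive matching are illustrative assumptions (a real CPE dictionary would come from the official NVD CPE data):

```python
def cpe_hit_ratio(extracted_entities, cpe_dictionary):
    """Proportion, among all extracted entities of one category, of entities
    that appear in a Common Platform Enumeration (CPE) dictionary. A falling
    ratio over time can signal feature drift in the extracted text."""
    if not extracted_entities:
        return 0.0
    hits = sum(1 for e in extracted_entities if e.lower() in cpe_dictionary)
    return hits / len(extracted_entities)

# Toy dictionary of lowercase product names (hypothetical contents).
toy_cpe_dict = {"skype for business", ".net framework", "windows vista"}
ratio = cpe_hit_ratio(["Skype for Business", "UnknownApp"], toy_cpe_dict)
```

If newly extracted software names stop matching the dictionary, the extraction model is likely drifting away from the vocabulary of current vulnerability reports.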
The embodiment of the invention also provides a computer storage medium, wherein a computer program is stored in the computer storage medium, and the method for realizing model training is realized when the computer program is executed by a processor.
The embodiment of the invention also provides a terminal, which comprises: a memory and a processor, the memory storing a computer program; wherein
the processor is configured to execute the computer program in the memory;
the computer program, when executed by a processor, implements a method for implementing model training as described above.
The following briefly describes embodiments of the present invention by way of an application example, which is merely used to illustrate embodiments of the present invention and is not intended to limit the scope of the present application.
Application example
The information that the deep learning model in this application example needs to extract from vulnerability reports includes the names of affected software and components, their affected versions, the dependency relationships between affected software and components and between affected components, and so on; the relationships between entities that need to be extracted from vulnerability reports are shown in Table 1. The entities to be extracted in this application example include: affected software, affected components, affected operating systems and affected versions. In one illustrative example, this application example may use entity recognition techniques to extract these entities; such techniques need to analyze and model the context of the entities in order to distinguish affected entities from unaffected ones.
This application example also needs to extract the relationships between entities, including: related to, depends on, runs in, and so on. The related-to relationship indicates whether the affected software, components and operating systems are related to the respective affected versions. This relationship exists in 3 sets of entity pairs: the affected software, affected components and affected operating systems each need to be paired with the affected versions associated with them, see rows 1-3 in Table 1. Vulnerability descriptions often contain software, component and operating system entities at the same time, and these entities may be far apart from their related versions within a sentence, so accurate pairing is difficult. The depends-on relationship indicates whether the affected software depends on the affected component; this relationship exists in only 1 set of entity pairs, see row 4 in Table 1. The runs-in relationship indicates whether given affected software runs in a given affected operating system; this relationship also exists in only 1 set of entity pairs, see the last row in Table 1.
FIG. 2 illustrates an example of extracting vulnerability information, including entities and the relationships between them, from a vulnerability report. In the figure, "- - - -" marks an affected component, "… …" marks an affected operating system, solid lines without arrows mark affected software, wavy lines mark affected versions, dotted lines with arrows indicate that the relationship between entities is related-to, dashed lines with arrows indicate runs-in, and solid lines with arrows indicate depends-on. Referring to FIG. 2, the affected software extracted from the vulnerability description is .NET Framework and Skype for Business, the affected component is the Windows font library, the affected operating systems are Windows Vista and Windows Server, and the affected versions are SP2, 2008 SP2, 3.0 SP2 to 4.6, and 2016. In FIG. 2, the affected software or operating systems and the corresponding affected versions with the related-to relationship are <Windows Vista, SP2>, <Windows Server, 2008 SP2>, <.NET Framework, 3.0 SP2 to 4.6> and <Skype for Business, 2016>; the affected software and affected components with the depends-on relationship include <.NET Framework, Windows font library> and <Skype for Business, Windows font library>; the affected software and affected operating systems with the runs-in relationship are <Skype for Business, Windows Vista>, <Skype for Business, Windows Server>, <.NET Framework, Windows Vista> and <.NET Framework, Windows Server>.
ID  Head entity                Tail entity                Entity relationship
1   Affected software          Affected version           Related to
2   Affected component         Affected version           Related to
3   Affected operating system  Affected version           Related to
4   Affected software          Affected component         Depends on
5   Affected software          Affected operating system  Runs in
TABLE 1
The method used by this application example to extract vulnerability information from vulnerability reports may include the pipeline method and the joint extraction method of the related art.
In an illustrative example, this application example acquires, with reference to the related art, the marked dataset {L_0, L_1, L_2, ..., L_n} and the unlabeled dataset {U_0, U_1, U_2, ..., U_n}; where L_0 is not empty and L_a (0 < a ≤ n) may be empty; L_a represents a piece of data in the marked dataset, and when L_a is not empty it contains data in which the entities and/or relationships between entities in a vulnerability report are labeled; every U_b (0 ≤ b ≤ n) is a non-empty set, and in U_b the entities and/or relationships between entities in the vulnerability report are not labeled;
training an initial first deep learning model for target information identification with third data in the marked dataset; the third data is any non-empty part of the marked dataset; the target information includes: entities and/or relationships between entities;
combining first data selected from the marked data set and second data selected from the unmarked data set into training data for model training;
according to a preset loss function and a preset training period, this application example trains, with the composed training data, the first deep learning model obtained from the previous training to obtain a second deep learning model;
wherein the loss function is determined based on cross entropy determined by the first data and the second data.
In one illustrative example, the first data in this application example includes: a part of the marked dataset that has not been used for training; the second data includes: all data in the unlabeled dataset that has already been used for training and a piece of data that has not been used for training.
In an illustrative example, the present application example method further comprises:
one or more pieces of first data in the marked dataset that have already been used for training are added to the training data as second data.
This application example trains the model according to a preset training period on the basis of the first deep learning model M_0. Assuming that both the marked dataset and the unlabeled dataset are labeled in order of arrival for model training, then the i-th training (0 < i ≤ n) trains a new second deep learning model M_i on the basis of the previously obtained first deep learning model M_{i-1}. The training process includes: preparing the labeled data D_L and the unlabeled data D_U used for this training, where D_L is the most recently labeled data, D_L = L_i, and D_U is the union of all unlabeled data, i.e. D_U equals the union of U_0, U_1, U_2, ..., U_i. In an illustrative example, the method of this application example further includes: adding L_0, L_1, L_2, ..., L_{i-1}, which have already been used for training, to the training data of the i-th training.
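The preparation of D_L and D_U for the i-th training can be sketched as follows; the function name and the list-of-lists representation of the datasets are illustrative assumptions:

```python
def assemble_training_data(i, labeled_sets, unlabeled_sets):
    """For the i-th training: the labeled side combines the newest labeled
    set L_i with the previously used L_0 .. L_{i-1}, and the unlabeled side
    D_U is the union of U_0 .. U_i."""
    d_l = [x for s in labeled_sets[: i + 1] for x in s]    # L_0 .. L_i
    d_u = [x for s in unlabeled_sets[: i + 1] for x in s]  # U_0 .. U_i
    return d_l, d_u

d_l, d_u = assemble_training_data(1, [["l0"], ["l1"], ["l2"]],
                                  [["u0"], ["u1"], ["u2"]])
```

Keeping earlier labeled sets and all unlabeled sets in the mix is what lets each new student model revisit old knowledge while absorbing new data.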
In one illustrative example, the loss function in this application example is a function determined by:
calculating a first cross entropy between the prediction made on the first data by the second deep learning model during the i-th training and the true labels of the first data;
calculating a second cross entropy between the class probability distribution output on the second data by the second deep learning model during the i-th training and the class probability distribution output on the second data by the first deep learning model obtained from the previous ((i-1)-th) training;
and weighting and summing the calculated first cross entropy and second cross entropy according to preset weighting coefficients to obtain the loss function.
In one illustrative example, the present application example loss function includes two parts:
the first part is the marking data D L Loss of L I.e. second deep learning model M in training i In the marked dataset D L Prediction result on the table
Figure BDA0004115347450000101
And D L Is->
Figure BDA0004115347450000102
The first cross entropy between the two is expressed as follows:
Figure BDA0004115347450000103
the second part being unlabeledData D U Loss of distillation Loss of pass U I.e. second deep learning model M in training i And a first deep learning model M obtained by previous training i-1 In untagged data D U The second cross entropy of the class probability distribution output above has the expression:
Figure BDA0004115347450000104
the loss function is a weighted sum of the first cross entropy and the second cross entropy, and is calculated by the super-parameter alpha (0<α<1) To balance knowledge learned from the marker data with the model M i-1 Knowledge learned above, expressed as: loss=α×loss L +(1-α)*Loss U
The present application example may consider the model M_{i-1} as a teacher model and the model M_i as a student model. Loss_U ensures that the outputs of the student model and the teacher model are as consistent as possible, while Loss_L ensures that the classification results of the student model are as consistent as possible with the real labels of the labeled data. In the distillation loss, if the student model were made to learn the hard class outputs of the teacher model, it would inherit and amplify the teacher model's errors; therefore, the application example makes the student model learn the class probability distribution of the teacher model instead of its hard class outputs. Compared with hard class outputs, the class probability distribution contains more implicit knowledge, preserving the teacher model's uncertainty about some outputs and helping the student model learn how the teacher model generalizes. In addition, when learning on the class probability distribution, the gradient variance in the training stage is smaller, the training process is more stable, less training data is required, a larger learning rate can be used, and the resulting model is smaller. In this application example, at each training round the teacher model continuously transfers the knowledge learned from old data to the student model through the distillation loss, realizing continual learning of old knowledge; if the new knowledge is not empty, the student model learns the new knowledge and the old knowledge at the same time; and the student model trained at each round serves as the teacher model for the next round, realizing knowledge transfer and continual learning based on knowledge distillation.
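As a concrete illustration, the weighted loss Loss = α · Loss_L + (1 - α) · Loss_U can be sketched in a few lines of NumPy. The function and variable names are illustrative, not from the application; logits stand in for the output of the last fully connected layer, and for brevity no temperature scaling is applied here (T = 1).

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p_target, p_pred, eps=1e-12):
    # mean over samples of -sum_c p_target_c * log(p_pred_c)
    return float(-np.mean(np.sum(p_target * np.log(p_pred + eps), axis=-1)))

def distillation_loss(student_logits_l, labels_onehot,
                      student_logits_u, teacher_logits_u, alpha):
    """Loss = alpha * Loss_L + (1 - alpha) * Loss_U, with 0 < alpha < 1."""
    # Loss_L: student predictions vs. real labels on the labeled data D_L
    loss_l = cross_entropy(labels_onehot, softmax(student_logits_l))
    # Loss_U: student learns the teacher's class probability distribution
    # (not its hard class output) on the unlabeled data D_U
    loss_u = cross_entropy(softmax(teacher_logits_u),
                           softmax(student_logits_u))
    return alpha * loss_l + (1 - alpha) * loss_u
```

A larger α weights the real labels more heavily; a smaller α weights the knowledge distilled from the teacher model more heavily.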
In an illustrative example, the present application example method further comprises: when calculating the second cross entropy, normalizing the output vector of the last fully connected layer of the second deep learning model using a normalized exponential function with a temperature coefficient greater than 1.
When calculating the distillation loss Loss_U, this application example uses a normalized exponential function (softmax) with a temperature coefficient T > 1 to normalize the output vector of the last fully connected layer of the second deep learning model, converting each value z_i in the vector to the corresponding p_i:

p_i = exp(z_i / T) / Σ_j exp(z_j / T)
In this application example, the larger the value of T, the smoother the generated class probability distribution. Distilling the teacher model's probability distribution through the temperature coefficient T and training the student model with it can further improve the generalization ability of the student model and reduce overfitting.
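A minimal numerical illustration of the temperature-scaled softmax above; the function name and example logits are illustrative, not from the application.

```python
import numpy as np

def softmax_with_temperature(z, T):
    """p_i = exp(z_i / T) / sum_j exp(z_j / T)"""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])            # last-layer output vector
p_sharp = softmax_with_temperature(logits, T=1.0)
p_soft = softmax_with_temperature(logits, T=4.0)
# p_soft is smoother (less peaked) than p_sharp, as described above
```

Raising T shrinks the differences between the scaled logits, so the probability mass spreads out and the distribution's peak drops.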
In this application example, assume the deep learning model is trained once a day; L_b and U_b (0 ≤ b ≤ n) can be regarded as the labeled and unlabeled data obtained on day b. Since labeled data is difficult to acquire, only the labeled data L_0 of the first day is currently required to be non-empty, for training the initial first deep learning model M_0; on each subsequent day, if no labeled data is obtained, the labeled data L_a (0 < a ≤ n) is empty. If no data offset occurs, the deep learning model obtained by the previous training can still achieve the same effect on new data as on old data; if data offset occurs, the model effect gradually deteriorates. In order to further reduce the influence of data offset on the model, the application example performs data offset detection according to a preset time window; if data offset occurs, the new data acquired on that day is labeled and added to the labeled data set, to be used as training data for training the second deep learning model; if no data offset occurs, no new data needs to be labeled.
For vulnerability report information extraction based on a deep learning model, the offset detection schemes in the related art are aimed at multivariate time series data, and a large amount of data still needs to be labeled before each round of offset detection. The application example provides an offset detection method in which the probability of data offset is calculated automatically over time; if the probability exceeds a preset threshold, an alarm is issued, and after the alarm is received, further manual inspection is performed and data is labeled as the situation requires. The detection itself requires no labeled data and can identify cases where no data offset has occurred, further reducing unnecessary manual labeling workload.
Offset detection in this application example includes: obtaining observation values of the monitoring indicators; and performing offset detection according to the observation values of the monitoring indicators.
In one illustrative example, the monitoring indicators in this application example include: a distribution of prediction confidence and/or a feature distribution of the text information extracted from the vulnerability report;
when the monitoring indicator comprises a distribution of predictive confidence, the distribution of predictive confidence comprises an indicator of one or any combination of the following:
statistical indicators of prediction confidence of entity categories;
statistical indicators of predictive confidence of relationships between entities;
the proportion, among the data contained in all unlabeled data sets, of samples whose predicted category is a preset category;
statistical indicators of predictive confidence of relationship categories between entities;
wherein the statistical indicators of the prediction confidence of entity categories comprise a statistical value of one or any combination of the following for the prediction confidence of entities: mean, median, standard deviation, maximum and minimum; the statistical indicators of the prediction confidence of relationships between entities comprise a statistical value of one or any combination of the following for the prediction confidence of relationships between entities: mean, median, standard deviation, maximum and minimum; the proportion, among the data contained in all unlabeled data sets, of samples whose predicted category is a preset category comprises: for each of more than one entity category, the proportion of entities of that category contained in all unlabeled data sets; the statistical indicators of the prediction confidence of relationship categories between entities comprise: for each of more than one relationship category between entities, the proportion of relationships of that category contained in all unlabeled data sets.
In an illustrative example, when the monitoring metrics in the present application example include a feature distribution of text information extracted from a vulnerability report, the feature distribution of text information extracted from the vulnerability report includes metrics of one or any combination of:
statistical indicators of the text information of extracted entity categories;
the proportion of extracted entities of a preset category conforming to preset features among all extracted entities of that category;
the proportion of extracted relationships of a preset category conforming to preset features among all extracted relationships of that category;
wherein the statistical indicators of the text information of extracted entity categories comprise an indicator of one or any combination of the following for the extracted entities: number of numeric characters, character length, proportion of upper-case letters and count of special characters; the proportion of extracted entities of a preset category conforming to preset features among all extracted entities of that category comprises: for each of more than one entity category, the proportion of extracted entities present in the Common Platform Enumeration (CPE) dictionary among all extracted entities of that category; the proportion of extracted relationships of a preset category conforming to preset features among all extracted relationships of that category comprises: the proportion of extracted entity pairs present in the CPE dictionary among all extracted entity pairs conforming to a relationship between entities.
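As an illustration of the CPE-dictionary feature above, the proportion of extracted entities found in the dictionary can be computed as follows. The function name and the representation of the dictionary as a set of known names are assumptions made for the sketch, not details from the application.

```python
def cpe_hit_ratio(extracted_entities, cpe_dictionary):
    """Fraction of extracted entities (e.g. vendor or product names)
    that appear in the CPE dictionary, which is assumed here to be a
    set of known names."""
    if not extracted_entities:
        return 0.0
    hits = sum(1 for e in extracted_entities if e in cpe_dictionary)
    return hits / len(extracted_entities)
```

A drop in this ratio over successive time windows would suggest that newly extracted entities no longer match known platform names, one possible symptom of data offset.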
During the periodic prediction on unlabeled data by the model, one or more preset monitoring indicators are selected from the above two types of monitoring indicators, and at intervals of a preset time window the relevant data is collected and summarized to obtain the observation values of the monitoring indicators.
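The per-window summarization for a confidence-based monitoring indicator can be sketched with only the Python standard library; the function name and dictionary keys are illustrative, not from the application.

```python
import statistics

def confidence_observations(confidences):
    """Summarize the per-sample prediction confidences collected in one
    time window into the statistical values named above."""
    return {
        "mean": statistics.mean(confidences),
        "median": statistics.median(confidences),
        "stdev": statistics.pstdev(confidences),  # population std. dev.
        "max": max(confidences),
        "min": min(confidences),
    }
```

Each time window thus contributes one observation vector per indicator, and the sequence of such vectors forms the historical observation series used for offset detection.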
The application example can perform offset detection for each single monitoring indicator and then weight and aggregate the offset detection results of all preset indicators; alternatively, offset detection can be performed directly on multiple indicators. When the detected probability of offset is large, an alarm is issued. Offset detection for a single indicator mainly uses outlier detection algorithms and anomaly detection algorithms based on time series modeling, such as the Autoregressive Integrated Moving Average (ARIMA) model: the historical time series data, i.e. the historical observation sequence of the monitoring indicator, is modeled to predict the observation value of the monitoring indicator at the current moment (the predicted value), and whether offset has occurred is finally judged from the deviation between the predicted value and the actual observation value. Offset detection for multiple monitoring indicators compares the latest observations of a set of monitoring indicators with their historical observations, for example by a two-sample hypothesis test that determines whether the difference between the two sets of data is statistically significant; a commonly used two-sample hypothesis test is the Kolmogorov-Smirnov (KS) test, a non-parametric statistical test applicable to any distribution.
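For the multi-indicator comparison, a minimal pure-Python sketch of the two-sample KS statistic is given below; in practice a library routine such as `scipy.stats.ks_2samp` would also provide the p-value needed to judge statistical significance. The function names are illustrative.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # fraction of values in sorted_sample that are <= x
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)
```

Identical samples give a statistic of 0, fully separated samples give 1, and a statistic near 1 for the latest window versus the historical observations is evidence of offset.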
Through the continual learning strategy, the application example learns old knowledge from the old deep learning model, ensuring the same good effect on old knowledge as before; compared with re-learning old knowledge from old data, the obtained model is smaller, a larger learning rate can be used, training time is saved and training cost is reduced. When data offset is detected on new data, the model learns old knowledge from the old model while also learning new knowledge on the new data, so that it performs well on both new and old data, neither forgetting the old knowledge nor failing to master the new.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

Claims (13)

1. A method of implementing model training, comprising:
combining first data selected from the marked data set and second data selected from the unmarked data set into training data for model training;
training the first deep learning model obtained by the previous training according to the preset loss function through the composed training data to obtain a second deep learning model;
wherein the marked data set comprises: more than one piece of data in which entities and/or relationships between entities in a vulnerability report are marked; the unmarked data set comprises: more than one piece of data in which entities and/or relationships between entities in a vulnerability report are not marked; and the loss function is determined based on cross entropies determined by the first data and the second data.
2. The method of claim 1, wherein before training the first deep learning model obtained from the previous training with the composed training data according to a predetermined loss function, the method further comprises:
training to obtain an initial first deep learning model according to more than one non-empty third data in the marked data set.
3. The method of claim 1, wherein the first data comprises: more than one piece of data in the marked data set that has not been used for training.
4. The method of claim 1, wherein the second data comprises: more than one piece of data in the unmarked data set that has been used for training and more than one piece of data therein that has not been used for training.
5. The method of claim 4, wherein when the training data for model training is composed, the method further comprises:
adding more than one piece of data in the marked data set that has been used to train the first deep learning model to the training data as second data.
6. The method of any of claims 1-5, wherein the loss function comprises a function determined by:
calculating a first cross entropy between the prediction result of the second deep learning model on the first data at the i-th training and the real labels of the first data;
calculating a second cross entropy between the class probability distribution output by the second deep learning model on the second data at the i-th training and the class probability distribution output on the second data by the first deep learning model obtained in the previous training;
and summing the calculated first cross entropy and the calculated second cross entropy according to a preset weighting coefficient to obtain the loss function.
7. The method of claim 6, wherein when calculating the second cross entropy, the method further comprises:
and carrying out normalization processing on the output vector of the last connecting layer of the second deep learning model trained for the ith time by using a normalization exponential function with the temperature coefficient larger than 1.
8. The method according to any one of claims 1-5, wherein prior to said composing training data for model training, the method further comprises:
detecting data offset according to a preset time window;
when the occurrence of data offset is detected, marking the latest obtained data and adding the marked data to the marked data set, to be used subsequently as the first data in the training data for training the first deep learning model obtained by the previous training.
9. The method of claim 8, wherein the performing the detection of the data offset according to the preset time window comprises performing the detection of the data offset based on observations of the following monitoring metrics:
a distribution of prediction confidence and/or a feature distribution of text information extracted from the vulnerability report.
10. The method of claim 9, wherein when the monitored metrics include a distribution of the predicted confidence, the distribution of the predicted confidence includes metrics of one or any combination of:
statistical indicators of prediction confidence of entity categories;
statistical indicators of predictive confidence of relationships between entities;
the proportion, among the data contained in all unlabeled data sets, of samples whose predicted category is a preset category;
statistical indicators of predictive confidence of relationship categories between entities;
wherein the statistical indicators of the prediction confidence of entity categories comprise a statistical value of one or any combination of the following for the prediction confidence of entities: mean, median, standard deviation, maximum and minimum; the statistical indicators of the prediction confidence of relationships between the entities comprise a statistical value of one or any combination of the following for the prediction confidence of relationships between entities: mean, median, standard deviation, maximum and minimum; the proportion, among the data contained in all unlabeled data sets, of samples whose predicted category is a preset category comprises: for each of more than one entity category, the proportion of entities of that category contained in all unlabeled data sets; the statistical indicators of the prediction confidence of relationship categories between the entities comprise: for each of more than one relationship category between entities, the proportion of relationships of that category contained in all unlabeled data sets.
11. The method of claim 9, wherein when the monitoring indicators include the feature distribution of the text information extracted from the vulnerability report, the feature distribution of the text information extracted from the vulnerability report includes indicators of one or any combination of the following:
statistical indicators of the text information of extracted entity categories;
the proportion of extracted entities of a preset category conforming to preset features among all extracted entities of that category;
the proportion of extracted relationships of a preset category conforming to preset features among all extracted relationships of that category;
wherein the statistical indicators of the text information of extracted entity categories comprise an indicator of one or any combination of the following for the extracted entities: number of numeric characters, character length, proportion of upper-case letters and count of special characters; the proportion of extracted entities of a preset category conforming to preset features among all extracted entities of that category comprises: for each of more than one entity category, the proportion of extracted entities present in the Common Platform Enumeration (CPE) dictionary among all extracted entities of that category; the proportion of extracted relationships of a preset category conforming to preset features among all extracted relationships of that category comprises: the proportion of extracted entity pairs present in the CPE dictionary among all extracted entity pairs conforming to a relationship between entities.
12. A computer storage medium having stored therein a computer program which, when executed by a processor, implements a method of implementing model training according to any of claims 1-11.
13. A terminal, comprising: a memory and a processor, the memory storing a computer program; wherein,
the processor is configured to execute the computer program in the memory;
the computer program, when executed by the processor, implements a method of implementing model training as claimed in any one of claims 1-11.
CN202310217064.6A 2023-03-02 2023-03-02 Method for realizing model training, computer storage medium and terminal Pending CN116306909A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310217064.6A CN116306909A (en) 2023-03-02 2023-03-02 Method for realizing model training, computer storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310217064.6A CN116306909A (en) 2023-03-02 2023-03-02 Method for realizing model training, computer storage medium and terminal

Publications (1)

Publication Number Publication Date
CN116306909A true CN116306909A (en) 2023-06-23

Family

ID=86829977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310217064.6A Pending CN116306909A (en) 2023-03-02 2023-03-02 Method for realizing model training, computer storage medium and terminal

Country Status (1)

Country Link
CN (1) CN116306909A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117372819A (en) * 2023-12-07 2024-01-09 神思电子技术股份有限公司 Target detection increment learning method, device and medium for limited model space
CN117372819B (en) * 2023-12-07 2024-02-20 神思电子技术股份有限公司 Target detection increment learning method, device and medium for limited model space

Similar Documents

Publication Publication Date Title
US10853695B2 (en) Method and system for cell annotation with adaptive incremental learning
CN111740991B (en) Anomaly detection method and system
US11562262B2 (en) Model variable candidate generation device and method
CN108363701A (en) Name entity recognition method and system
CN112800232B (en) Case automatic classification method based on big data
CN116306909A (en) Method for realizing model training, computer storage medium and terminal
CN111160959A (en) User click conversion estimation method and device
CN107688822B (en) Newly added category identification method based on deep learning
WO2022143608A1 (en) Language labeling method and apparatus, and computer device and storage medium
CN114020904A (en) Test question file screening method, model training method, device, equipment and medium
CN111104800B (en) Entity identification method, entity identification device, entity identification equipment, storage medium and program product
CN116029394B (en) Self-adaptive text emotion recognition model training method, electronic equipment and storage medium
CN109543712B (en) Method for identifying entities on temporal data set
CN111860441B (en) Video target identification method based on unbiased depth migration learning
CN110941713A (en) Self-optimization financial information plate classification method based on topic model
CN115545437A (en) Financial enterprise operation risk early warning method based on multi-source heterogeneous data fusion
CN111723301B (en) Attention relation identification and labeling method based on hierarchical theme preference semantic matrix
CN107729817B (en) Rule-based method for dividing and identifying credibility of multiple candidate items
CN112861962A (en) Sample processing method, sample processing device, electronic device and storage medium
CN116484943A (en) Method for realizing model training, computer storage medium and terminal
CN112464966B (en) Robustness estimating method, data processing method, and information processing apparatus
CN115687334B (en) Data quality inspection method, device, equipment and storage medium
CN116842140B (en) Method and system for detecting machine-generated text
CN114091463B (en) Regional work order random point analysis method and device, electronic equipment and readable storage medium
US20230419102A1 (en) Token synthesis for machine learning models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination