LU102710B1 - Input validation method for neural network model by crossing-layer dissection - Google Patents


Info

Publication number
LU102710B1
LU102710B1 (application LU102710A)
Authority
LU
Luxembourg
Prior art keywords
layer
model
input
neural network
validity
Prior art date
Application number
LU102710A
Other languages
German (de)
Inventor
Jingwei Xu
Xiaoxing Ma
Huiyan Wang
Jian Lyu
Chang Xu
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Application granted granted Critical
Publication of LU102710B1 publication Critical patent/LU102710B1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses an input validation method for a neural network model by crossing-layer dissection, which includes: providing a given neural network model and its training data, extracting intermediate information, and generating a sub-model corresponding to each layer; inputting an input to be validated into the sub-models to obtain a whole behavior profile after the crossing-layer dissection; and analyzing the crossing-layer dissection profile of the input to validate whether the input is valid and to provide a confidence score for its validity. The method is based on crossing-layer dissection within the trained model: the validity of a given input is determined from its dissection behavior at each layer of the model.

Description

Description
INPUT VALIDATION METHOD FOR NEURAL NETWORK MODEL BY CROSSING-LAYER DISSECTION
TECHNICAL FIELD The present invention relates to the technical fields of neural network testing and input validation, and more particularly, to an input validation method for a neural network model by crossing-layer dissection.
BACKGROUND Neural network models are widely used in fields such as image processing, object recognition, and autonomous vehicles. In such applications, a trained neural network model is typically instantiated as a central classifier to obtain satisfactory prediction or classification results for new scenarios. However, due to the nature of model training, inputs from new scenarios are sometimes not completely suitable for the trained neural network model. If suitability is not given appropriate emphasis when deploying neural network models in practice for new scenarios, unexpected behavior of the neural network model may result. For example, when fed images from unseen scenarios, such as extreme weather or overexposed images, the trained model in an autonomous vehicle may cause serious traffic accidents, since such images are beyond the model's handling capability, leading to severe consequences. It is therefore crucial to validate whether a fed input is valid for the neural network model.
Researchers in this art have worked on the recognition of invalid inputs, but the state of the art is still deficient in practice. On one hand, the recognition of invalid inputs is mainly based on distance evaluation: the distance between a fed input and the training data is evaluated to determine the input's validity. This method is restricted by the scale of the training data, and is thus difficult to apply to neural network models that typically require large-scale training in practice. On the other hand, since a neural network model naturally has a certain generalization ability, its input-processing ability is not strictly equivalent to its ability to process the training data; if the input is validated directly against the training data, the accuracy of validation may suffer from biases. Additionally, existing methods are often offline, and their efficiency can hardly meet the requirements of real-time validation, so such methods are difficult to apply in deployed real-world scenarios.
SUMMARY In view of the problems and shortcomings identified in the prior art, an objective of the present invention is to provide an input validation method for a neural network model by crossing-layer dissection. The method has the advantages of usefulness, effectiveness, and efficiency. Usefulness means that the method can be applied to common neural network models trained on large-scale data in real life, and that neither the scale of the training scenario nor the complexity of the neural network model severely restricts its application scenarios. Effectiveness means that the method validates inputs with high accuracy and can effectively distinguish valid from invalid inputs. Efficiency means that the method validates inputs at low time cost, so that it meets the requirements of real-time validation and can be deployed in a running neural network model for input validation at runtime.
The present invention provides the technical solutions as follows: an input validation method for a neural network model by crossing-layer dissection includes the following steps: step 1: providing a given neural network model and training data corresponding to the neural network model, inputting the training data into the given neural network model, extracting intermediate information of the data at each intermediate layer of the model during a training process, and training a sub-model corresponding to each layer according to the intermediate information, wherein each sub-model contains knowledge from an input layer to a corresponding intermediate layer of the given neural network model and simulates a prediction behavior of the given neural network model; step 2: inputting an input to be validated into the sub-model corresponding to each intermediate layer obtained in step 1 to collect prediction behavior snapshots on the sub-model corresponding to each layer with increasing layers, and converging the prediction behavior snapshots to generate a whole behavior profile of the input over all sub-models; and step 3: based on the whole behavior profile obtained in step 2 by dissecting the layers corresponding to the given input, analyzing the validity of the prediction behavior snapshots at each layer and the validity of the whole behavior profile, and providing a validity confidence score to evaluate the validity.
To realize and optimize the above-mentioned technical solutions, the following specific measures are further provided.
Further, the neural network is a type of data structure formed by hierarchically connecting neurons for big-data feature extraction and prediction, and includes an input layer, a hidden layer, and an output layer. Each layer contains a large number of neurons, the layers are connected to each other through the neurons, and information is transmitted from the input layer to the output layer. The neural network is, for example, any of the commonly used deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN) models. A neuron is a data structure that receives inputs and applies built-in functions to the input data to produce an output. The built-in functions are normally fixed, and several activation or kernel function forms are in common use, such as ReLU, Sigmoid, and Softmax. The input is a single input or a batch input of the neural network model. For example, for a neural network trained for an image classification problem, the input is a certain image file or a batch composed of multiple images.
Further, in step 1, the given neural network model and a training data set of the neural network model are configured to extract the intermediate information of each layer of the training process. The intermediate information includes the model parameter information (such as weight and bias in a CNN model) obtained by the neurons at each intermediate layer during the training process, the input value and output value of each neuron, and others. The parameter information is configured to record the knowledge learned by the current model from the training data set through the training process. The input value and the output value are configured to provide training data for the subsequent training process of the sub-model.
Further, in step 1, the sub-model corresponding to each layer, such as a layer k, is a neural network model structurally similar to the given neural network model, and structurally includes two parts. The first part inherits all the model parameter information (such as weight and bias) from the input layer to the corresponding layer k, together with the corresponding model structure of the meta-model obtained after the original training process of the given neural network. The second part uses a basic meta-model to connect the neurons at the layer k to a prediction output neuron; it is retrained using the intermediate information of the neurons at the layer k recorded in step 1 (the set of outputs at the layer k obtained after the original training data are input to the given neural network) together with the predicted-value labels corresponding to the original training set, and the parameter information of the second part is obtained after this retraining. The parameters of the two parts are combined to obtain a sub-model structure with parameters. The basic meta-model is generally, but is not limited to, a linear regression model. The retraining is generally performed only on the parameters of the second part, but is not limited thereto; it can also be performed on the parameters of both the first part and the second part according to different application scenarios (such as performing overall fine-tuning on the model parameters by the crossing-layer dissection method).
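The two-part sub-model structure described above can be sketched in NumPy. This is a minimal illustration, not the patent's prescribed implementation: the names (`SubModel`, `fit_linear_head`), the ReLU prefix, and the ridge-regularized least-squares fit of the linear head are all assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SubModel:
    """Part 1: frozen prefix weights copied from the original model up to layer k.
    Part 2: a retrained linear head (the basic meta-model) mapping layer-k
    activations to a predicted probability distribution."""
    def __init__(self, prefix_weights, head_W, head_b):
        self.prefix_weights = prefix_weights  # list of (W, b) pairs up to layer k
        self.head_W, self.head_b = head_W, head_b

    def prefix_forward(self, x):
        for W, b in self.prefix_weights:      # inherited, not retrained
            x = relu(x @ W + b)
        return x

    def predict_proba(self, x):
        return softmax(self.prefix_forward(x) @ self.head_W + self.head_b)

def fit_linear_head(H, Y, reg=1e-3):
    """Fit the second part on recorded layer-k outputs H and one-hot labels Y
    via regularized least squares (one simple stand-in for 'retraining')."""
    Hb = np.hstack([H, np.ones((len(H), 1))])   # append a bias column
    Wb = np.linalg.solve(Hb.T @ Hb + reg * np.eye(Hb.shape[1]), Hb.T @ Y)
    return Wb[:-1], Wb[-1]                      # split weight and bias back out
```

In this reading, only the head parameters are fit (matching "retraining is generally performed only on the parameters of the second part"), while the prefix stays frozen.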
Further, in step 2, the prediction behavior snapshots of the input on the sub-model corresponding to each layer are, but not limited to, a predicted probability distribution result and other information obtained after the input is fed into the sub-model corresponding to each layer for prediction.
Further, the whole behavior profile is the set of prediction behavior snapshots obtained from each sub-model; it is configured to validate and evaluate the input to be validated in step 3 and serves as the basic material for that validation.
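Taking a snapshot to be the predicted probability distribution from each layer's sub-model (one reading of the definitions above), collecting the whole behavior profile reduces to stacking per-layer predictions. `predict_fns` is a hypothetical list of per-layer prediction callables, e.g. the sub-model pool.

```python
import numpy as np

def behavior_profile(predict_fns, x):
    """Collect one snapshot per sub-model and stack them into an
    (n_layers, n_classes) array: the whole behavior profile of input x."""
    return np.stack([f(x) for f in predict_fns])
```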
Further, in step 3, methods for analyzing the validity of the prediction behavior snapshots at the corresponding layer by using the whole behavior profile in step 2 include: method 1: considering the probability difference between the predicted maximum value and the final predicted value of the prediction behavior at the current layer, and taking the relative size proportion as the snapshot validity score of each layer; and method 2: considering the direct prediction behavior difference between the current layer and the previous layer, and taking the relative proportion of each probability change of the prediction behavior to the probability change of the final predicted value as the snapshot validity score of each layer.
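The two scoring methods are not given as formulas in the text, so the sketch below is one plausible reading: method 1 compares the final class's probability at each layer to that layer's maximum probability, and method 2 measures what share of each layer-to-layer probability change flows toward the final predicted class. `profile` is an (n_layers, n_classes) array of per-layer predicted distributions; `final_class` is the original model's prediction.

```python
import numpy as np

def snapshot_scores_method1(profile, final_class):
    """Method 1 (assumed form): ratio of the final class's probability
    to the layer's predicted maximum; 1.0 means full agreement."""
    return profile[:, final_class] / profile.max(axis=1)

def snapshot_scores_method2(profile, final_class):
    """Method 2 (assumed form): share of each layer's total probability
    change, relative to the previous layer, that moves toward the final
    class. The first layer has no predecessor, so scores start at layer 2."""
    delta = np.diff(profile, axis=0)            # change vs. previous layer
    total = np.abs(delta).sum(axis=1) + 1e-12   # guard against divide-by-zero
    return np.clip(delta[:, final_class], 0, None) / total
```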
Further, in step 3, methods for analyzing the validity of the whole profile by using the whole behavior profile in step 2 include: method 1: using the actual prediction accuracy of each layer on the training set as a weight for analysis and modeling, taking the results of the snapshot validity analysis at each layer as the input of a linear model, setting the parameters of the linear model based on the training-set prediction accuracy, and finally calculating the final whole-profile validity score by accuracy-based weighting; method 2: setting weights by observation using commonly used growth function curves (linear, logarithmic, exponential), taking the results of the snapshot validity analysis at each layer as the input of the selected growth function, and manually setting the parameters of the growth function to calculate the final whole-profile validity score; and method 3: obtaining the snapshots at each layer and the whole behavior profile from the training set data, taking the snapshot validity analysis as input data and the corresponding validation result as labeled data, and training a model for calculating the final profile validity score with a machine learning model; wherein the corresponding validation result is given manually, or based on the prediction accuracy of the given neural network model with respect to the input, but is not limited thereto; and the machine learning model adopts, but is not limited to, a known machine learning model such as a linear regression model, a logistic regression model, a support vector machine (SVM) model, or a neural network model.
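The first two aggregation methods can be sketched as weighted means of the per-layer snapshot scores. The exact weighting schemes are assumptions: method 1 here weights by per-layer training-set accuracy, and method 2 weights later layers more heavily using a chosen growth curve.

```python
import numpy as np

def profile_score_weighted(snapshot_scores, layer_accuracies):
    """Method 1 (assumed form): accuracy-weighted mean of per-layer scores."""
    w = np.asarray(layer_accuracies, dtype=float)
    return float(np.dot(w, snapshot_scores) / w.sum())

def profile_score_growth(snapshot_scores, kind="linear"):
    """Method 2 (assumed form): weights follow a growth curve over depth,
    so deeper layers contribute more to the whole-profile validity score."""
    n = len(snapshot_scores)
    if kind == "linear":
        w = np.arange(1, n + 1, dtype=float)
    elif kind == "log":
        w = np.log(np.arange(2, n + 2, dtype=float))
    else:  # "exp"
        w = np.exp(np.arange(n, dtype=float) / n)
    return float(np.dot(w, snapshot_scores) / w.sum())
```

Method 3 would instead fit any off-the-shelf classifier or regressor (linear, logistic, SVM, or a small neural network) on training-set snapshot scores against labeled validity outcomes.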
Further, in step 3, the validity confidence score is a value between 0 and 1 derived from the whole-profile validity score calculated for the given input to be validated, and represents a degree of confidence in the validity of that input. A validity confidence score close to 0 indicates a more invalid input; a score close to 1 indicates a more valid input. The value range of the validity confidence score and its relationship to the input are not limited thereto.
Further, in step 3, the step of evaluating the validity includes: distinguishing valid from invalid inputs by applying a threshold to the calculated validity confidence degree, wherein the threshold is given in advance or obtained from experience, and mainly depends on the tolerance toward valid inputs in the actual deployment scenarios of different models. Generally, when the security requirements are stringent, the threshold of the scenario is close to 1.
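The threshold decision above amounts to a simple filter. A minimal sketch (the threshold value 0.9 is illustrative, not prescribed by the text):

```python
def filter_valid(inputs, confidences, threshold=0.9):
    """Keep only inputs whose validity confidence clears the threshold.
    Security-critical deployments would push the threshold toward 1,
    rejecting more inputs as invalid."""
    return [x for x, c in zip(inputs, confidences) if c >= threshold]
```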
Advantages: Compared with the prior art, the present invention makes up for the shortcomings of the existing input validation techniques for neural network models. Specifically, the method of the present invention uses specific inputs to perform crossing-layer dissection in the model to efficiently measure and evaluate the validity of the inputs, and uses the evaluated validity to perform real-time screening on the inputs, thereby improving the actual deployment effect of the neural network model.
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a system structure diagram of the present invention; FIG. 2 is a detailed structural diagram of a sub-model according to the present invention; FIG. 3 is a work flow chart of a sub-model generation module according to the present invention; FIG. 4 is a work flow chart of a crossing-layer behavior dissection module according to the present invention; and FIG. 5 is a work flow chart of a validity validation analysis module according to the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS Hereinafter, the present invention will be further clarified in conjunction with the specific embodiments. It should be understood that these embodiments are only used to illustrate the present invention rather than to limit its scope. After reading the present invention, those skilled in the art can modify the present invention in various equivalent forms, and such modifications shall fall within the scope defined by the appended claims of the present invention.
The neural network is a type of data structure formed by hierarchically connecting neurons for big-data feature extraction and prediction, and includes an input layer, a hidden layer, and an output layer. Each layer contains a large number of neurons, the layers are connected to each other through the neurons, and information is transmitted from the input layer to the output layer. The neural network is, for example, any of the commonly used deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN) models. A neuron is a data structure that receives inputs and applies built-in functions to the input data to produce an output. The built-in functions are fixed, and several activation or kernel function forms are in common use, such as ReLU, Sigmoid, and Softmax. The input is a single input or a batch input of the neural network model. For example, for a neural network trained for an image classification problem, the input is a certain image file or a batch composed of multiple images.
An input validation method for a neural network model by crossing-layer dissection includes the following steps: Step 1: a given neural network model and training data corresponding to the neural network model are provided. The training data are input into the given neural network model. Intermediate information of the training data at each intermediate layer of the model during a training process is extracted. A sub-model corresponding to each layer is trained according to the intermediate information obtained. Each sub-model contains knowledge from an input layer to a corresponding intermediate layer of the given neural network model and simulates a prediction behavior of the given neural network model.
The intermediate information includes model parameter information (such as weight and bias in the CNN model) obtained by the neurons at each intermediate layer during the training process, an input value and an output value of each neuron, and others. The parameter information is configured to record knowledge learned by the current model from the training data set through the training process. The input value and the output value are configured to provide training data for a subsequent training process of the sub-model.
The sub-model corresponding to each layer, such as a layer k, is a neural network model structurally similar to the given neural network model, and structurally includes two parts. The first part inherits all the model parameter information, such as weight and bias, from the input layer to the corresponding layer k, together with the corresponding model structure of the meta-model obtained after the original training process of the given neural network. The second part uses a basic meta-model to connect the neurons at the layer k to a prediction output neuron; it is retrained using the intermediate information of the neurons at the layer k recorded in step 1 (the set of outputs at the layer k obtained after the original training data are input to the given neural network) together with the predicted-value labels corresponding to the original training set, and the parameter information of the second part is obtained after this retraining. The parameters of the two parts are combined to obtain a sub-model structure with parameters. The basic meta-model is generally, but is not limited to, a linear regression model. The retraining is generally performed only on the parameters of the second part, but is not limited thereto; it can also be performed on the parameters of both the first part and the second part according to different application scenarios, such as performing overall fine-tuning on the model parameters by the crossing-layer dissection method.
Step 2: an input to be validated is input into the sub-model corresponding to each intermediate layer obtained in step 1 to collect prediction behavior snapshots on the sub-model corresponding to each layer with increasing layers, and the prediction behavior snapshots are converged to generate a whole behavior profile of the input over all sub-models. The prediction behavior snapshots of the input on the sub-model corresponding to each layer are, but are not limited to, the predicted probability distribution result and other information obtained after the input is fed into the sub-model corresponding to each layer for prediction.
The whole behavior profile is a set of the prediction behavior snapshots obtained by each sub-model, is configured to validate and evaluate the input to be validated in step 3, and is a basic material.
Step 3: based on the whole behavior profile obtained by dissecting a layer corresponding to the given input obtained in step 2, the validity of the prediction behavior snapshots at the layer and the validity of the whole behavior profile are analyzed, and a validity confidence score is provided to evaluate the validity.
Methods for analyzing the validity of the prediction behavior snapshots at the corresponding layer by using the whole behavior profile in step 2 include: method 1: a probability difference between a predicted maximum value and a final predicted value of a prediction behavior at the current layer is considered, and a relative size proportion is taken as a snapshot validity score of each layer; and method 2: after a direct prediction behavior difference between the current layer and the previous layer is considered, a relative proportion of each probability change of the prediction behavior to a probability change of the final predicted value is taken as the snapshot validity score of each layer.
Methods for analyzing the validity of the whole profile by using the whole behavior profile in step 2 include: method 1: the actual prediction accuracy of each layer on the training set is used as a weight, and the snapshot validity analyses at each layer are integrated to calculate the final profile validity score; method 2: weights are set by observation using commonly used growth function curves (e.g., linear, logarithmic, exponential), and the snapshot validity analyses at each layer are integrated to calculate the final profile validity score; and method 3: the snapshots at each layer and the whole behavior profile are obtained from the training set data, the snapshot validity analyses are taken as input data, the corresponding validation result is taken as labeled data, and a model for calculating the final profile validity score is trained with a machine learning model.
The corresponding validation result is given artificially, or given based on the prediction accuracy of the given neural network model with respect to the input, but is not limited thereto. The machine learning model adopts, but is not limited to, a known machine learning model such as a linear regression model, a logistic regression model, a support vector machine (SVM) model, and a neural network model.
The validity confidence score is a certain value between 0 and 1 with regard to the whole profile validity score calculated for the given input to be validated, and represents a confidence degree with respect to the validity of the input to be validated. When the validity confidence score is close to 0, the input is more invalid. When the validity confidence score is close to 1, the input is more valid.
The step of evaluating the validity includes: distinguishing between valid/invalid inputs by using a calculated validity confidence degree and setting a threshold. The threshold is given in advance or obtained by experience and mainly depends on different tolerance levels with respect to valid inputs of scenarios during the actual use of different models. Generally, when the security requirements are stringent, the threshold of the scenario is close to 1.
FIG. 1 shows an input validation method for a neural network model by crossing-layer dissection according to an embodiment of the present invention. Firstly, the original neural network model and its training data set are used in advance/offline to generate a sub-model corresponding to each layer and form a sub-model pool. Each sub-model in the pool corresponds to the knowledge contained in a specific layer of the original model and is used for prediction. Secondly, a given input to be validated is input into each model in the sub-model pool for prediction. The crossing-layer behavior of each layer of the original neural network is dissected according to the layer-wise predictions, and a crossing-layer dissection whole behavior (profile), including a crossing-layer prediction behavior (snapshot) corresponding to each layer, is output. Finally, the crossing-layer prediction whole behavior profile and the crossing-layer prediction behavior snapshots contained therein are fed into a validity analysis module to output a validity analysis report. The framework of the whole method contains three modules corresponding to the three steps: a sub-model generation module, a crossing-layer behavior dissection module, and an input validity analysis module.
Step 1: the sub-model generation module generates a sub-model corresponding to each layer.
As shown in FIG. 2, the structure of the sub-model corresponding to the selected layer k is designed to include two parts. The first part is a copy of the original neural network model structure from the input layer to the currently selected layer of the original model, and specifically includes the structural information, parameter information, and other information of the model. The second part is a retraining model using a meta-model structure based on the output value of the input data at the current layer k and the final predicted value. The meta-model shown in the drawings is a single-layer linear fully connected model, that is, a linear regression model, but is not limited thereto.
As shown in FIG. 3, FIG. 3 is a work flow chart of the sub-model generation module. The original neural network model and the training data set are input. Firstly, all subsequent intermediate results, such as the intermediate input and output values of the neurons at each layer, are saved. Then, the sub-models are generated by iteratively selecting each layer k. Specifically, the first part and the second part of the sub-model are generated separately and then spliced to form the sub-model. Finally, the sub-models generated for all selected layers are integrated and output to form a sub-model pool.
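The sub-model generation workflow of FIG. 3 can be sketched end-to-end with NumPy. This is an illustrative sketch under assumptions: each original layer is a ReLU-activated dense layer, the meta-model head is fit by regularized least squares, and `build_submodel_pool` is a hypothetical name.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def build_submodel_pool(layers, X, Y, reg=1e-3):
    """For each layer k: record the intermediate activations (the saved
    intermediate results), fit a linear head on them, and splice the frozen
    prefix with the head into a sub-model predictor. Returns the pool."""
    pool, H = [], X
    for k in range(len(layers)):
        W, b = layers[k]
        H = relu(H @ W + b)                               # outputs at layer k
        Hb = np.hstack([H, np.ones((len(H), 1))])         # bias column
        Wh = np.linalg.solve(Hb.T @ Hb + reg * np.eye(Hb.shape[1]), Hb.T @ Y)
        prefix = layers[: k + 1]                          # first part: frozen copy

        def predict(x, prefix=prefix, Wh=Wh):             # spliced sub-model
            for Wp, bp in prefix:
                x = relu(x @ Wp + bp)
            z = np.hstack([x, np.ones((len(x), 1))]) @ Wh
            e = np.exp(z - z.max(axis=1, keepdims=True))  # stable softmax
            return e / e.sum(axis=1, keepdims=True)

        pool.append(predict)
    return pool
```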
Step 2: the crossing-layer behavior dissection module analyzes the crossing-layer behavior of the input to be validated.
As shown in FIG. 4, FIG. 4 is a work flow chart of the crossing-layer behavior dissection module. The input to be validated is input into the sub-model pool obtained in step 1, that is, into the sub-model corresponding to each layer in the pool, to obtain the crossing-layer behavior snapshots. Each snapshot reflects the behavior information of the specific layer of the original model corresponding to that sub-model. Finally, the snapshots are converged to form the whole behavior profile of the input, which reflects the dissection behavior of the input as it is transmitted through the layers of the original model.
Step 3: the input validity analysis module analyzes the validity of the input to be validated to provide a report.
As shown in FIG. 5, FIG. 5 shows the input validity analysis module. For the crossing-layer dissection whole behavior profile obtained in step 2 for the input to be validated, an analysis method (weight-based or learning-based) is selected and the validity degree of the whole behavior profile is calculated. Specifically, a snapshot analysis method (the relative size proportion of the probability difference between the predicted maximum value and the final predicted value, or the relative proportion of the probability change to the probability change of the final predicted value, as described above) is first adopted to score each single snapshot. A specific profile analysis method (one of the two weight-based analysis methods or the learning-based analysis method, as described above) is then selected, and the scores of all snapshots are integrated into an evaluation of the validity of the whole behavior profile of the entire crossing-layer dissection. By determining the security threshold required by the actual application scenario, a decision on whether the input to be validated is valid is further obtained and reported. In actual scenarios, the threshold increases as the security requirements become stricter, so that the proportion of inputs judged invalid in the same situation increases accordingly. Combined with a subsequent process of filtering out invalid inputs, this method can reasonably select which inputs are fed into the neural network model in the actual scenario, thereby increasing the accuracy of the neural network model during actual use.

Claims (9)

CLAIMS
1. An input validation method for a neural network model by a crossing-layer dissection, comprising the following steps: step 1: providing a given neural network model and training data corresponding to the neural network model, inputting the training data into the given neural network model, extracting intermediate information of the data at each intermediate layer of the model during a training process, and training a sub-model corresponding to each layer according to the intermediate information, wherein each sub-model contains knowledge from an input layer to a corresponding intermediate layer of the given neural network model and simulates a prediction behavior of the given neural network model; step 2: inputting an input to be validated into the sub-model corresponding to each intermediate layer obtained in step 1 to collect prediction behavior snapshots on the sub-model corresponding to each layer with increasing layers, and converging the prediction behavior snapshots to generate a whole behavior profile of the inputs in all sub-models; and step 3: based on the whole behavior profile obtained by dissecting a layer corresponding to the given input obtained in step 2, analyzing a validity of the prediction behavior snapshots at the layer and the validity of the whole behavior profile, and providing a validity confidence score to evaluate the validity.
2. The input validation method for the neural network model by the crossing-layer dissection according to claim 1, characterized in that, in step 1, the given neural network model and a training data set of the neural network model are provided, and the intermediate information of each layer of the training process is extracted; wherein the intermediate information comprises model parameter information obtained by neurons at each intermediate layer during the training process, and an input value and an output value of each neuron; the parameter information is configured to record knowledge learned by a current model from the training data set through the training process; and the input value and the output value are configured to provide training data for a subsequent training process of the sub-model.
3. The input validation method for the neural network model by the crossing-layer dissection according to claim 1, characterized in that, in step 1, the sub-model corresponding to a layer k is a neural network model structurally similar to the given neural network model, and the sub-model structurally comprises two parts; wherein the first part inherits the model structure and all the model parameter information from the input layer to the layer k of the meta-model obtained after the original training process of the given neural network; the second part uses a basic meta-model to connect the neurons at the layer k to a prediction output neuron, and is retrained on the intermediate information of the neurons at the layer k recorded in step 1 together with the predicted-value labels corresponding to the original training set, to obtain the parameter information of the second part; the parameters of the two parts are combined to obtain a sub-model structure with parameters; and the basic meta-model is generally a linear regression model.
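A minimal numerical sketch of the two-part sub-model for one layer k. Everything here is synthetic and illustrative: `W1` stands in for the inherited (frozen) meta-model weights from the input layer to layer k, the labels stand in for the meta-model's predicted values on the original training set, and least squares implements the linear-regression meta-model named in the claim:

```python
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.normal(size=(4, 8))                 # part 1: inherited prefix weights (frozen)
X = rng.normal(size=(200, 4))                # toy training inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # stand-in predicted-value labels

H = np.maximum(X @ W1, 0.0)                  # layer-k activations recorded in step 1

# Part 2: a linear-regression head from the layer-k neurons to the
# prediction output, fitted by least squares on the recorded activations.
Hb = np.c_[H, np.ones(len(H))]               # append a bias column
w, *_ = np.linalg.lstsq(Hb, y, rcond=None)

def sub_model(x):
    """Combined sub-model: frozen prefix (part 1) then retrained head (part 2)."""
    h = np.maximum(x @ W1, 0.0)
    return float(np.r_[h, 1.0] @ w)

# Agreement of the sub-model with the labels it was retrained on.
acc = float(np.mean((Hb @ w > 0.5) == (y > 0.5)))
```

Because part 1 is frozen, training a sub-model only costs fitting the small head, which is what makes one sub-model per layer affordable.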
4. The input validation method for the neural network model by the crossing-layer dissection according to claim 1, characterized in that, in step 2, the prediction behavior snapshots of the input on the sub-model corresponding to each layer are predicted information obtained after the input is input into the sub-model corresponding to each layer for prediction.
5. The input validation method for the neural network model by the crossing-layer dissection according to claim 1, characterized in that the whole behavior profile is a set of the prediction behavior snapshots obtained by each sub-model, and the whole behavior is configured to validate and evaluate the input to be validated in step 3.
6. The input validation method for the neural network model by the crossing-layer dissection according to claim 1, characterized in that, in step 3, the methods for analyzing the validity of the prediction behavior snapshots at the corresponding layer by using the whole behavior profile from step 2 comprise:
method 1: considering the probability difference between the predicted maximum value and the final predicted value of the prediction behavior at the current layer, and taking their relative proportion as the snapshot validity score of each layer; and
method 2: considering the direct prediction behavior difference between the current layer and the previous layer, and taking the relative proportion of each probability change of the prediction behavior to the probability change of the final predicted value as the snapshot validity score of each layer.
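Method 1 admits a straightforward reading, sketched here (an illustrative implementation, not the patent's exact formula): the snapshot validity score at a layer is the proportion between the probability the layer assigns to the model's final predicted class and that layer's own maximum class probability, so a layer that already agrees with the final prediction scores 1.0:

```python
import numpy as np

def snapshot_validity_m1(layer_probs, final_class):
    """Claim 6, method 1 sketch: relative proportion of the layer's
    probability for the final predicted class to its maximum probability."""
    layer_probs = np.asarray(layer_probs, dtype=float)
    return float(layer_probs[final_class] / layer_probs.max())

p = np.array([0.5, 0.3, 0.2])              # snapshot at the current layer
score_agree = snapshot_validity_m1(p, 0)   # layer agrees with final class
score_drift = snapshot_validity_m1(p, 1)   # layer prefers another class: 0.3 / 0.5
```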
7. The input validation method for the neural network model by the crossing-layer dissection according to claim 1, characterized in that, in step 3, the methods for analyzing the validity of the whole profile by using the whole behavior profile from step 2 comprise:
method 1: using the actual prediction accuracy of each layer's sub-model on the training set as a weight, and integrating the per-layer snapshot validity analysis to calculate a final profile validity score;
method 2: setting the weights from empirical observations and commonly used growth-function curves, and integrating the per-layer snapshot validity analysis to calculate the final profile validity score; and
method 3: obtaining the per-layer snapshots and the whole behavior profile from the training set data, taking the snapshot validity analysis as input data and the corresponding validation result as labeled data, and training a machine learning model to calculate the final profile validity score.
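Method 1 above can be sketched as a weighted average (an illustrative reading; the patent does not fix the exact aggregation): the per-layer training-set accuracies act as weights over the per-layer snapshot validity scores, so layers whose sub-models predict reliably contribute more to the final score:

```python
import numpy as np

def profile_validity(snapshot_scores, layer_accuracies):
    """Claim 7, method 1 sketch: accuracy-weighted mean of the per-layer
    snapshot validity scores, yielding the final profile validity score."""
    w = np.asarray(layer_accuracies, dtype=float)
    s = np.asarray(snapshot_scores, dtype=float)
    return float((w * s).sum() / w.sum())

# Deeper layers are typically both more accurate and, for a valid input,
# more consistent with the final prediction.
score = profile_validity([0.4, 0.7, 1.0], [0.55, 0.80, 0.95])
```

Method 2 would replace the measured accuracies with weights from a chosen growth curve (e.g. increasing with depth), and method 3 would learn the aggregation from labeled training-set profiles.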
8. The input validation method for the neural network model by the crossing-layer dissection according to claim 1, characterized in that, in step 3, the validity confidence score is a value between 0 and 1 given by the whole-profile validity score calculated for the given input to be validated, and represents the degree of confidence in the validity of that input.
9. The input validation method for the neural network model by the crossing-layer dissection according to claim 1, characterized in that, in step 3, the step of evaluating the validity comprises: distinguishing valid inputs from invalid inputs by setting a threshold on the calculated validity confidence score.
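The final decision of claim 9 reduces to a threshold test. In this sketch the 0.5 cut-off is an illustrative choice, not a value prescribed by the patent; in practice it would be tuned on held-out data:

```python
def is_valid_input(confidence_score, threshold=0.5):
    """Claim 9 sketch: an input is accepted as valid when its validity
    confidence score (in [0, 1]) reaches the chosen threshold."""
    return confidence_score >= threshold

valid = is_valid_input(0.9)     # high confidence -> accepted as valid
invalid = is_valid_input(0.2)   # low confidence -> flagged as invalid
```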
LU102710A 2019-08-14 2019-10-17 Input validation method for neural network model by crossing-layer dissection LU102710B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910747317.4A CN110633788A (en) 2019-08-14 2019-08-14 Input instance verification method based on interlayer analysis and oriented to neural network model

Publications (1)

Publication Number Publication Date
LU102710B1 true LU102710B1 (en) 2021-04-08

Family

ID=68970306

Family Applications (1)

Application Number Title Priority Date Filing Date
LU102710A LU102710B1 (en) 2019-08-14 2019-10-17 Input validation method for neural network model by crossing-layer dissection

Country Status (3)

Country Link
CN (1) CN110633788A (en)
LU (1) LU102710B1 (en)
WO (1) WO2021027052A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475321B (en) * 2020-05-08 2024-04-26 中国人民解放军国防科技大学 Neural network security property verification method based on iterative abstract analysis
CN112070235A (en) * 2020-09-08 2020-12-11 北京小米松果电子有限公司 Abnormity positioning method and device of deep learning framework and storage medium
CN112632309B (en) * 2020-12-15 2022-10-04 北京百度网讯科技有限公司 Image display method and device, electronic equipment and storage medium
CN112884513B (en) * 2021-02-19 2024-07-02 上海数鸣人工智能科技有限公司 Marketing activity prediction model structure and prediction method based on depth factor decomposition machine
CN114254274B (en) * 2021-11-16 2024-05-31 浙江大学 White-box deep learning model copyright protection method based on neuron output
CN114422185B (en) * 2021-12-17 2024-03-15 广西壮族自治区公众信息产业有限公司 Method for generating and verifying signature based on asynchronous neural network
CN114863178A (en) * 2022-05-13 2022-08-05 南京大学 Image data input detection method and system for neural network vision system
CN114861912B (en) * 2022-07-06 2022-09-16 武汉山川软件有限公司 Data verification method and device based on big data

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN105184678A (en) * 2015-09-18 2015-12-23 齐齐哈尔大学 Method for constructing photovoltaic power station generation capacity short-term prediction model based on multiple neural network combinational algorithms
KR102607208B1 (en) * 2017-11-16 2023-11-28 삼성전자주식회사 Neural network learning methods and devices
WO2019104101A1 (en) * 2017-11-21 2019-05-31 Verisim Life Inc. Systems and methods for full body circulation and drug concentration prediction
CN109344806B (en) * 2018-10-31 2019-08-23 第四范式(北京)技术有限公司 The method and system detected using multitask target detection model performance objective

Also Published As

Publication number Publication date
WO2021027052A1 (en) 2021-02-18
CN110633788A (en) 2019-12-31

Similar Documents

Publication Publication Date Title
LU102710B1 (en) Input validation method for neural network model by crossing-layer dissection
EP3289528B1 (en) Filter specificity as training criterion for neural networks
US10318848B2 (en) Methods for object localization and image classification
CN108027899B (en) Method for improving performance of trained machine learning model
CN111144496B (en) Garbage classification method based on hybrid convolutional neural network
US10846593B2 (en) System and method for siamese instance search tracker with a recurrent neural network
EP3796228A1 (en) Device and method for generating a counterfactual data sample for a neural network
US20160283864A1 (en) Sequential image sampling and storage of fine-tuned features
US20170091619A1 (en) Selective backpropagation
US20170032247A1 (en) Media classification
CN111310814A (en) Method and device for training business prediction model by utilizing unbalanced positive and negative samples
CN110276248B (en) Facial expression recognition method based on sample weight distribution and deep learning
WO2016122787A1 (en) Hyper-parameter selection for deep convolutional networks
US11586924B2 (en) Determining layer ranks for compression of deep networks
CN111461212A (en) Compression method for point cloud target detection model
US11983917B2 (en) Boosting AI identification learning
Wu et al. Optimized deep learning framework for water distribution data-driven modeling
CN110781970A (en) Method, device and equipment for generating classifier and storage medium
CN110751005B (en) Pedestrian detection method integrating depth perception features and kernel extreme learning machine
Kukreja et al. Evaluating traffic signs detection using faster r-cnn for autonomous driving
Ateş Pothole detection in asphalt images using convolutional neural networks
Sangam et al. Rotten fruit detection using artificial intelligence
CN110765809A (en) Facial expression classification method and device and emotion intelligent robot
CN113255725B (en) Automobile sensor attack detection and repair method based on two-stage LSTM
CN113886579B (en) Construction method and system, identification method and system for positive and negative surface models of industry information

Legal Events

Date Code Title Description
FG Patent granted

Effective date: 20210408