CN112613617A - Uncertainty estimation method and device based on regression model

Info

Publication number: CN112613617A
Application number: CN202011612532.2A
Authority: CN
Inventors: 周杰; 鲁继文; 李万华
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-04-06

Abstract

The application provides an uncertainty estimation method and device based on a regression model, and relates to the technical field of machine learning. The method comprises: acquiring an input sample that has a label value, and extracting probability distribution features of the input sample; sampling T features from the probability distribution features, where T is a positive integer; acquiring T loss functions corresponding to the T features, and processing the T loss functions to obtain a training loss function; and inputting the input sample into the regression model to obtain a predicted value, adjusting the parameters of the regression model according to the label value and the predicted value through the training loss function to obtain a trained regression model, and inputting data to be processed into the trained regression model to obtain a regression result and a target value. The uncertainty of each piece of test data, namely the target value, can thus be given, and modeling the uncertainty also effectively improves the accuracy of the regression result, yielding a regression model with better performance.

Description

Uncertainty estimation method and device based on regression model

Technical Field

The application relates to the technical field of machine learning, in particular to an uncertainty estimation method and device based on a regression model.

Background

In general, a regression problem requires predicting a corresponding target value y for a given datum x. Regression is a fundamental machine learning problem.

Most existing methods solve this problem with deep neural networks: typically, a deep neural network extracts features from the data x, and a regressor then regresses the extracted features to specific values. Common regression strategies can be broadly divided into three categories: direct regression-based methods, classification-based methods, and order-based methods. Direct regression-based methods predict target values directly using a regressor, which is trained with the L1 or L2 loss function. Classification-based methods convert the regression problem into a classification problem: the target space is first divided into several sub-categories, and a regressor then learns the classification task. Order-based methods implement the regressor with several binary classifiers, each responsible for one binary classification problem. Most of these methods are implemented with neural networks, which tend to give over-confident predictions.

In a practical scenario, in addition to the predicted value given by the model, the confidence of that predicted value often needs to be known. For example, when predicting the distance to a target ahead in autonomous driving, the regressor may give a prediction, but it is also necessary to know to what extent this prediction can be relied upon. Directly adopting all the prediction results given by the model without considering their confidence is very dangerous, and prediction results with low confidence should not be adopted in practice. Therefore, the uncertainty of the model needs to be learned together with the model, so that for each datum the model gives both a prediction result and the uncertainty of that prediction result.

Disclosure of Invention

The present application is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, a first objective of the present application is to provide a regression model-based uncertainty estimation method, which can give the uncertainty, i.e. a target value, of each piece of test data, and which, by modeling the uncertainty, effectively improves the accuracy of the regression result, so as to obtain a regression model with better performance.

A second objective of the present application is to provide an uncertainty estimation device based on a regression model.

In order to achieve the above object, an embodiment of a first aspect of the present application provides a regression model-based uncertainty estimation method, including:

acquiring an input sample, and extracting probability distribution features of the input sample; wherein the input sample has a label value;

sampling T features from the probability distribution features; wherein T is a positive integer;

acquiring T loss functions corresponding to the T features, and processing the T loss functions to obtain a training loss function;

and inputting the input sample into a regression model for processing to obtain a predicted value, and adjusting parameters of the regression model according to the label value and the predicted value through the training loss function to obtain a trained regression model, so that data to be processed is input into the trained regression model to obtain a regression result and a target value.

According to the uncertainty estimation method based on the regression model of the embodiments of the present application, an input sample having a label value is acquired, and probability distribution features of the input sample are extracted; T features are sampled from the probability distribution features, where T is a positive integer; T loss functions corresponding to the T features are acquired and processed to obtain a training loss function; the input sample is input into the regression model for processing to obtain a predicted value, and the parameters of the regression model are adjusted according to the label value and the predicted value through the training loss function to obtain a trained regression model; and data to be processed is input into the trained regression model to obtain a regression result and a target value. The uncertainty of each piece of test data, namely the target value, can thus be given, and modeling the uncertainty also effectively improves the accuracy of the regression result, yielding a regression model with better performance.

In an embodiment of the present application, the extracting the probability distribution feature of the input sample includes:

processing the input sample through two neural networks respectively to obtain the mean and the variance of a high-dimensional Gaussian distribution as the probability distribution features.

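In one embodiment of the present application, the formula for sampling T features from the probability distribution features is:

$$ z^{(t)} = \mu(x; \theta_1) + \sqrt{\operatorname{diag}(\Sigma(x; \theta_2))} \odot \epsilon^{(t)}, \quad \epsilon^{(t)} \sim \mathcal{N}(0, I), \quad t = 1, \dots, T $$

wherein x is the input sample, θ₁ and θ₂ are the parameters of the two neural networks that predict the mean μ and the variance Σ, diag() denotes taking the diagonal elements, ε^(t) is standard Gaussian noise, ⊙ is the element-wise product, and t indexes the T sampled features.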

In an embodiment of the present application, the processing the T loss functions to obtain a training loss function includes:

summing and averaging the T loss functions to obtain an average loss function;

and obtaining an ordered distribution constraint function, and calculating the sum of the average loss function and the ordered distribution constraint function as the training loss function.

In one embodiment of the present application, the training loss function is formulated as:

$$ L = \sum_{x \in D} \bar{L}(x) + \alpha L_{Ord} $$

wherein \bar{L}(x) represents the average loss function of sample x, D represents the training data set, α is a hyperparameter, and L_Ord represents the ordered distribution constraint function.

To achieve the above object, a second aspect of the present application provides a regression model-based uncertainty estimation apparatus, including:

a first acquisition module, configured to acquire an input sample; wherein the input sample has a label value;

an extraction module, configured to extract probability distribution features of the input sample;

a sampling module, configured to sample T features from the probability distribution features; wherein T is a positive integer;

a second acquisition module, configured to acquire T loss functions corresponding to the T features;

a processing module, configured to process the T loss functions to obtain a training loss function;

and a training estimation module, configured to input the input sample into a regression model for processing to obtain a predicted value, and adjust parameters of the regression model according to the label value and the predicted value through the training loss function to obtain a trained regression model, so that data to be processed is input into the trained regression model to obtain a regression result and a target value.

The uncertainty estimation device based on the regression model of the embodiments of the present application acquires an input sample having a label value and extracts probability distribution features of the input sample; samples T features from the probability distribution features, where T is a positive integer; acquires T loss functions corresponding to the T features and processes them to obtain a training loss function; inputs the input sample into the regression model for processing to obtain a predicted value, and adjusts the parameters of the regression model according to the label value and the predicted value through the training loss function to obtain a trained regression model; and inputs data to be processed into the trained regression model to obtain a regression result and a target value. The uncertainty of each piece of test data, namely the target value, can thus be given, and modeling the uncertainty also effectively improves the accuracy of the regression result, yielding a regression model with better performance.

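In an embodiment of the application, the extraction module is specifically configured to process the input sample through two neural networks respectively to obtain the mean and the variance of a high-dimensional Gaussian distribution as the probability distribution features.

In an embodiment of the application, the processing module is specifically configured to sum and average the T loss functions to obtain an average loss function, and to obtain an ordered distribution constraint function and calculate the sum of the average loss function and the ordered distribution constraint function as the training loss function.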

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic flowchart of a regression model-based uncertainty estimation method according to an embodiment of the present application;

FIG. 2 is an exemplary diagram of regression model training according to an embodiment of the present application;

FIG. 3 is an exemplary diagram of unordered probabilistic features becoming ordered probabilistic features according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a regression model-based uncertainty estimation apparatus according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.

The regression model-based uncertainty estimation method and apparatus of the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a schematic flowchart of an uncertainty estimation method based on a regression model according to an embodiment of the present disclosure.

As shown in fig. 1, the regression model-based uncertainty estimation method includes the following steps:

Step 101, acquiring an input sample, and extracting probability distribution features of the input sample; wherein the input sample has a label value.

Step 102, sampling T features from the probability distribution features; wherein T is a positive integer.

Step 103, acquiring T loss functions corresponding to the T features, and processing the T loss functions to obtain a training loss function.

Step 104, inputting the input sample into the regression model for processing to obtain a predicted value, adjusting parameters of the regression model according to the label value and the predicted value through the training loss function to obtain a trained regression model, and inputting data to be processed into the trained regression model to obtain a regression result and a target value.

In the embodiment of the present application, extracting the probability distribution features of the input sample includes: processing the input sample through two neural networks respectively to obtain the mean and the variance of a high-dimensional Gaussian distribution as the probability distribution features.

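In an embodiment of the present application, a formula for sampling T features from the probability distribution features is:

$$ z^{(t)} = \mu(x; \theta_1) + \sqrt{\operatorname{diag}(\Sigma(x; \theta_2))} \odot \epsilon^{(t)}, \quad \epsilon^{(t)} \sim \mathcal{N}(0, I), \quad t = 1, \dots, T $$

wherein x is the input sample, θ₁ and θ₂ are the parameters of the two neural networks that predict the mean μ and the variance Σ, diag() denotes taking the diagonal elements, ε^(t) is standard Gaussian noise, ⊙ is the element-wise product, and t indexes the T sampled features.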

In an embodiment of the present application, processing T loss functions to obtain a training loss function includes: summing and averaging the T loss functions to obtain an average loss function; and obtaining an ordered distribution constraint function, and calculating the sum of the average loss function and the ordered distribution constraint function as a training loss function.

In an embodiment of the present application, the training loss function is formulated as:

$$ L = \sum_{x \in D} \bar{L}(x) + \alpha L_{Ord} $$

wherein \bar{L}(x) represents the average loss function of sample x, D represents the training data set, α is a hyperparameter, and L_Ord represents the ordered distribution constraint function.
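As a minimal sketch of how the training loss described above could be assembled, the following Python snippet averages the T per-feature losses of each training sample and adds the weighted ordered distribution constraint. The function name `training_loss`, the choice to sum (rather than average) over the training set, and the numbers used are assumptions for illustration only and are not taken from the application.

```python
def training_loss(per_sample_losses, ordered_constraint_value, alpha):
    """Combine the averaged losses with the ordered distribution constraint.

    per_sample_losses[i] holds the T loss values obtained for training sample i.
    Assumption: the averaged losses are summed over the training set.
    """
    # Average loss function: sum the T losses of each sample and divide by T.
    averaged = [sum(losses) / len(losses) for losses in per_sample_losses]
    # Training loss: averaged losses plus the constraint weighted by alpha.
    return sum(averaged) + alpha * ordered_constraint_value


# Two training samples with T = 2 sampled features each (illustrative values).
total = training_loss([[1.0, 3.0], [2.0, 2.0]], ordered_constraint_value=0.5, alpha=0.2)
```

The hyperparameter alpha controls how strongly the ordering of the target space is enforced relative to the regression loss.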

Specifically, the present application addresses uncertainty estimation for regression problems and can be applied to various learning-based regression methods: it improves the performance of the regression method and, at the same time, gives the uncertainty of the model for any test sample. In addition, the method includes an ordered distribution constraint, which aims to preserve in the feature space the ordering of the target space, so that more effective probabilistic features are learned and the performance of the model is improved.

Specifically, the representation of each sample in the feature space is regarded as a high-dimensional Gaussian distribution z ~ p(z | x). An input sample x is first fed into a neural network to extract features, and probabilistic ordered features are learned: for each sample x, its feature is represented as a probability distribution, here a high-dimensional Gaussian distribution. Two neural networks, with parameters θ₁ and θ₂ respectively, are therefore used to predict the mean μ(x; θ₁) and the variance Σ(x; θ₂) of the high-dimensional Gaussian distribution. Thus, for each sample x, the parameters of its probabilistic representation in the feature space are obtained using neural networks. T features are then sampled from the probability distribution. To allow subsequent gradient back-propagation, the reparameterization trick may be used for sampling, as in equation (1):

$$ z^{(t)} = \mu(x; \theta_1) + \sqrt{\operatorname{diag}(\Sigma(x; \theta_2))} \odot \epsilon^{(t)}, \quad \epsilon^{(t)} \sim \mathcal{N}(0, I), \quad t = 1, \dots, T \qquad (1) $$
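The sampling step can be illustrated with a small, hedged Python sketch. Each "network" here is a hypothetical single affine layer standing in for a real neural network, and the second network is assumed to output log-variances so that the variance stays positive; neither choice is specified by the application. The sampling itself uses the standard reparameterization of a diagonal Gaussian, in which the randomness comes only from standard normal noise.

```python
import math
import random


def linear(weights, bias, x):
    """Hypothetical stand-in for a neural network: one affine layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def probabilistic_feature(x, theta1, theta2):
    """Return the mean and the diagonal variance of the Gaussian feature of x."""
    mu = linear(theta1["W"], theta1["b"], x)
    # Assumption: the second network predicts log-variances.
    log_var = linear(theta2["W"], theta2["b"], x)
    var = [math.exp(v) for v in log_var]
    return mu, var


def sample_features(mu, var, T, rng):
    """Draw T features z = mu + sqrt(var) * eps with eps ~ N(0, I)."""
    samples = []
    for _ in range(T):
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
        samples.append([m + math.sqrt(v) * e for m, v, e in zip(mu, var, eps)])
    return samples


rng = random.Random(0)
theta1 = {"W": [[0.5, -0.2], [0.1, 0.3], [0.0, 1.0]], "b": [0.0, 0.1, -0.1]}
theta2 = {"W": [[0.1, 0.0], [0.0, 0.1], [0.2, -0.2]], "b": [-1.0, -1.0, -1.0]}
mu, var = probabilistic_feature([1.0, 2.0], theta1, theta2)
features = sample_features(mu, var, T=5, rng=rng)
```

Because each sample is a deterministic function of the mean, the variance and independent noise, gradients can flow back to both networks in a framework with automatic differentiation.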

After the sampled features are obtained, the loss functions of the T features are obtained by applying the sampled features to the different regression methods.

Specifically, the direct regression method predicts a value directly using a regressor, i.e., a regression model, and is trained with the L1 or L2 loss function. Correspondingly, with the probabilistic ordered features, this loss function is applied to each of the T sampled features. Taking the L2 loss function as an example, the loss is the L2 loss computed on each of the T sampled features and averaged over them, wherein y represents the predicted result of the regressor and w represents the learnable parameters of the regressor.
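For the direct regression case, a minimal sketch of the averaged L2 loss might look as follows. The regressor is assumed to be a plain inner product between the learnable parameters w and a sampled feature, which is an illustrative simplification; the function names and numbers are hypothetical.

```python
def predict(w, z):
    """Assumed regressor: inner product of learnable parameters w and feature z."""
    return sum(wi * zi for wi, zi in zip(w, z))


def average_l2_loss(samples, label, w):
    """Apply the L2 loss to each of the T sampled features and average the results."""
    losses = [(label - predict(w, z)) ** 2 for z in samples]
    return sum(losses) / len(losses)


# T = 3 sampled features of one training sample with label value 2.0.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
loss = average_l2_loss(samples, label=2.0, w=[1.0, 2.0])
```

Here the three predictions are 1, 2 and 3, so the squared errors are 1, 0 and 1, and the averaged loss is 2/3.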

Specifically, the classification-based method discretizes the original regression space into C classes and is usually trained with a cross-entropy loss function. Similarly, with the probabilistic ordered features, this loss function is applied to each of the T sampled features to obtain the loss function for training, where C represents the number of all possible classes, c represents the true class label corresponding to sample x, and r is used to enumerate all possible classes.
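The classification-based variant can be sketched in the same way. The snippet below discretizes a target into C equal-width bins and averages the cross-entropy loss over the sampled features. The equal-width binning, the linear classifier `W` (one weight row per class) and all names are assumptions made for illustration.

```python
import math


def to_class(y, low, high, C):
    """Discretize a target in [low, high] into one of C equal-width bins (assumed scheme)."""
    idx = int((y - low) / (high - low) * C)
    return min(max(idx, 0), C - 1)


def softmax(logits):
    """Numerically stable softmax over the C class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_cross_entropy(samples, true_class, W):
    """Apply the cross-entropy loss to each sampled feature and average over T."""
    losses = []
    for z in samples:
        logits = [sum(wi * zi for wi, zi in zip(row, z)) for row in W]
        probs = softmax(logits)
        losses.append(-math.log(probs[true_class]))
    return sum(losses) / len(losses)


c = to_class(4.9, low=0.0, high=10.0, C=5)
uniform_loss = average_cross_entropy([[1.0, 2.0], [0.5, 0.5]], c, W=[[0.0, 0.0]] * 5)
```

With all-zero weights the classifier is uniform over the 5 classes, so the loss equals log(5) regardless of the sampled features.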

Specifically, the order-based regression method first discretizes the original regression space into C categories and then uses C-1 binary classifiers, each responsible for predicting whether the label of a sample is greater than a certain rank; C-1 cross-entropy loss functions are used in training. Similarly, with the probabilistic ordered features, these loss functions are applied to each of the T sampled features to obtain the loss function for training, where C represents the number of all possible categories, so that C-1 binary classifiers are used in total; k is the index, taking values from 1 to C-1; b and r respectively represent the prediction results and the label values on the C-1 binary classifiers; b_k represents the prediction result of the k-th binary classifier, and r_k represents the corresponding label value of sample x on the k-th binary classifier. The training process is shown in FIG. 2.
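A hedged sketch of the order-based variant follows. Class indices are assumed to run from 0 to C-1, so that the label r_k of the k-th binary classifier is 1 when the class index is at least k; each binary classifier is assumed to be a logistic unit on the sampled feature. These conventions and all names are illustrative and not taken from the application.

```python
import math


def ordinal_labels(c, C):
    """Binary labels r_1..r_{C-1}: r_k = 1 if class index c (0-based) is at least k."""
    return [1 if c >= k else 0 for k in range(1, C)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def average_ordinal_loss(samples, c, W, C):
    """Sum C-1 binary cross-entropy losses per sampled feature, then average over T."""
    r = ordinal_labels(c, C)
    losses = []
    for z in samples:
        loss = 0.0
        for k in range(C - 1):
            # b_k: prediction of the k-th binary classifier (assumed logistic unit).
            b_k = sigmoid(sum(wi * zi for wi, zi in zip(W[k], z)))
            loss -= r[k] * math.log(b_k) + (1 - r[k]) * math.log(1.0 - b_k)
        losses.append(loss)
    return sum(losses) / len(losses)


def decode_rank(b):
    """Predicted class index: number of classifiers that answer 'greater'."""
    return sum(1 for b_k in b if b_k > 0.5)


labels = ordinal_labels(2, 5)
flat_loss = average_ordinal_loss([[1.0, 2.0]], c=1, W=[[0.0, 0.0]] * 3, C=4)
```

With all-zero weights every binary classifier outputs 0.5, so the loss is (C-1) times log(2).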

In training, in addition to the loss functions described above, an ordered distribution constraint is further provided. The constraint reflects that the target space in a regression problem is always ordered, and that the learned probabilistic ordered features should preserve this ordering. In particular, for a triplet (x_l, x_m, x_n) with corresponding labels (y_l, y_m, y_n) and learned probabilistic ordered features (z_l, z_m, z_n), the following constraint is learned:

$$ |y_l - y_m| < |y_l - y_n| \;\Rightarrow\; d(z_l, z_m) < d(z_l, z_n) $$

where d() represents the distance between the distributions. Let S = {(l, m, n) : |y_l - y_m| < |y_l - y_n|}; the ordered distribution constraint L_Ord is defined over the triplets in S so as to penalize violations of the constraint above. To measure the distance between the distributions, two different metric functions can be used, and they give similar performance: the first is the symmetric KL divergence and the second is the Wasserstein distance.
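To make the two distance measures and the ordering constraint concrete, the following sketch uses the standard closed forms for diagonal Gaussians: the symmetric KL divergence (taken here as the sum of the two directed divergences) and the 2-Wasserstein distance. The hinge form of the penalty and the `margin` value are assumptions chosen for illustration; the application text does not state the exact form of the constraint.

```python
import math


def kl_diag(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetric KL divergence; p and q are (mean, variance) pairs."""
    return kl_diag(*p, *q) + kl_diag(*q, *p)


def wasserstein2(p, q):
    """2-Wasserstein distance between two diagonal Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_p, mu_q))
    std_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_p, var_q))
    return math.sqrt(mean_term + std_term)


def ordered_constraint(dists, labels, distance, margin=0.1):
    """Assumed hinge penalty, averaged over triplets with |y_l - y_m| < |y_l - y_n|."""
    size = len(labels)
    total, triplets = 0.0, 0
    for l in range(size):
        for m in range(size):
            for n in range(size):
                if l == m or l == n or m == n:
                    continue
                if abs(labels[l] - labels[m]) < abs(labels[l] - labels[n]):
                    gap = distance(dists[l], dists[m]) + margin - distance(dists[l], dists[n])
                    total += max(0.0, gap)
                    triplets += 1
    return total / triplets if triplets else 0.0


dists = [([0.0], [1.0]), ([1.0], [1.0]), ([5.0], [1.0])]
ordered = ordered_constraint(dists, [0.0, 1.0, 5.0], wasserstein2)
shuffled = ordered_constraint(dists, [0.0, 5.0, 1.0], wasserstein2)
```

When the features are arranged in the same order as the labels the penalty vanishes, whereas swapping two labels produces a positive penalty that training would push toward zero.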

With the proposed ordered distribution constraint, the unordered probabilistic features in the feature space eventually become ordered probabilistic features, as shown in FIG. 3.

Finally, during training, the ordered distribution constraint is applied together with the loss functions above. For the direct regression method, the classification-based method and the order-based method alike, the final loss function is the sum of the corresponding average loss function and the ordered distribution constraint:

$$ L = \sum_{x \in D} \bar{L}(x) + \alpha L_{Ord} $$

where \bar{L}(x) is the average loss function of the respective method for sample x, D represents the training data set and α is a hyperparameter.

It should be noted that other methods can obtain similar forms, so that a regression model with higher performance can be trained with the probabilistic ordered features. Meanwhile, the uncertainty of the data is modeled by the variance term of the probabilistic ordered features: for each sample, the harmonic mean of the predicted variance term diag(Σ(x)) is calculated to represent the uncertainty of the sample.
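The uncertainty computation described above can be sketched with the Python standard library. The function `sample_uncertainty` takes the per-dimension variances diag(Σ(x)) of one sample and returns their harmonic mean; the rejection helper and its `threshold` parameter are hypothetical additions that illustrate how low-confidence predictions might be filtered out at deployment.

```python
from statistics import harmonic_mean


def sample_uncertainty(variances):
    """Uncertainty of one sample: harmonic mean of its predicted variances."""
    return harmonic_mean(variances)


def accept_predictions(predictions, variance_vectors, threshold):
    """Keep only predictions whose uncertainty does not exceed the threshold (assumed policy)."""
    kept = []
    for y_hat, variances in zip(predictions, variance_vectors):
        u = sample_uncertainty(variances)
        if u <= threshold:
            kept.append((y_hat, u))
    return kept


u = sample_uncertainty([1.0, 4.0, 4.0])
kept = accept_predictions([10.0, 20.0], [[0.5, 0.5], [2.0, 2.0]], threshold=1.0)
```

The harmonic mean is dominated by the smallest variances, so a sample is scored as uncertain only when most feature dimensions have large predicted variance.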

Therefore, one probability distribution can be used in the feature space to model uncertainty, which can be applied to various learning-based regression methods and ultimately improve the performance of the method. Meanwhile, the uncertainty of each test sample can be calculated according to the variance item in the probability ordered features, so that an uncertainty index is provided in the deployment of a real scene.

In order to implement the above embodiments, the present application further provides an uncertainty estimation device based on a regression model.

As shown in fig. 4, the regression model-based uncertainty estimation apparatus includes: a first acquisition module 410, an extraction module 420, a sampling module 430, a second acquisition module 440, a processing module 450, and a training estimation module 460.

A first acquisition module 410, configured to acquire an input sample; wherein the input sample has a label value.

An extraction module 420, configured to extract probability distribution features of the input sample.

A sampling module 430, configured to sample T features from the probability distribution features; wherein T is a positive integer.

A second acquisition module 440, configured to acquire T loss functions corresponding to the T features.

A processing module 450, configured to process the T loss functions to obtain a training loss function.

A training estimation module 460, configured to input the input sample into a regression model for processing to obtain a predicted value, and adjust parameters of the regression model according to the label value and the predicted value through the training loss function to obtain a trained regression model, so that data to be processed is input into the trained regression model to obtain a regression result and a target value.

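In the embodiment of the present application, the extraction module 420 is specifically configured to process the input sample through two neural networks respectively to obtain the mean and the variance of a high-dimensional Gaussian distribution as the probability distribution features.

In one embodiment of the present application, the processing module 450 is specifically configured to sum and average the T loss functions to obtain an average loss function, and to obtain an ordered distribution constraint function and calculate the sum of the average loss function and the ordered distribution constraint function as the training loss function.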

It should be noted that the explanation of the embodiment of the uncertainty estimation method based on the regression model is also applicable to the uncertainty estimation device based on the regression model of the embodiment, and is not repeated here.

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples, and features of different embodiments or examples, described in this specification can be combined by one skilled in the art without contradiction.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. An uncertainty estimation method based on a regression model is characterized by comprising the following steps:

2. The method of claim 1, wherein said extracting probability distribution features of the input samples comprises:

3. The method of claim 2, wherein the formula for sampling T features from the probability distribution features is:

4. The method of claim 1, wherein said processing said T loss functions to obtain training loss functions comprises:

summing and averaging the T loss functions to obtain an average loss function;

5. The method of claim 1, wherein the training loss function is formulated as:

6. A regression model-based uncertainty estimation apparatus, comprising:

7. The apparatus of claim 6, wherein the extraction module is specifically configured to

8. The apparatus of claim 7, wherein the formula for sampling T features from the probability distribution features is:

9. The apparatus of claim 6, wherein the processing module is specifically configured to:

sum and average the T loss functions to obtain an average loss function;

10. The apparatus of claim 6, wherein the training loss function is formulated as:
