WO2021231301A1 - Scalable active learning using one-shot uncertainty estimation in bayesian neural networks - Google Patents

Scalable active learning using one-shot uncertainty estimation in Bayesian neural networks

Info

Publication number
WO2021231301A1
WO2021231301A1 (application PCT/US2021/031587, US2021031587W)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
neuron layers
data
distribution
inputs
Prior art date
Application number
PCT/US2021/031587
Other languages
French (fr)
Inventor
Vineeth Rakesh MOHAN
Swayambhoo JAIN
Ashish KATHYAR
Shahab Hamidi-Rad
Jaideep Chandrashekar
Ajith PUDHIYAVEETIL
Original Assignee
Interdigital Patent Holdings, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Interdigital Patent Holdings, Inc. filed Critical Interdigital Patent Holdings, Inc.
Publication of WO2021231301A1 publication Critical patent/WO2021231301A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions

Definitions

  • At least one of the present embodiments generally relates to a method or an apparatus for machine learning.
  • Deep learning is an aspect or subset of machine learning involving neural networks with a relatively large number of layers in the neural network.
  • Modern deep learning models are extremely dense with billions of parameters.
  • training large neural networks takes a significant amount of time and resources.
  • NLP natural language processing
  • training effort becomes crucial. Examples include channel estimation in wireless networks (channel conditions change and evolve constantly at short time scales), intrusion detection in wireless networks, autonomous driving, or when pre-trained models need to be quickly adapted to different domains.
  • the objective is to reduce the training time with very little compromise on model accuracy.
  • An effective way of achieving this objective is to select data samples for which the model exhibits a large degree of uncertainty in the resulting prediction or estimate.
  • the first step is to identify such uncertain data points using machine learning models such as neural networks.
  • Active learning can be viewed as a supervised learning problem that arises naturally when there is an abundance of unlabeled data available and getting a label incurs a cost.
  • the objective is to optimally select the samples to be labeled in order to get the best generalization using as few training samples as possible. Any application which requires expert knowledge falls under this category. Chemical images, biological images, geological images, medical images are examples of some applications which commonly require expert knowledge. It is also relevant to situations where labels are obtained by crowdsourcing, where asking for labels is associated with a cost.
  • An approach to active learning is to selectively pick a subset of data (or samples) for which the system is most uncertain.
  • Bayesian neural networks (BNN) may be used to get an estimate of uncertainty.
  • in Bayesian neural networks, instead of learning point estimates of the weights that connect neurons, a probability distribution on the weights is learned. An estimate of the uncertainty of the neural network for a given input can be obtained by taking multiple instantiations of the neural network by sampling from the probability distribution of the weights and estimating the variance of the output. This estimate of the uncertainty can be computationally very expensive, as an accurate estimation of the variance requires a large number of samples from the neural network. In addition, Bayesian neural networks are computationally expensive, which makes them prohibitive to use when the unlabeled pool of data is large.
  • a method comprising steps for providing a neural network having neuron layers including unlabeled data, where weights that connect neuron layers at inputs have a distribution; determining a parametric probability distribution for the neuron layers; and estimating uncertainty for the unlabeled data at outputs of neuron layers in the neural network based on a comparison of the distributed weights at inputs with the parametric probability distribution.
  • the neuron layers at inputs have a Gaussian distribution.
  • the neuron layers at outputs have a rectified Gaussian distribution.
  • the neural network is Bayesian.
  • the apparatus comprises a processor.
  • the processor can be configured to, in a provided neural network having neuron layers including unlabeled data where weights that connect neuron layers at inputs have a distribution: determine a parametric probability distribution for the neuron layers; and estimate uncertainty for the unlabeled data at outputs of neuron layers in the neural network based on a comparison of the distributed weights at inputs with the parametric probability distribution.
  • the neuron layers at inputs have a Gaussian distribution.
  • the neuron layers at outputs have a rectified Gaussian distribution.
  • the neural network is Bayesian.
  • a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out any of the described embodiments or variants.
  • Figure 1 shows an example of an embodiment of a deep neural network.
  • Figure 2 shows an example of an embodiment of a method or apparatus in accordance with the present disclosure.
  • Figure 3 shows a flow diagram for determining variance uncertainty according to the general aspects described.
  • Figure 4 shows one embodiment of an apparatus comprising a typical processor arrangement in which the described embodiments can be implemented.
  • Deep Neural networks have demonstrated great empirical success in a wide range of machine-learning tasks, and recent works have started to show that their overparameterization allows them to provably reach global optima when training.
  • the resulting networks however exhibit high levels of redundancy, and while overparameterization might be beneficial during the training, it is not mandatory to retain so much redundancy to achieve similar accuracy.
  • From an architecture design point of view it is also much easier to overparameterize an architecture without thinking about the associated complexity and defer memory concerns to a later design stage.
  • model compression seeks to compute light-weight approximations of large trained neural networks.
  • Figure 1 shows an example of a conventional approach to estimating uncertainty for a DNN for processing a large dataset D.
  • operation proceeds as follows:
  • K is set anywhere between 10 and 50.
  • Figure 2 shows an example of an embodiment of a method or apparatus in accordance with the present disclosure that provides for addressing at least the issues described above.
  • the upper portion of the illustrated arrangement includes acquiring a small subset of a large data set, e.g., "Big Data D", followed by instantiating a single neural network and converting the neural network to a Bayesian neural network (BNN).
  • a BNN can be considered to be a NN with distributions over weights.
  • the conversion can be done based on, e.g., open source packages such as Blitz-Bayesian-PyTorch for PyTorch and TensorFlow Probability for TensorFlow.
  • FIG. 2 illustrates providing a one-shot estimation of uncertainty based, for example, on equations (2) and (4) described below.
  • a one-shot estimation of uncertainty in accordance with the present disclosure provides a direct estimate of the mean and variance, thereby eliminating the need to create multiple NN instances and testing each instance on the entire big data. This results in a significant reduction of training time and minimizes resource utilization.
  • the present disclosure describes a one-shot uncertainty estimation method for efficient estimation of the uncertainty.
  • ReLU Rectified Linear Unit
  • the disclosure also demonstrates the effectiveness of our algorithm for fully connected neural networks and convolutional neural networks (CNN).
  • CNN convolutional neural networks
  • for CNNs, a modified procedure is needed for training.
  • the convolutional layers, max-pool layers and non-linear layers are the same as in the traditional CNN; only the fully connected part of the CNN is Bayesian.
  • the idea is that the role of the initial combination of convolutional, max-pool and non-linear layers is to extract the features of the input.
  • the fully connected part takes these features as input and finally outputs the probabilities for different classes. It therefore makes sense for the fully connected part to be Bayesian, as it encodes how uncertain the neural network is given the features corresponding to the input.
  • Deep Neural Networks have shown state-of-the-art performance in a variety of domains such as computer vision, speech recognition, natural language processing, etc. This performance however comes at a massive computational cost, as DNNs tend to have a huge number of parameters, often running into millions and sometimes even billions. This leads to prohibitively high inference complexity - the computational cost of applying a trained DNN to test data for inference. This high inference complexity is the main challenge in bringing the performance of DNNs to mobile or embedded devices with resource limitations on battery size, computational power, memory capacity, etc.
  • Our algorithm is built on the Bayesian neural networks (BNN), where the weights are Gaussian distributed. The weights are assigned a prior distribution.
  • BNN Bayesian neural networks
  • the posterior distribution is approximated by a parametric distribution, in this case, the weights are independent Gaussian distributed.
  • the approximation is obtained by finding the value of the parameters that minimize the Kullback-Leibler (KL) divergence between the posterior distribution and the parametric distribution.
  • An estimate of the uncertainty of the neural network for a given input can be obtained by taking multiple instantiations of the neural network by sampling from the probability distribution of the weights and estimating the variance of the output which is a measure of the uncertainty.
  • Our first contribution lies in estimating this variance in one shot, where we formulate an expression for approximating the variance. We call our technique “one-shot uncertainty estimation”.
  • An exemplary method proposed in BLUNDELL ET AL., “Weight Uncertainty in Neural Networks” uses a Bayesian approach to train a neural network by approximating the posterior distribution of the weights by independent Gaussian distributions.
  • the current disclosure proposes a method to approximate the variance at the output of the neural network.
  • an exemplary neural network having neuron layers is provided for analysis.
  • a weight distribution for the inputs of neuron layers is determined.
  • the distribution of the weight between node $i$ in layer $l$ and node $j$ in layer $l+1$ is represented by $W_{ij}^l \sim \mathcal{N}(\mu_{ij}^l, (\sigma_{ij}^l)^2)$.
  • the number of neurons in layer $l$ is denoted by $n_l$.
  • the random variable for the distribution at the input of neuron $i$ in layer $l$ is represented by $S_i^l$, and the distribution at the output is represented by $Y_i^l$.
  • the input in the first layer is a summation of scaled and shifted Gaussian distributions and hence it is Gaussian. Upon passing through the Rectified Linear Unit (ReLU) non-linearity, the output distribution is a rectified Gaussian with all the probability mass from the negative region of the input Gaussian concentrated at 0.
  • ReLU Rectified Linear Unit
  • a parametric probability distribution for the neuron layers is determined.
  • Equations (2) and (4) can be used to approximate $\mathbb{E}[Y_j^2]$ and $\mathrm{Var}(Y_j^2)$.
  • This process can be continued till the final layer to find the mean and variance of the output of the fully connected neural network.
  • the last layer is the softmax layer which converts the output of the neural network into probabilities.
  • the number of neurons in this layer is equal to the number of classes, and the probability distribution at each neuron is approximated by a Gaussian distribution.
  • the estimate of the uncertainty is the signal to interference plus noise ratio (SINR) for the dominant class.
  • SINR signal to interference plus noise ratio
  • the interference is the signal from the other classes with the effective power given by their mean squared.
  • the noise is the sum of the variances of all the classes.
  • An example of an embodiment in accordance with the present disclosure can involve a method or system for labeling and, in particular, to reduce cost associated with labeling applications, e.g., reducing costs such as processing resource requirements, latency, etc.
  • Modern machine learning models (especially deep learning models) are data hungry. In fact, researchers have shown that deep learning models perform poorly in data-sparse environments.
  • a subset of data for labeling can be prudently selected. This can significantly lower the cost of data labeling.
  • rapid and efficient training and/or re-training of neural networks can be achieved. This is very useful in applications like MPEG-NNR, where the model compression involves re-training (potentially multiple times).
  • an embodiment in accordance with the present disclosure can involve active learning for wireless applications such as in wireless communication associated with or between mobile devices.
  • an embodiment of active learning in accordance with the present disclosure helps in achieving machine learning on the edge, whereby samples deemed important by one or more embodiments in accordance with the present disclosure are used to train deep learning models on edge devices.
  • an embodiment of active learning in accordance with the present disclosure is also useful in the setup where edge devices collect data and training takes place on the edge server. In such a setup, active learning in accordance with the present disclosure allows for selective transmission of important data to the edge server, which results in resource efficiency.
  • Another example of an application of an embodiment in accordance with the present disclosure can involve intrusion detection.
  • Figure 4 shows an example of an embodiment of an apparatus or system 200 in accordance with the present disclosure and involving compressing, encoding or decoding a deep neural network in a bitstream.
  • the apparatus comprises Processor 210 and can be interconnected to a memory 220 through at least one port.
  • Processor 210 and memory 220 can also have one or more additional interconnections to external connections.
  • Processor 210 is also configured to either insert or receive parameters in a bitstream and to either compress, encode or decode a deep neural network using the parameters.
  • the system includes at least one processor configured to execute instructions loaded therein for implementing, for example, the various aspects described in this document.
  • Processor 210 can include embedded memory, input output interface, and various other circuitries as known in the art.
  • the system includes at least one memory (e.g., a volatile memory device, and/or a non-volatile memory device).
  • System includes a storage device 220, which can include non-volatile memory and/or volatile memory, including, but not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash, magnetic disk drive, and/or optical disk drive.
  • the storage device 220 can include an internal storage device, an attached storage device (including detachable and non-detachable storage devices), and/or a network accessible storage device, as non-limiting examples.
  • System includes an encoder/decoder module configured (not shown), for example, to process data to provide an encoded video or decoded video, and the encoder/decoder module can include its own processor and memory.
  • the encoder/decoder module represents module(s) that can be included in a device to perform the encoding and/or decoding functions. As is known, a device can include one or both of the encoding and decoding modules. Additionally, encoder/decoder module can be implemented as a separate element of system or can be incorporated within processor 210 as a combination of hardware and software as known to those skilled in the art.
  • Program code to be loaded onto processor 210 or encoder/decoder to perform the various aspects described in this document can be stored in storage device 220 and subsequently loaded onto memory for execution by processor 210.
  • one or more of processor 210, memory, storage device 220, and encoder/decoder module can store one or more of various items during the performance of the processes described in this document. Such stored items can include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
  • memory inside of the processor 210 and/or the encoder/decoder module is used to store instructions and to provide working memory for processing that is needed during encoding or decoding.
  • a memory external to the processing device (for example, the processing device can be either the processor 210 or the encoder/decoder module) is used for one or more of these functions.
  • the external memory can be the memory and/or the storage device 220, for example, a dynamic volatile memory and/or a non-volatile flash memory.
  • an external non-volatile flash memory is used to store the operating system of, for example, a television.
  • a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for MPEG-2 (MPEG refers to the Moving Picture Experts Group, MPEG-2 is also referred to as ISO/IEC 13818, and 13818-1 is also known as H.222, and 13818-2 is also known as H.262), HEVC (HEVC refers to High Efficiency Video Coding, also known as H.265 and MPEG-H Part 2), or VVC (Versatile Video Coding, a new standard being developed by JVET, the Joint Video Experts Team).
  • MPEG-2 MPEG refers to the Moving Picture Experts Group
  • ISO/IEC 13818 MPEG-2
  • 13818-1 is also known as H.222
  • 13818-2 is also known as H.262
  • HEVC High Efficiency Video Coding
  • VVC Versatile Video Coding
  • the input to the elements of system can be provided through various input devices.
  • Such input devices include, but are not limited to, (i) a radio frequency (RF) portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Component (COMP) input terminal (or a set of COMP input terminals), (iii) a Universal Serial Bus (USB) input terminal, and/or (iv) a High Definition Multimedia Interface (HDMI) input terminal.
  • RF radio frequency
  • COMP Component
  • USB Universal Serial Bus
  • HDMI High Definition Multimedia Interface
  • Other examples include composite video.
  • the input devices have associated respective input processing elements as known in the art.
  • the RF portion can be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) downconverting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which can be referred to as a channel in certain embodiments, (iv) demodulating the downconverted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets.
  • the RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers.
  • the RF portion can include a tuner that performs various of these functions, including, for example, downconverting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband.
  • the RF portion and its associated input processing element receives an RF signal transmitted over a wired (for example, cable) medium, and performs frequency selection by filtering, downconverting, and filtering again to a desired frequency band.
  • Adding elements can include inserting elements in between existing elements, such as, for example, inserting amplifiers and an analog-to-digital converter.
  • the RF portion includes an antenna.
  • the USB and/or HDMI terminals can include respective interface processors for connecting the system to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of input processing, for example, Reed-Solomon error correction, can be implemented, for example, within a separate input processing IC or within processor 210 as necessary. Similarly, aspects of USB or HDMI interface processing can be implemented within separate interface ICs or within processor 210 as necessary.
  • the demodulated, error corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 210, and encoder/decoder operating in combination with the memory and storage elements to process the datastream as necessary for presentation on an output device.
  • a connection arrangement can include, for example, an internal bus as known in the art, including the Inter-IC (I2C) bus, wiring, and printed circuit boards.
  • I2C Inter-IC
  • the system includes communication interface that enables communication with other devices via communication channel.
  • the communication interface can include, but is not limited to, a transceiver configured to transmit and to receive data over communication channel.
  • the communication interface can include, but is not limited to, a modem or network card and the communication channel can be implemented, for example, within a wired and/or a wireless medium.
  • Wi-Fi Wireless Fidelity
  • IEEE 802.11 IEEE refers to the Institute of Electrical and Electronics Engineers
  • the Wi-Fi signal of these embodiments is received over the communications channel and the communications interface which are adapted for Wi-Fi communications.
  • the communications channel of these embodiments is typically connected to an access point or router that provides access to external networks including the Internet for allowing streaming applications and other over-the-top communications.
  • Other embodiments provide streamed data to the system using a set-top box that delivers the data over the HDMI connection.
  • Still other embodiments provide streamed data to the system using the RF connection.
  • various embodiments provide data in a non-streaming manner.
  • various embodiments use wireless networks other than Wi-Fi, for example a cellular network or a Bluetooth network.
  • the system can provide an output signal to various output devices, including a display, speakers, and other peripheral devices.
  • the display of various embodiments includes one or more of, for example, a touchscreen display, an organic light-emitting diode (OLED) display, a curved display, and/or a foldable display.
  • the display can be for a television, a tablet, a laptop, a cell phone (mobile phone), or another device.
  • the display can also be integrated with other components (for example, as in a smart phone), or separate (for example, an external monitor for a laptop).
  • the other peripheral devices include, in various examples of embodiments, one or more of a stand-alone digital video disc (or digital versatile disc) (DVR, for both terms), a disk player, a stereo system, and/or a lighting system.
  • DVR digital versatile disc
  • Various embodiments use one or more peripheral devices that provide a function based on the output of the system. For example, a disk player performs the function of playing the output of the system.
  • control signals are communicated between the system and the display, speakers, or other peripheral devices using signaling such as AV.Link, Consumer Electronics Control (CEC), or other communications protocols that enable device-to-device control with or without user intervention.
  • the output devices can be communicatively coupled to system via dedicated connections through interfaces.
  • the output devices can be connected to system using the communications channel via the communications interface.
  • the display and speakers can be integrated in a single unit with the other components of system in an electronic device such as, for example, a television.
  • the display interface includes a display driver, such as, for example, a timing controller (T Con) chip.
  • the embodiments can be carried out by computer software implemented by the processor 210 or by hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments can be implemented by one or more integrated circuits.
  • the memory can be of any type appropriate to the technical environment and can be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory, as non-limiting examples.
  • the processor 210 can be of any type appropriate to the technical environment, and can encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.
  • syntax elements as used herein are descriptive terms. As such, they do not preclude the use of other syntax element names.
  • Various embodiments may refer to parametric models or rate distortion optimization.
  • the balance or trade-off between the rate and distortion is usually considered, often given the constraints of computational complexity. It can be measured through a Rate Distortion Optimization (RDO) metric, or through Least Mean Square (LMS), Mean of Absolute Errors (MAE), or other such measurements.
  • RDO Rate Distortion Optimization
  • LMS Least Mean Square
  • MAE Mean of Absolute Errors
  • Rate distortion optimization is usually formulated as minimizing a rate distortion function, which is a weighted sum of the rate and of the distortion. There are different approaches to solve the rate distortion optimization problem.
  • the approaches may be based on an extensive testing of all encoding options, including all considered modes or coding parameters values, with a complete evaluation of their coding cost and related distortion of the reconstructed signal after coding and decoding.
  • Faster approaches may also be used, to save encoding complexity, in particular with computation of an approximated distortion based on the prediction or the prediction residual signal, not the reconstructed one.
  • Mix of these two approaches can also be used, such as by using an approximated distortion for only some of the possible encoding options, and a complete distortion for other encoding options.
  • Other approaches only evaluate a subset of the possible encoding options. More generally, many approaches employ any of a variety of techniques to perform the optimization, but the optimization is not necessarily a complete evaluation of both the coding cost and related distortion.
  • the implementations and aspects described herein can be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed can also be implemented in other forms (for example, an apparatus or program).
  • An apparatus can be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods can be implemented in, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end-users.
  • PDAs portable/personal digital assistants
  • references to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
  • Determining the information can include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
  • Accessing the information can include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • this application may refer to “receiving” various pieces of information.
  • Receiving is, as with “accessing”, intended to be a broad term.
  • Receiving the information can include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
  • “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B).
  • such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
  • This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed.
  • the word “signal” refers to, among other things, indicating something to a corresponding decoder.
  • the encoder signals a particular one of a plurality of transforms, coding modes or flags.
  • the same transform, parameter, or mode is used at both the encoder side and the decoder side.
  • an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter.
  • signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter.
  • signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.
  • implementations can produce a variety of signals formatted to carry information that can be, for example, stored or transmitted.
  • the information can include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal can be formatted to carry the bitstream of a described embodiment.
  • Such a signal can be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting can include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries can be, for example, analog or digital information.
  • the signal can be transmitted over a variety of different wired or wireless links, as is known.
  • the signal can be stored on a processor-readable medium.
  • a TV, set-top box, cell phone, tablet, or other electronic device that performs transform method(s) according to any of the embodiments described.
  • a TV, set-top box, cell phone, tablet, or other electronic device that performs transform method(s) determination according to any of the embodiments described, and that displays (e.g. using a monitor, screen, or other type of display) a resulting image.
  • a TV, set-top box, cell phone, tablet, or other electronic device that selects, bandlimits, or tunes (e.g. using a tuner) a channel to receive a signal including an encoded image, and performs transform method(s) according to any of the embodiments described.
  • a TV, set-top box, cell phone, tablet, or other electronic device that receives (e.g. using an antenna) a signal over the air that includes an encoded image, and performs transform method(s).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A method provides a neural network having neuron layers including unlabeled data, where weights that connect neuron layers at inputs have a distribution. A parametric probability distribution for the neuron layers is determined. The uncertainty for the unlabeled data at outputs of neuron layers in the neural network is estimated based on a comparison of the distributed weights at inputs with the parametric probability distribution. A corresponding device is also disclosed.

Description

SCALABLE ACTIVE LEARNING USING ONE-SHOT UNCERTAINTY ESTIMATION IN BAYESIAN NEURAL NETWORKS
TECHNICAL FIELD
At least one of the present embodiments generally relates to a method or an apparatus for machine learning.
BACKGROUND
Deep learning is an aspect or subset of machine learning involving neural networks with a relatively large number of layers in the neural network. Modern deep learning models are extremely dense with billions of parameters. Despite providing state-of-the-art results in several challenging tasks such as image classification and natural language understanding, training large neural networks (NN) takes a significant amount of time and resources. For example, the popular transformer-based natural language processing (NLP) model BERT takes anywhere between 4 and 16 days to train. One might argue that training is just an offline, one-time action, but in reality, this is not always true. There are a large number of applications where models need to be frequently retrained or tuned; here, the training effort becomes crucial. Examples include channel estimation in wireless networks (channel conditions change and evolve constantly at short time scales), intrusion detection in wireless networks, autonomous driving, or when pre-trained models need to be quickly adapted to different domains.
One existing approach to reducing the training effort is to use active learning. Here, small subsets of data (from a potentially large pool) are carefully selected so as to yield results comparable to using all of the available data. In other words, the objective is to reduce the training time with very little compromise on model accuracy. An effective way of achieving this objective is to select data samples for which the model exhibits a large degree of uncertainty in the resulting prediction or estimate. Thus, the first step is to identify such uncertain data points using machine learning models such as neural networks.
Active learning can be viewed as a supervised learning problem that arises naturally when there is an abundance of unlabeled data available and getting a label incurs a cost. The objective is to optimally select the samples to be labeled in order to get the best generalization using as few training samples as possible. Any application which requires expert knowledge falls under this category. Chemical images, biological images, geological images, medical images are examples of some applications which commonly require expert knowledge. It is also relevant to situations where labels are obtained by crowdsourcing, where asking for labels is associated with a cost. An approach to active learning is to selectively pick a subset of data (or samples) for which the system is most uncertain. Bayesian neural networks (BNN) may be used to get an estimate of uncertainty. In Bayesian neural networks, instead of learning point estimates of the weights that connect neurons, a probability distribution on the weights is learned. An estimate of the uncertainty of the neural network for a given input can be obtained by taking multiple instantiations of the neural network by sampling from the probability distribution of the weights and estimating the variance of the output. This estimate of the uncertainty can be computationally very expensive, as an accurate estimation of the variance requires a large number of samples from the neural network. In addition, Bayesian neural networks are computationally expensive, which makes them prohibitive to use when the unlabeled pool of data is large.
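As a rough illustration of the uncertainty-driven selection loop described above, the following is a minimal sketch, not part of the disclosure; the `estimate_uncertainty` scorer, the labeling `oracle`, and the `model.fit` interface are hypothetical placeholders.

```python
import numpy as np

def active_learning_round(model, unlabeled_pool, labeled_set, oracle,
                          estimate_uncertainty, batch_size=100):
    """One round of uncertainty-based active learning.

    `estimate_uncertainty` maps (model, samples) -> per-sample scores;
    `oracle` returns labels for the selected samples (e.g. a human annotator).
    """
    scores = estimate_uncertainty(model, unlabeled_pool)        # higher = more uncertain
    pick = np.argsort(scores)[-batch_size:]                     # most uncertain samples
    new_x = unlabeled_pool[pick]
    new_y = oracle(new_x)                                       # labeling incurs the cost
    labeled_set = (np.concatenate([labeled_set[0], new_x]),
                   np.concatenate([labeled_set[1], new_y]))
    remaining = np.delete(unlabeled_pool, pick, axis=0)
    model.fit(*labeled_set)                                     # retrain on the enlarged set
    return model, remaining, labeled_set
```

The entire cost question then reduces to how cheaply `estimate_uncertainty` can be computed over a large unlabeled pool, which is the subject of the remainder of the disclosure.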
SUMMARY
Drawbacks and disadvantages of the prior art may be addressed by the general aspects described herein, which are directed to scalable active learning using one-shot uncertainty estimation in Bayesian neural networks.
According to a first aspect, there is provided a method. The method comprises steps for providing a neural network having neuron layers including unlabeled data, where weights that connect neuron layers at inputs have a distribution; determining a parametric probability distribution for the neuron layers; and estimating uncertainty for the unlabeled data at outputs of neuron layers in the neural network based on a comparison of the distributed weights at inputs with the parametric probability distribution.
According to another embodiment, the neuron layers at inputs have a Gaussian distribution. According to another embodiment, the neuron layers at outputs have a rectified
Gaussian distribution.
According to another embodiment, the neural network is Bayesian. According to another aspect, there is provided an apparatus. The apparatus comprises a processor. The processor can be configured to, in a provided neural network having neuron layers including unlabeled data where weights that connect neuron layers at inputs have a distribution: determine a parametric probability distribution for the neuron layers; and estimate uncertainty for the unlabeled data at outputs of neuron layers in the neural network based on a comparison of the distributed weights at inputs with the parametric probability distribution.
According to another embodiment, the neuron layers at inputs have a Gaussian distribution.
According to another embodiment, the neuron layers at outputs have a rectified Gaussian distribution.
According to another embodiment, the neural network is Bayesian.
According to another general aspect of at least one embodiment, there is provided a non-transitory computer readable medium containing data content generated according to any of the described embodiments or variants.
According to another general aspect of at least one embodiment, there is provided a computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out any of the described embodiments or variants.
These and other aspects, features and advantages of the general aspects will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows an example of an embodiment of a deep neural network.
Figure 2 shows an example of an embodiment of a method or apparatus in accordance with the present disclosure.
Figure 3 shows a flow diagram for determining variance uncertainty according to the general aspects described.
Figure 4 shows one embodiment of an apparatus comprising a typical processor arrangement in which the described embodiments can be implemented.
DETAILED DESCRIPTION
Deep Neural networks (DNNs) have demonstrated great empirical success in a wide range of machine-learning tasks, and recent works have started to show that their overparameterization allows them to provably reach global optima when training. The resulting networks however exhibit high levels of redundancy, and while overparameterization might be beneficial during the training, it is not mandatory to retain so much redundancy to achieve similar accuracy. From an architecture design point of view, it is also much easier to overparameterize an architecture without thinking about the associated complexity and defer memory concerns to a later design stage. Both during architecture search and for the transmission and storage of pre-trained models, model compression seeks to compute light-weight approximations of large trained neural networks.
Figure 1 shows an example of a conventional approach to estimating uncertainty for a DNN for processing a large dataset D. In the example of Figure 1 , operation proceeds as follows:
(1) Sub-sample a small dataset from a large dataset D.
(2) Create multiple weight values to create K instantiations of the same neural network architecture. Typically, K is set anywhere between 10 and 50.
(3) Run the model on every single instance of the NN.
(4) Use the outputs from the different instances and use the variance of the outputs as an uncertainty estimate.
(5) Use the uncertainty to select the next batch of M data points (typically decided based on datapoints that have very high uncertainty values).
(6) Retrain the model by adding M to the training set. From the above steps, one can see that creating multiple neural networks (NN) and evaluating each network over millions of datapoints is both resource- and time-consuming. This is infeasible for very large NNs such as BERT, which have millions of parameters. A sketch of this conventional procedure is shown below.
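The following minimal sketch illustrates steps (2) through (4), assuming a hypothetical `sample_network()` helper that returns one weight instantiation of the architecture drawn from the learned weight distributions; the helper and its name are not part of the disclosure.

```python
import numpy as np

def monte_carlo_uncertainty(sample_network, data, K=20):
    """Conventional estimate: run K sampled instantiations of the same
    architecture over the data pool and use the variance of their outputs
    as the per-sample uncertainty score."""
    outputs = []
    for _ in range(K):                        # K is typically 10-50
        net = sample_network()                # one weight draw from the distribution
        outputs.append(net(data))             # forward pass over the whole pool
    outputs = np.stack(outputs)               # shape: (K, n_samples, n_classes)
    return outputs.var(axis=0).sum(axis=-1)   # high variance = high uncertainty
```

Every additional instantiation multiplies the inference cost over the full pool, which is exactly the overhead the one-shot estimation described next avoids.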
Figure 2 shows an example of an embodiment of a method or apparatus in accordance with the present disclosure that provides for addressing at least the issues described above. In Figure 2, the upper portion of the illustrated arrangement includes acquiring a small subset of a large data set, e.g., "Big Data D", followed by instantiating a single neural network and converting the neural network to a Bayesian neural network (BNN). A BNN can be considered to be a NN with distributions over weights. Thus, the conversion can be done based on, e.g., open source packages such as Blitz-Bayesian-PyTorch for PyTorch and TensorFlow Probability for TensorFlow. Next, the lower portion of Figure 2 illustrates providing a one-shot estimation of uncertainty based, for example, on equations (2) and (4) described below. A one-shot estimation of uncertainty in accordance with the present disclosure provides a direct estimate of the mean and variance, thereby eliminating the need to create multiple NN instances and test each instance on the entire big data. This results in a significant reduction of training time and minimizes resource utilization. In more detail, the present disclosure describes a one-shot uncertainty estimation method for efficient estimation of the uncertainty. We work with neural networks where the weights are Gaussian distributed, and the non-linearity is the Rectified Linear Unit (ReLU). In the one-shot uncertainty estimation, we do just one forward pass. In this forward pass, at each neuron, we approximate the probability distribution parametrically. This is in contrast to taking multiple instantiations of the neural network and doing multiple forward passes.
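To make "a NN with distributions over weights" concrete, here is a minimal, illustrative PyTorch module in which each weight is an independent Gaussian parameterized by a mean and a log standard deviation. It is a generic sketch under those assumptions, not the layer API of Blitz-Bayesian-PyTorch or TensorFlow Probability.

```python
import torch
import torch.nn as nn

class GaussianLinear(nn.Module):
    """Linear layer whose weights are independent Gaussians N(mu, sigma^2).

    Illustrative only; bias omitted for brevity. Libraries such as Blitz or
    TensorFlow Probability provide their own (differently named) Bayesian layers.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.randn(out_features, in_features) * 0.05)
        # store log-sigma so that sigma stays positive during training
        self.weight_log_sigma = nn.Parameter(torch.full((out_features, in_features), -3.0))

    @property
    def weight_sigma(self):
        return self.weight_log_sigma.exp()

    def forward(self, x):
        # one sampled instantiation of the weights (reparameterization trick)
        w = self.weight_mu + self.weight_sigma * torch.randn_like(self.weight_mu)
        return x @ w.t()
```

Sampling `w` in `forward` reproduces the conventional multi-pass behaviour; the one-shot approach instead propagates the mean and variance of each layer analytically, as derived below.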
The disclosure also demonstrates the effectiveness of our algorithm for fully connected neural networks and convolutional neural networks (CNN). For CNNs, a modified procedure is needed for training. The convolutional layers, max-pool layers and non-linear layers are the same as in the traditional CNN; only the fully connected part of the CNN is Bayesian. The idea is that the role of the initial combination of convolutional, max-pool and non-linear layers is to extract the features of the input. The fully connected part takes these features as input and finally outputs the probabilities for different classes. It therefore makes sense for the fully connected part to be Bayesian, as it encodes how uncertain the neural network is given the features corresponding to the input.
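A hedged sketch of such a hybrid architecture is shown below, reusing the illustrative GaussianLinear module from the previous sketch; the layer sizes and the assumed 28x28 single-channel input are arbitrary choices for illustration, not taken from the disclosure.

```python
import torch.nn as nn

class HybridBayesianCNN(nn.Module):
    """Deterministic conv/max-pool/ReLU feature extractor followed by a Bayesian
    fully connected classifier head (GaussianLinear is the illustrative Bayesian
    layer sketched above)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(              # trained as a normal CNN
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.classifier = nn.Sequential(            # only this part is Bayesian
            GaussianLinear(32 * 7 * 7, 128), nn.ReLU(),   # 28x28 input -> 7x7 features
            GaussianLinear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Only the parameters of the classifier head carry distributions; the convolutional feature extractor is trained and evaluated exactly as in a standard CNN.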
Deep Neural Networks (DNNs) have shown state-of-the-art performance in a variety of domains such as computer vision, speech recognition, natural language processing, etc. This performance however comes at a massive computational cost, as DNNs tend to have a huge number of parameters, often running into millions and sometimes even billions. This leads to prohibitively high inference complexity - the computational cost of applying a trained DNN to test data for inference. This high inference complexity is the main challenge in bringing the performance of DNNs to mobile or embedded devices with resource limitations on battery size, computational power, memory capacity, etc. Our algorithm is built on Bayesian neural networks (BNN), where the weights are Gaussian distributed. The weights are assigned a prior distribution. Computing the posterior exactly is difficult; therefore, the posterior distribution is approximated by a parametric distribution, in this case with independent Gaussian distributed weights. The approximation is obtained by finding the value of the parameters that minimize the Kullback-Leibler (KL) divergence between the posterior distribution and the parametric distribution. An estimate of the uncertainty of the neural network for a given input can be obtained by taking multiple instantiations of the neural network by sampling from the probability distribution of the weights and estimating the variance of the output, which is a measure of the uncertainty. Our first contribution lies in estimating this variance in one shot, where we formulate an expression for approximating the variance. We call our technique “one-shot uncertainty estimation”.
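The disclosure states only that the parametric approximation is fitted by minimizing a KL divergence. As background, the closed-form per-weight KL term between an independent Gaussian approximation and a Gaussian prior, commonly used in such variational training, looks as follows; the standard-normal prior is an assumption made here purely for illustration.

```python
import torch

def gaussian_kl(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), summed over all weights.

    This is the regularization term that, together with the data likelihood,
    is optimized when fitting the parametric Gaussian approximation.
    """
    sigma_p = torch.as_tensor(sigma_p)
    return (torch.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5).sum()
```

In practice this term is summed over all Bayesian weights and weighted against the likelihood during training; the specific prior and weighting are implementation choices, not dictated by the disclosure.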
An exemplary method proposed in BLUNDELL ET AL., “Weight Uncertainty in Neural Networks” (arXiv: 1505.05424) uses a Bayesian approach to train a neural network by approximating the posterior distribution of the weights by independent Gaussian distributions. The current disclosure proposes a method to approximate the variance at the output of the neural network.
Referring to Figure 3, at step 110, an exemplary neural network having neuron layers is provided for analysis. At step 120, a weight distribution for the inputs of neuron layers is determined.
In an exemplary embodiment, let the distribution of the weight between node $i$ in layer $l$ and node $j$ in layer $l+1$ be represented by $W_{ij}^l \sim \mathcal{N}\!\left(\mu_{ij}^l, (\sigma_{ij}^l)^2\right)$. The number of neurons in layer $l$ is denoted by $n_l$. The random variable for the distribution at the input of neuron $i$ in layer $l$ is represented by $S_i^l$ and the distribution at the output is represented by $Y_i^l$. In this example, we follow the notation where the input layer is layer 0. The input is the row vector $x = (x_0, x_1, \ldots, x_{n_0}) \in \mathbb{R}^{n_0+1}$.
The input in the first layer is a summation of scaled and shifted Gaussian distributions and hence it is Gaussian. This is given by:
$$S_j^1 = \sum_{i=0}^{n_0} x_i W_{ij}^0 \sim \mathcal{N}\!\left(\sum_{i=0}^{n_0} x_i \mu_{ij}^0,\; \sum_{i=0}^{n_0} x_i^2 (\sigma_{ij}^0)^2\right) \tag{1}$$
Upon passing through the Rectified Linear Unit (ReLU) non-linearity, the output distribution is a rectified Gaussian with all the probability mass from the negative region of the input Gaussian concentrated at 0.
Mean and Variance of Rectified Gaussian Distribution
At step 130 of Figure 3, a parametric probability distribution for the neuron layers is determined.
Let $X \sim \mathcal{N}(\mu, \sigma^2)$ and $Y = \max(X, 0)$. $Y$ is a rectified Gaussian distributed random variable. Let the p.d.f. and c.d.f. of the standard Gaussian be represented by $\phi$ and $\Phi$. We find the mean and variance of a rectified Gaussian.
$$\mathbb{E}[Y] = \mathbb{E}[Y \mid Y > 0]\,\Pr(Y > 0) + \mathbb{E}[Y \mid Y = 0] \times \Pr(Y = 0) = \mu\,\Phi(\mu/\sigma) + \sigma\,\phi(-\mu/\sigma) \tag{2}$$
Next, we calculate the second moment.
$$\mathbb{E}[Y^2] = \mathbb{E}[Y^2 \mid Y > 0]\,\Pr(Y > 0) + \mathbb{E}[Y^2 \mid Y = 0] \times \Pr(Y = 0) = (\mu^2 + \sigma^2)\,\Phi(\mu/\sigma) + \mu\sigma\,\phi(-\mu/\sigma) \tag{3}$$
Therefore, the variance of $Y$ is given by:
$$\mathrm{Var}(Y) = \mathbb{E}[Y^2] - \mathbb{E}[Y]^2 = \mu^2\,\Phi(\mu/\sigma)\,\Phi(-\mu/\sigma) + \sigma^2\!\left(\Phi(\mu/\sigma) - \phi^2(\mu/\sigma)\right) + \mu\sigma\,\phi(-\mu/\sigma)\!\left(\Phi(-\mu/\sigma) - \Phi(\mu/\sigma)\right) \tag{4}$$
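The expressions above translate directly into code. The following helper (our own naming, using SciPy's `norm.pdf` and `norm.cdf` for $\phi$ and $\Phi$) computes the rectified-Gaussian mean and variance of Equations (2)-(4) elementwise:

```python
import numpy as np
from scipy.stats import norm

def rectified_gaussian_moments(mu, sigma):
    """Mean and variance of Y = max(X, 0) with X ~ N(mu, sigma^2),
    following Equations (2)-(4). Works elementwise on arrays."""
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    a = mu / sigma
    mean = mu * norm.cdf(a) + sigma * norm.pdf(a)                               # Eq. (2)
    second_moment = (mu**2 + sigma**2) * norm.cdf(a) + mu * sigma * norm.pdf(a) # Eq. (3)
    var = second_moment - mean**2                                               # equals Eq. (4)
    return mean, var
```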
Back to the approximation of the output variance. We can use the expressions in Equations (2) and (4) to find the mean and variance of $Y_j^1 = \max(0, S_j^1)$. Until now, there has been no approximation. For subsequent layers, we approximate the distribution at the input of each neuron with a Gaussian distribution. For layer 2, the input at neuron $j$ is given by:
$$S_j^2 = \sum_{i=1}^{n_1} W_{ij}^1 Y_i^1$$
We approximate the distribution of $S_j^2$ by proceeding as if all the components of the summation are independent. We approximate $S_j^2$ as follows:
$$S_j^2 \approx \mathcal{N}\!\left(\sum_{i=1}^{n_1} \mu_{ij}^1\,\mathbb{E}[Y_i^1],\; \sum_{i=1}^{n_1} \mathrm{Var}(W_{ij}^1 Y_i^1)\right)$$
All we need to calculate now is $\mathrm{Var}(W_{ij}^1 Y_i^1)$. Using the law of total variance, we have:
$$\mathrm{Var}(W_{ij}^1 Y_i^1) = \mathbb{E}[(W_{ij}^1)^2]\,\mathrm{Var}(Y_i^1) + \mathrm{Var}(W_{ij}^1)\,\mathbb{E}[Y_i^1]^2$$
All these values have already been calculated or are known (note that $\mathbb{E}[(W_{ij}^1)^2] = (\mu_{ij}^1)^2 + (\sigma_{ij}^1)^2$). Since $S_j^2$ is approximated by a Gaussian distribution, $Y_j^2 = \max(0, S_j^2)$ is approximated by a rectified Gaussian distribution and the expressions in Equations (2) and (4) can be used to approximate $\mathbb{E}[Y_j^2]$ and $\mathrm{Var}(Y_j^2)$.
This process can be continued till the final layer to find the mean and variance of the output of the fully connected neural network.
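Putting the pieces together, the following is a hedged sketch of the full one-shot forward pass for a bias-free, fully connected ReLU network whose layer weights are given as (mean, standard deviation) arrays; the function name and data layout are our own conventions, not the patent's.

```python
import numpy as np
from scipy.stats import norm

def one_shot_moments(x, layers):
    """Propagate mean and variance through a fully connected ReLU network whose
    weights are independent Gaussians, in a single forward pass.

    `layers` is a list of (mu_W, sigma_W) pairs, each of shape (n_in, n_out).
    Returns the approximate per-output mean and variance before the softmax.
    """
    x = np.asarray(x, dtype=float)
    mean, var = x, np.zeros_like(x)              # the input itself is deterministic
    for idx, (mu_w, sigma_w) in enumerate(layers):
        # S_j = sum_i W_ij Y_i, with W and Y treated as independent:
        #   E[S_j]   = sum_i mu_ij E[Y_i]
        #   Var(S_j) = sum_i E[W_ij^2] Var(Y_i) + Var(W_ij) E[Y_i]^2
        s_mean = mean @ mu_w
        s_var = var @ (mu_w**2 + sigma_w**2) + (mean**2) @ (sigma_w**2)
        if idx < len(layers) - 1:                # ReLU on all but the last layer
            s_std = np.sqrt(s_var) + 1e-12       # small epsilon avoids division by zero
            a = s_mean / s_std
            y_mean = s_mean * norm.cdf(a) + s_std * norm.pdf(a)                     # Eq. (2)
            y_2nd = (s_mean**2 + s_var) * norm.cdf(a) + s_mean * s_std * norm.pdf(a)
            mean, var = y_mean, y_2nd - y_mean**2                                   # Eq. (4)
        else:
            mean, var = s_mean, s_var
    return mean, var
```

For the first layer the input has zero variance, so the propagated moments coincide with Equation (1); every later layer uses the Gaussian approximation of the pre-activation followed by the rectified-Gaussian moments of Equations (2) and (4).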
In a traditional neural network, the last layer is the softmax layer, which converts the output of the neural network into probabilities. In our approach, for estimating the uncertainty of the neural network, we focus on the layer right before the softmax layer. The number of neurons in this layer is equal to the number of classes, and the probability distribution at each neuron is approximated by a Gaussian distribution.
In the signal processing terminology, we can consider the distribution of each class as the signal. At step 140 of Figure 3, the uncertainty for outputs of neuron layers is estimated. In the present disclosure, the estimate of the uncertainty is the signal to interference plus noise ratio (SINR) for the dominant class. The interference is the signal from the other classes with the effective power given by their mean squared. The noise is the sum of the variances of all the classes. Let there be $C$ classes and $c^*$ be the class with the highest mean $\mu_{c^*}$. The SINR for the dominant class $c^*$ is given by:
$$\mathrm{SINR}_{c^*} = \frac{\mu_{c^*}^2}{\sum_{c \neq c^*} \mu_c^2 + \sum_{c=1}^{C} \sigma_c^2}$$
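Under the definitions above, a small helper that scores one sample from the per-class means and variances produced by the one-shot pass might look as follows; the function name is ours, and a low SINR flags the samples worth sending for labeling.

```python
import numpy as np

def sinr_dominant_class(mean, var):
    """Uncertainty score from the pre-softmax layer: SINR of the dominant class.

    `mean` and `var` are the per-class approximate means and variances produced
    by the one-shot forward pass (e.g. `one_shot_moments` above).
    """
    mean, var = np.asarray(mean, dtype=float), np.asarray(var, dtype=float)
    c_star = int(np.argmax(mean))                 # dominant class
    signal = mean[c_star] ** 2
    interference = np.sum(mean ** 2) - signal     # other classes' mean power
    noise = np.sum(var)                           # variances of all classes
    return signal / (interference + noise)
```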
An example of an embodiment in accordance with the present disclosure can involve a method or system for labeling and, in particular, to reduce cost associated with labeling applications, e.g., reducing costs such as processing resource requirements, latency, etc. Modern machine learning models (especially deep learning models) are data hungry. In fact, researchers have shown that deep learning models perform poorly in data-sparse environments. Using one or more aspects or embodiments or features in accordance with the present disclosure, a subset of data for labeling can be prudently selected. This can significantly lower the cost of data labeling. In addition to this, using an embodiment of an active learning algorithm in accordance with one or more examples described herein, rapid and efficient training and/or re-training of neural networks can be achieved. This is very useful in applications like MPEG-NNR, where the model compression involves re-training (potentially multiple times).
Another example of an embodiment in accordance with the present disclosure can involve active learning for wireless applications such as in wireless communication associated with or between mobile devices. For example, an embodiment of active learning in accordance with the present disclosure helps in achieving machine learning on the edge, whereby samples deemed important by one or more embodiments in accordance with the present disclosure are used to train deep learning models on edge devices. As another example, an embodiment of active learning in accordance with the present disclosure is also useful in the setup where edge devices collect data and training takes place on the edge server. In such a setup, active learning in accordance with the present disclosure allows for selective transmission of important data to the edge server, which results in resource efficiency. Another example of an application of an embodiment in accordance with the present disclosure can involve intrusion detection.
Figure 4 shows an example of an embodiment of an apparatus or system 200 in accordance with the present disclosure and involving compressing, encoding or decoding a deep neural network in a bitstream. The apparatus comprises Processor 210 and can be interconnected to a memory 220 through at least one port. Both
Processor 210 and memory 220 can also have one or more additional interconnections to external connections.
Processor 210 is also configured to either insert or receive parameters in a bitstream and to either compress, encode or decode a deep neural network using the parameters.
The system includes at least one processor configured to execute instructions loaded therein for implementing, for example, the various aspects described in this document. Processor 210 can include embedded memory, input output interface, and various other circuitries as known in the art. The system includes at least one memory (e.g., a volatile memory device, and/or a non-volatile memory device). System includes a storage device 220, which can include non-volatile memory and/or volatile memory, including, but not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash, magnetic disk drive, and/or optical disk drive. The storage device 220 can include an internal storage device, an attached storage device (including detachable and non-detachable storage devices), and/or a network accessible storage device, as non-limiting examples.
The system includes an encoder/decoder module (not shown) configured, for example, to process data to provide encoded video or decoded video, and the encoder/decoder module can include its own processor and memory. The encoder/decoder module represents module(s) that can be included in a device to perform the encoding and/or decoding functions. As is known, a device can include one or both of the encoding and decoding modules. Additionally, the encoder/decoder module can be implemented as a separate element of the system or can be incorporated within processor 210 as a combination of hardware and software as known to those skilled in the art.
Program code to be loaded onto processor 210 or the encoder/decoder module to perform the various aspects described in this document can be stored in storage device 220 and subsequently loaded onto the memory for execution by processor 210. In accordance with various embodiments, one or more of processor 210, the memory, storage device 220, and the encoder/decoder module can store one or more of various items during the performance of the processes described in this document. Such stored items can include, but are not limited to, the input video, the decoded video or portions of the decoded video, the bitstream, matrices, variables, and intermediate or final results from the processing of equations, formulas, operations, and operational logic.
In some embodiments, memory inside of the processor 210 and/or the encoder/decoder module is used to store instructions and to provide working memory for processing that is needed during encoding or decoding. In other embodiments, however, a memory external to the processing device (for example, the processing device can be either the processor 210 or the encoder/decoder module) is used for one or more of these functions. The external memory can be the memory and/or the storage device 220, for example, a dynamic volatile memory and/or a non-volatile flash memory. In several embodiments, an external non-volatile flash memory is used to store the operating system of, for example, a television. In at least one embodiment, a fast external dynamic volatile memory such as a RAM is used as working memory for video coding and decoding operations, such as for MPEG-2 (MPEG refers to the Moving Picture Experts Group, MPEG-2 is also referred to as ISO/IEC 13818, and 13818-1 is also known as H.222, and 13818-2 is also known as H.262), HEVC (HEVC refers to High Efficiency Video Coding, also known as H.265 and MPEG-H Part 2), or VVC (Versatile Video Coding, a new standard being developed by JVET, the Joint Video Experts Team).
The input to the elements of the system can be provided through various input devices. Such input devices include, but are not limited to, (i) a radio frequency (RF) portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Component (COMP) input terminal (or a set of COMP input terminals), (iii) a Universal Serial Bus (USB) input terminal, and/or (iv) a High Definition Multimedia Interface (HDMI) input terminal. Other examples include composite video.
In various embodiments, the input devices have associated respective input processing elements as known in the art. For example, the RF portion can be associated with elements suitable for (i) selecting a desired frequency (also referred to as selecting a signal, or band-limiting a signal to a band of frequencies), (ii) downconverting the selected signal, (iii) band-limiting again to a narrower band of frequencies to select (for example) a signal frequency band which can be referred to as a channel in certain embodiments, (iv) demodulating the downconverted and band-limited signal, (v) performing error correction, and (vi) demultiplexing to select the desired stream of data packets. The RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, band-limiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers. The RF portion can include a tuner that performs various of these functions, including, for example, downconverting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband. In one set-top box embodiment, the RF portion and its associated input processing element receives an RF signal transmitted over a wired (for example, cable) medium, and performs frequency selection by filtering, downconverting, and filtering again to a desired frequency band. Various embodiments rearrange the order of the above-described (and other) elements, remove some of these elements, and/or add other elements performing similar or different functions. Adding elements can include inserting elements in between existing elements, such as, for example, inserting amplifiers and an analog-to-digital converter. In various embodiments, the RF portion includes an antenna. Additionally, the USB and/or HDMI terminals can include respective interface processors for connecting the system to other electronic devices across USB and/or HDMI connections. It is to be understood that various aspects of input processing, for example, Reed-Solomon error correction, can be implemented, for example, within a separate input processing IC or within processor 210 as necessary. Similarly, aspects of USB or HDMI interface processing can be implemented within separate interface ICs or within processor 210 as necessary. The demodulated, error corrected, and demultiplexed stream is provided to various processing elements, including, for example, processor 210, and the encoder/decoder operating in combination with the memory and storage elements to process the datastream as necessary for presentation on an output device.
Various elements of the system can be provided within an integrated housing. Within the integrated housing, the various elements can be interconnected and can transmit data therebetween using a suitable connection arrangement, for example, an internal bus as known in the art, including the Inter-IC (I2C) bus, wiring, and printed circuit boards.
The system includes a communication interface that enables communication with other devices via a communication channel. The communication interface can include, but is not limited to, a transceiver configured to transmit and to receive data over the communication channel. The communication interface can include, but is not limited to, a modem or network card, and the communication channel can be implemented, for example, within a wired and/or a wireless medium.
Data is streamed, or otherwise provided, to the system, in various embodiments, using a wireless network such as a Wi-Fi network, for example IEEE 802.11 (IEEE refers to the Institute of Electrical and Electronics Engineers). The Wi-Fi signal of these embodiments is received over the communications channel and the communications interface, which are adapted for Wi-Fi communications. The communications channel of these embodiments is typically connected to an access point or router that provides access to external networks including the Internet for allowing streaming applications and other over-the-top communications. Other embodiments provide streamed data to the system using a set-top box that delivers the data over the HDMI connection. Still other embodiments provide streamed data to the system using the RF connection. As indicated above, various embodiments provide data in a non-streaming manner. Additionally, various embodiments use wireless networks other than Wi-Fi, for example a cellular network or a Bluetooth network.
The system can provide an output signal to various output devices, including a display, speakers, and other peripheral devices. The display of various embodiments includes one or more of, for example, a touchscreen display, an organic light-emitting diode (OLED) display, a curved display, and/or a foldable display. The display can be for a television, a tablet, a laptop, a cell phone (mobile phone), or another device. The display can also be integrated with other components (for example, as in a smart phone), or separate (for example, an external monitor for a laptop). The other peripheral devices include, in various examples of embodiments, one or more of a stand-alone digital video disc (or digital versatile disc) (DVD, for both terms), a disk player, a stereo system, and/or a lighting system. Various embodiments use one or more peripheral devices that provide a function based on the output of the system. For example, a disk player performs the function of playing the output of the system. In various embodiments, control signals are communicated between the system and the display, speakers, or other peripheral devices using signaling such as AV.Link, Consumer Electronics Control (CEC), or other communications protocols that enable device-to-device control with or without user intervention. The output devices can be communicatively coupled to the system via dedicated connections through interfaces. Alternatively, the output devices can be connected to the system using the communications channel via the communications interface. The display and speakers can be integrated in a single unit with the other components of the system in an electronic device such as, for example, a television. In various embodiments, the display interface includes a display driver, such as, for example, a timing controller (T Con) chip.
The embodiments can be carried out by computer software implemented by the processor 210 or by hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments can be implemented by one or more integrated circuits. The memory can be of any type appropriate to the technical environment and can be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory, as non-limiting examples. The processor 210 can be of any type appropriate to the technical environment, and can encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.
Note that the syntax elements as used herein are descriptive terms. As such, they do not preclude the use of other syntax element names.
When a figure is presented as a flow diagram, it should be understood that it also provides a block diagram of a corresponding apparatus. Similarly, when a figure is presented as a block diagram, it should be understood that it also provides a flow diagram of a corresponding method/process.
Various embodiments may refer to parametric models or rate distortion optimization. In particular, during the encoding process, the balance or trade-off between the rate and distortion is usually considered, often given the constraints of computational complexity. It can be measured through a Rate Distortion Optimization (RDO) metric, or through Least Mean Square (LMS), Mean of Absolute Errors (MAE), or other such measurements. Rate distortion optimization is usually formulated as minimizing a rate distortion function, which is a weighted sum of the rate and of the distortion. There are different approaches to solve the rate distortion optimization problem. For example, the approaches may be based on extensive testing of all encoding options, including all considered modes or coding parameter values, with a complete evaluation of their coding cost and related distortion of the reconstructed signal after coding and decoding. Faster approaches may also be used, to save encoding complexity, in particular with computation of an approximated distortion based on the prediction or the prediction residual signal, not the reconstructed one. A mix of these two approaches can also be used, such as by using an approximated distortion for only some of the possible encoding options, and a complete distortion for other encoding options. Other approaches only evaluate a subset of the possible encoding options. More generally, many approaches employ any of a variety of techniques to perform the optimization, but the optimization is not necessarily a complete evaluation of both the coding cost and related distortion.
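For illustration only, one common Lagrangian formulation of such a rate distortion cost (an assumption here, not a definition given in this disclosure) can be written as follows, where D(m) and R(m) denote the distortion and rate of a candidate coding option m and λ weights rate against distortion:

```latex
% Common Lagrangian form of rate-distortion optimization (illustrative):
% D(m) is the distortion, R(m) the rate of candidate coding option m,
% \lambda trades rate against distortion, \mathcal{M} is the set of options.
\[
  J(m) = D(m) + \lambda\, R(m), \qquad
  m^{*} = \arg\min_{m \in \mathcal{M}} J(m)
\]
```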
The implementations and aspects described herein can be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed can also be implemented in other forms (for example, an apparatus or program). An apparatus can be implemented in, for example, appropriate hardware, software, and firmware. The methods can be implemented in, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Reference to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
Additionally, this application may refer to “determining” various pieces of information. Determining the information can include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application may refer to “accessing” various pieces of information. Accessing the information can include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, moving the information, copying the information, calculating the information, determining the information, predicting the information, or estimating the information.
Additionally, this application may refer to “receiving” various pieces of information. Receiving is, as with “accessing”, intended to be a broad term. Receiving the information can include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as is clear to one of ordinary skill in this and related arts, for as many items as are listed. Also, as used herein, the word “signal” refers to, among other things, indicating something to a corresponding decoder. For example, in certain embodiments the encoder signals a particular one of a plurality of transforms, coding modes or flags. In this way, in an embodiment the same transform, parameter, or mode is used at both the encoder side and the decoder side. Thus, for example, an encoder can transmit (explicit signaling) a particular parameter to the decoder so that the decoder can use the same particular parameter. Conversely, if the decoder already has the particular parameter as well as others, then signaling can be used without transmitting (implicit signaling) to simply allow the decoder to know and select the particular parameter. By avoiding transmission of any actual functions, a bit savings is realized in various embodiments. It is to be appreciated that signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.
As will be evident to one of ordinary skill in the art, implementations can produce a variety of signals formatted to carry information that can be, for example, stored or transmitted. The information can include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal can be formatted to carry the bitstream of a described embodiment. Such a signal can be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting can include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries can be, for example, analog or digital information. The signal can be transmitted over a variety of different wired or wireless links, as is known. The signal can be stored on a processor-readable medium.
Various methods are described herein, and each of the methods comprises one or more steps or actions for achieving the described method. Unless a specific order of steps or actions is required for proper operation of the method, the order and/or use of specific steps and/or actions may be modified or combined.
Various numeric values are used in the present application. The specific values are for example purposes and the aspects described are not limited to these specific values. This application describes a variety of aspects, including tools, features, embodiments, models, approaches, etc. Many of these aspects are described with specificity and, at least to show the individual characteristics, are often described in a manner that may sound limiting. However, this is for purposes of clarity in description, and does not limit the application or scope of those aspects. Indeed, all of the different aspects can be combined and interchanged to provide further aspects.
This application describes a number of examples of embodiments, across various claim categories and types. Features of these embodiments can be provided alone or in any combination. Further, embodiments can include one or more of the following features, devices, or aspects, alone or in any combination, across various claim categories and types:
• A method, process, apparatus, medium storing instructions, medium storing data, or signal according to any of the embodiments described.
• A TV, set-top box, cell phone, tablet, or other electronic device that performs transform method(s) according to any of the embodiments described.
• A TV, set-top box, cell phone, tablet, or other electronic device that performs transform method(s) determination according to any of the embodiments described, and that displays (e.g. using a monitor, screen, or other type of display) a resulting image.
• A TV, set-top box, cell phone, tablet, or other electronic device that selects, bandlimits, or tunes (e.g. using a tuner) a channel to receive a signal including an encoded image, and performs transform method(s) according to any of the embodiments described.
• A TV, set-top box, cell phone, tablet, or other electronic device that receives (e.g. using an antenna) a signal over the air that includes an encoded image, and performs transform method(s).

Claims

CLAIMS:
1. A method, comprising: providing a neural network having neuron layers including unlabeled data, where weights that connect neuron layers at inputs have a distribution; determining a parametric probability distribution for the neuron layers; and estimating uncertainty for the unlabeled data at outputs of neuron layers in the neural network based on a comparison of the distributed weights at inputs with the parametric probability distribution.
2. An apparatus, comprising: a processor, configured to: obtain neuron layers of a neural network including unlabeled data, where weights that connect the neuron layers at inputs have a distribution; determine a parametric probability distribution based on the weight distribution for the neurons; and estimate uncertainty for the unlabeled data at outputs of neuron layers in the neural network based on a comparison of the distributed weights at inputs with the parametric probability distribution.
3. The method of claim 1, or the apparatus of claim 2, wherein the neuron layers at inputs have a Gaussian distribution.
4. The method of claim 1, or the apparatus of claim 2, wherein the neuron layers at outputs have a rectified Gaussian distribution.
5. The method of claim 1, or the apparatus of claim 2, wherein the neural network is Bayesian.
6. The method of any one of claims 1, 3 or 4, wherein the providing comprises converting the neural network to a Bayesian neural network.
7. The apparatus of any one of claims 2-4, wherein the one or more processors being configured to obtain neuron layers of the neural network comprises the one or more processors being further configured to convert the neural network to a Bayesian neural network.
8. The method of any one of claims 1 or 3-6, wherein the estimating is based on a single instantiation of the neural network.
9. The apparatus of any one of claims 2-5 or 7, wherein the one or more processors being configured to estimate uncertainty is based on a single instantiation of the neural network.
10. A non-transitory computer readable medium containing data content generated according to the apparatus of any one of claims 2-5 or 7 or 9, or by the method of any one of claims 1 or 3-6 or 8, for playback using one or more processors.
11. A non-transitory computer-readable medium containing program instructions which, when executed by at least one processor, perform the method of any one of claims 1 or 3-6 or 8.
12. A computer program product comprising computing instructions for performing the method of any one of claims 1 or 3-6 or 8 when executed by one or more processors.
13. The apparatus of any one of claims 2-5 or 7 or 9, further comprising: at least one of (i) an antenna configured to receive a signal, the signal including data for processing by the apparatus, (ii) a band limiter configured to limit the received signal to a band of frequencies that includes the data, and (iii) a display configured to display information produced by processing the data.
14. The apparatus of claim 13, wherein the apparatus is included in one of a television, a television signal receiver, a set-top box, a gateway device, a mobile device, a cell phone, a tablet, a server, a computer or other electronic device.
PCT/US2021/031587 2020-05-11 2021-05-10 Scalable active learning using one-shot uncertainty estimation in bayesian neural networks WO2021231301A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063022720P 2020-05-11 2020-05-11
US63/022,720 2020-05-11

Publications (1)

Publication Number Publication Date
WO2021231301A1 (en) 2021-11-18

Family

ID=78524893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/031587 WO2021231301A1 (en) 2020-05-11 2021-05-10 Scalable active learning using one-shot uncertainty estimation in bayesian neural networks

Country Status (1)

Country Link
WO (1) WO2021231301A1 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANQI WU ET AL: "Deterministic Variational Inference for Robust Bayesian Neural Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 7 March 2019 (2019-03-07), XP081123372 *
BLUNDELL ET AL., WEIGHT UNCERTAINTY IN NEURAL NETWORKS
MANUEL HAUSSMANN ET AL: "Deep Active Learning with Adaptive Acquisition", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 27 June 2019 (2019-06-27), XP081384822 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114280491A (en) * 2021-12-23 2022-04-05 中山大学 Retired battery residual capacity estimation method based on active learning
CN114280491B (en) * 2021-12-23 2024-01-05 中山大学 Retired battery residual capacity estimation method based on active learning
CN116776941A (en) * 2023-06-19 2023-09-19 浙江大学 Neuron coding model parameter estimation method and device based on two-photon calcium imaging data
CN116776941B (en) * 2023-06-19 2024-04-26 浙江大学 Neuron coding model parameter estimation method and device based on two-photon calcium imaging data

Similar Documents

Publication Publication Date Title
US20220237454A1 (en) Linear neural reconstruction for deep neural network compression
WO2021231301A1 (en) Scalable active learning using one-shot uncertainty estimation in bayesian neural networks
US20220188633A1 (en) Low displacement rank based deep neural network compression
CN114418121A (en) Model training method, object processing method and device, electronic device and medium
WO2021158378A1 (en) Systems and methods for encoding a deep neural network
US20230093630A1 (en) System and method for adapting to changing constraints
WO2021028236A1 (en) Systems and methods for sound conversion
WO2021063559A1 (en) Systems and methods for encoding a deep neural network
US20220207364A1 (en) Framework for coding and decoding low rank and displacement rank-based layers of deep neural networks
CN116306981A (en) Policy determination method, device, medium and electronic equipment
WO2022033823A1 (en) Method for designing flexible resource-adaptive deep neural networks using in-place knowledge distillation with teacher assistants (ipkd-ta)
CN116090543A (en) Model compression method and device, computer readable medium and electronic equipment
US11394981B2 (en) Electronic apparatus and control method thereof
US11792408B2 (en) Transcoder target bitrate prediction techniques
US11983246B2 (en) Data analysis system, learning device, method, and program
US20230126823A1 (en) System and method for adapting to changing resource limitations
US20230370622A1 (en) Learned video compression and connectors for multiple machine tasks
WO2018120290A1 (en) Prediction method and device based on template matching
US20230376350A1 (en) System and method for adapting to changing resource limitations
US20230014367A1 (en) Compression of data stream
CN115943390A (en) System and method for training and/or deploying deep neural networks
WO2024094478A1 (en) Entropy adaptation for deep feature compression using flexible networks
WO2023169303A1 (en) Encoding and decoding method and apparatus, device, storage medium, and computer program product
US20240121400A1 (en) Video Encoding Complexity Predictor
WO2024074373A1 (en) Quantization of weights in a neural network based compression scheme

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21729161

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21729161

Country of ref document: EP

Kind code of ref document: A1