CN110110794B

CN110110794B - Image classification method for updating neural network parameters based on feature function filtering

Info

Publication number: CN110110794B
Application number: CN201910389454.5A
Authority: CN
Inventors: 文成林; 翟凯凯
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2021-06-29
Anticipated expiration: 2039-05-10
Also published as: CN110110794A

Abstract

The invention discloses an image classification method for updating neural network parameters based on characteristic function filtering. The characteristic function filtering used in the invention only needs to assume that the measurement error has a mean value and the model noise has a distribution function. The invention effectively solves the problems of local convergence, excessive calculation complexity and the like in the general neural network parameter updating method for image classification, realizes the online self-adaptive updating of the neural network parameters, and can update the network parameters without combining the old image samples when a new image sample set is input, so that the network model can adapt to the change of the image working condition.

Description

Image classification method for updating neural network parameters based on feature function filtering

Technical Field

The invention belongs to the technical field of image classification in artificial intelligence, and relates to an image classification method for updating neural network parameters based on feature function filtering.

Background

Artificial intelligence is a new technical science for studying and developing theories, methods, techniques and application systems for simulating, extending and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence, a field of research that includes robotics, language recognition, image recognition, natural language processing, and expert systems, among others.

The neural network is an arithmetic mathematical model which imitates the behavior characteristics of the animal neural network and performs distributed parallel information processing. The network achieves the purpose of processing information by adjusting the mutual connection relationship among a large number of internal nodes depending on the complexity of the system, and has self-learning and self-adapting capabilities. The intelligent network system is an important component of artificial intelligence and consists of an input layer, an output layer and a single hidden layer or a plurality of hidden layers contained between the input layer and the output layer. The structural design of the neural network mainly comprises the following parts: determining the number of hidden layers, the number of nodes in each hidden layer, selecting an excitation function of each node and the like. When the structural problem is determined, the most important of the remaining problems is to construct an objective function and how to identify a plurality of parameters included in the network under certain criteria.

For parameter identification methods in the neural network, methods such as a gradient descent method, a least square method and the like have some defects. For example, the step length in the gradient descent method iterative algorithm is difficult to select and has a general standard; the complexity of the algorithm increases exponentially with the number of hidden layers and the number of nodes in each hidden layer; since it is a local linearization problem in nature, it will generate a large training error due to the increased nonlinearity of the objective function, and it will easily converge to a local extremum.

Image classification is a very active research direction in the fields of computer vision, pattern recognition and machine learning. Image classification and detection are widely applied in many fields, including face recognition, pedestrian detection, intelligent video analysis, pedestrian tracking and the like in the security field, traffic scene object recognition, vehicle counting, retrograde motion detection, license plate detection and recognition in the traffic field, content-based image retrieval, automatic album classification and the like in the internet field. The image classification is usually implemented using a neural network, however, as described above, the problem in the parameter updating method of the neural network is the bottleneck of the image classification accuracy and the time complexity. Therefore, a new neural network parameter updating method has important significance for image classification.

Disclosure of Invention

Aiming at the defects of the prior art, the invention designs an image classification method for updating neural network parameters based on feature function filtering. The characteristic function filtering used in the method only needs to assume that the measurement error has a mean value, the model noise has a distribution function, the problems of local linearization and local convergence do not exist, and the real-time online self-adaptive updating of the network parameters can be realized. The method is used for image identification and classification, can improve the accuracy of image classification and reduce the time complexity of parameter updating.

The invention comprises the following steps:

step (1) establishing a sample input set x (k) ═ x₁(k),x₂(k),…,x_n(k)]^TTo output set y (k) ═ y₁(k),y₂(k),…,y_m(k)]^TThe neural network model of the relationship mapping between the samples, wherein the sample input set is the feature value of each image sample after preprocessing, the sample output set is the classification category of each corresponding image, k is the selection of the kth sample set, and x_n(k) For the nth input of the kth sample, y_m(k) Is the mth output of the kth sample.

Wherein g (-) is an activation function, usually selected from sigmoid function, ReLU function, Gaussian function, polynomial, etc., ω_i＝[ω_i1,ω_i2,…,ω_il]^TI is 1, …, n and a are weight parameter and bias parameter of hidden layer, ω is_iEach item in the list is the component, l is the number of nodes of the single hidden layer, and beta is the weight parameter of the output layer.

The loss function of which has the general form of

Wherein,

and establishing a picture classification result for the model.

The specific form of the loss function used herein is

Step (2) initializing weight parameters and bias parameters of a hidden layer and weight parameters of an output layer in a network

In each iteration, under the condition that hidden layer weight parameters and bias parameters are randomly given, all parameter solving problems of the network are converted into a problem of solving the output layer weight parameters beta through least squares. The algorithm is described in detail as follows:

when the activation function of the hidden layer is infinitely differentiable, the neural network does not need to solve all parameters any more, the hidden layer weight parameter and the hidden layer bias parameter can be realized in a random selection mode and are kept unchanged in the whole process, and at the moment, if the description of the model in the formula (1) is changed into the following form:

y(k)＝H(k)β (3)

wherein

H(k)＝[H₁(k) H₂(k) … H_l(k)]^T

Then since the hidden layer weight parameters and hidden layer bias parameters are determined so that h (k) is known, the problem at this time can be transformed into how to solve the output layer weight parameters β by equation (3), and the objective function is also transformed from equation (2) into the following form:

here, a least squares method is used, then the solution is

In the above formula, H^-1Is the Moore-Penrose inverse of the hidden layer output matrix H.

And (3) updating the output layer weight parameter beta of the neural network through inputting the current new image sample.

Where Kalman filtering is used to perform real-time updates of the output layer weight parameters β. To perform real-time updating of the parameters using Kalman filtering, the state equations and measurement equations that conform to Kalman filtering must be established. Considering that the weight parameter beta of the output layer to be estimated is slowly changed by certain random interference, the state equation of Kalman filtering is modeled as follows:

β(k+1)＝A(k+1,k)β(k)+w(k) (4)

in order to simulate the interference on the parameter to be estimated, a white noise sequence w (k) is added into the equation.

From equation (3), the measurement equation can be obtained as follows:

y(k)＝Hβ(k)+v(k) (5)

where v (k) is also a white noise sequence, similar to the equation of state.

In the Kalman filtering model, the process noise w (k) and the observation noise v (k) are both white noise sequences, and are constant values in the sampling interval. And E { w (k) w ' (k) } Q and E { v (k) v ' (k) } R, a (k +1, k) E, when w (k) and v (k) are independent from each other, E { w (k) v ' (k) } 0, and β (k) is the kth output layer weight parameter.

Then the optimal estimated value of the weight parameter β of the (k +1) th output layer solved by the model is:

wherein,

represents a predicted value of the weight parameter beta of the (k +1) th output layer; k (K +1) is the (K +1) th optimal gain array;

is the estimated value of the weight parameter beta of the output layer of the (k +1) th.

And (4) updating the hidden layer weight parameter and the hidden layer bias parameter through feature function filtering.

The feature function filtering is a novel non-gaussian filtering method, and in the feature function filtering, when an observation equation is nonlinear with respect to a state variable, if the following two requirements are met:

the method comprises the following steps of 1: { w (k) } and { v (k) } are bounded stationary random processes, x (0) is the initial state, { w (k) }, { v (k) } and x (0) are independent of each other, and the distribution function of { w (k) } is known, and its characteristic function is

{ v (k) } mean known, | E (w (k)) > Y<+∞。

The method comprises the following steps: h (-) is a known Bohr measurable and smooth nonlinear function.

Then a filter of the form:

wherein A (k) is a state transition matrix,

is an estimate of the kth state quantity,

predicted value for the k +1 th observation, U (k) e R^n×lFor a gain matrix to be designed, the acquisition of u (k) is the core and key of the whole filter design.

Order to

The available estimation error equation is

The performance index is

Wherein,

the weighting function K (t) is chosen to ensure J₀Is real and bounded, it is a given positive definite weight matrix, and in order to constrain the gain matrix, a filter gain matrix can be found by minimizing this performance index.

The gain matrix K (K +1) solving method and process are given below.

If p is₁(k) And p₂(k) Are respectively as

The performance index can be rewritten as

To obtain the gain matrix K (K +1), let

Obtaining the extreme point of the performance index

Due to the fact that

Therefore, the solution obtained by equation (13) is the extreme point of the minimum performance index.

The design of the parameter updating method based on the characteristic function can be realized by two steps: firstly, updating the hidden layer weight parameter omega and the bias parameter a, and secondly, updating the output layer weight parameter beta. But is divided into three steps here due to its high complexity. It is described as follows:

for the single hidden layer neural network with the hidden layer weight parameter omega, the bias parameter a and the output layer weight parameter beta determined by the method in the steps (1), (2) and (3), the following three steps are sequentially executed every time a new picture sample is input:

step (4-1) hidden layer weight parameter omega is updated

Firstly, a hidden layer weight parameter omega is updated by using feature function filtering, and if a hidden layer bias parameter a and an output layer weight parameter beta are not changed, the optimal estimation value of the hidden layer weight parameter omega for the (k +1) th sample is

In this step, for the update of the hidden layer weight parameter ω, ω ═ ω is used₁,ω₂,…,ω_n]，ω_i＝[ω_i1,ω_i2,…,ω_il]^TIs a component vector of the hidden layer weight parameter omega, and omega is an n-dimensional vector, the component vector needs to be corresponding to omega_iAnd i is 1, …, and n is updated by modeling solution. For each time omega_iEstimate of (c), assume ω_jJ is 1, …, i-1, i +1, …, n is constant. Then by ω_iThe following state equations and observation equations may be established for the state variables in the feature function filtering:

ω_i(k+1)＝A·ω_i(k)+w(k) (16)

wherein the model noise w (k) only needs to have a distribution function, and the measurement error v (k) only needs to have a mean value. Then the kth ω of the model_i(k) The solving process of the optimal estimated value is as follows:

(a) calculating p₁(k)

Solving the composition matrix p of the gain array U (k) as shown in equation (10)₁(k) In that respect Wherein K (t) is a weighting function,

for a given function of the target feature in question,

is the characteristic function of s (k) ═ a (k) e (k),

is the characteristic function of q (k +1) ═ G (k +1) w (k + 1).

(b) Calculating p₃(k)

y (k) classify the class for the kth picture in the sample output set,

is an estimate of y (k), and has

(c) Calculating gain array U (k)

U (k) is a weight matrix with a positive fixed matrix fixed as the hypothetical matrix R (k)

(d) Calculating the component omega of the hidden layer weight parameter to be estimated_i(k) Estimated value at time k

Step (4-2) hidden layer bias parameter a updating

And then, updating the hidden layer bias parameter a by using feature function filtering, and assuming that the hidden layer weight parameter omega and the output layer weight parameter beta are unchanged, regarding the (k +1) th picture sample, the optimal estimation value of the hidden layer bias parameter a is

In the step, a state equation and an observation equation similar to those in the step (3-1) are established by taking the hidden layer bias parameter a as a state variable, and the optimal estimated value of the kth hidden layer bias parameter a (k) is obtained by solving.

Step (4-3) updating the output layer weight parameter beta

Updating the weight parameter beta of the output layer by using a linear Kalman filtering method, and assuming that the weight parameter omega of the hidden layer and the bias parameter a of the hidden layer are both unchanged, for the (k +1) th sample, the optimal estimation value of the weight parameter beta of the output layer is

The modeling and parameter solving in this step are the same as in step (3).

The invention has the beneficial effects that: and a method combining the characteristic function filtering and the Kalman filtering is used for updating all parameters in the neural network. By applying the method in the image classification, every time a new image sample comes, all parameters in the neural network can be updated to adapt to the change of the image working condition without combining with an old image sample, the accuracy of the image classification is improved, and the complexity of calculation is reduced.

Drawings

FIG. 1 is a diagram of a model of a single hidden layer neural network.

FIG. 2 is a flow chart of the computational steps of the present invention.

Detailed Description

The application of the present invention to image classification is further described below with reference to fig. 2.

The method comprises the following specific steps

Step (1) establishing a sample input set x (k) ═ x₁(k),x₂(k),…,x_n(k)]^TTo output set y (k) ═ y₁(k),y₂(k),…,y_m(k)]^TThe neural network model of the relationship mapping between the samples, wherein the sample input set is the feature value of each image sample after preprocessing, the sample output set is the classification category of each corresponding image, k is the selection of the kth sample set, and x_n(k) For the nth input of the kth sample, y_m(k) Is the mth output of the kth sample. Taking a single hidden layer neural network as an example, see fig. 1:

The loss function of which has the general form of

Wherein,

and establishing a picture classification result for the model.

The specific form of the loss function used herein is

Step (2) initializing weight parameters omega, bias parameters a and output layer weight parameters beta of hidden layers in the network

In each iteration, under the condition that the hidden layer weight parameter omega and the bias parameter a are randomly given, all parameter solving problems of the network are converted into the problem of solving the output layer weight parameter beta through least squares. The algorithm is described in detail as follows:

when the activation function of the hidden layer is infinitely differentiable, the neural network does not need to solve all parameters any more, the hidden layer weight parameter ω and the hidden layer bias parameter a can be realized in a random selection mode and are kept unchanged in the whole process, and at the moment, if the description of the model in the formula (1) is changed into the following form:

y(k)＝H(k)β (3)

wherein

H(k)＝[H₁(k) H₂(k) … H_l(k)]^T

Then since the hidden layer weight parameter ω and the hidden layer bias parameter a are determined so that h (k) is known, the problem at this time can be transformed into how to solve the output layer weight parameter β by equation (3), and the objective function is also transformed into the following form by equation (2):

here, a least squares method is used, then the solution is

β(k+1)＝A(k+1,k)β(k)+w(k) (4)

From equation (3), the measurement equation can be obtained as follows:

y(k)＝Hβ(k)+v(k) (5)

where v (k) is also a white noise sequence, similar to the equation of state.

wherein,

And (4) updating the hidden layer weight parameter omega and the hidden layer bias parameter a through feature function filtering.

for the single hidden layer neural network with the hidden layer weight parameter omega, the bias parameter a and the output layer weight parameter beta determined by the method in the step (1), the step (2) and the step (3), when a new picture sample is input, the following three steps are sequentially carried out:

step (4-1) hidden layer weight parameter omega is updated

ω_i(k+1)＝A·ω_i(k)+w(k) (16)

(a) calculating p₁(k)

for a given function of the target feature in question,

is the characteristic function of s (k) ═ a (k) e (k),

is the characteristic function of q (k +1) ═ G (k +1) w (k + 1).

(b) Calculating p₃(k)

y (k) classify the class for the kth picture in the sample output set,

is an estimate of y (k), and has

(c) Calculating gain array U (k)

Step (4-2) hidden layer bias parameter a updating

Step (4-3) updating the output layer weight parameter beta

The modeling and parameter solving in this step are the same as in step (3).

By applying the method in the image classification, every time a new image sample comes, all parameters in the neural network can be updated to adapt to the change of the image working condition without combining with an old image sample, the accuracy of the image classification is improved, and the complexity of calculation is reduced.

Claims

1. The image classification method based on the neural network parameter updating of the characteristic function filtering is characterized by comprising the following steps:

step (1) establishing a sample input set x (k) ═ x₁(k),x₂(k),…,x_n(k)]^TTo output set y (k) ═ y₁(k),y₂(k),…,y_m(k)]^TThe neural network model of the relationship mapping between the samples, wherein the sample input set is the feature value of each image sample after preprocessing, the sample output set is the classification category of each corresponding image, k is the selection of the kth sample set, and x_n(k) For the nth input of the kth sample, y_m(k) The mth output for the kth sample;

step (2) initializing weight parameters, bias parameters and output layer weight parameters of a hidden layer in a network

In each iteration, under the condition that hidden layer weight parameters and bias parameters are randomly given, all parameter solving problems of the network are converted into a problem of solving output layer weight parameters beta through least squares;

step (3) inputting and updating an output layer weight parameter beta of the neural network through a current new image sample;

considering that the weight parameter beta of the output layer to be estimated is subjected to a certain random interference and is slowly changed, the state equation of Kalman filtering is modeled as follows:

β(k+1)＝A(k+1,k)β(k)+w(k) (4)

in order to simulate the interference on the parameter to be estimated, a white noise sequence w (k) is added into the equation;

the measurement equation is obtained as follows:

y(k)＝Hβ(k)+v(k) (5)

wherein v (k) is a white noise sequence;

in the above Kalman filtering model, the process noise w (k) and the observation noise v (k) are both white noise sequences, and are constant values within the sampling interval; and with E { w (k) w ' (k) } Q and E { v (k) v ' (k) } R, a (k +1, k) ═ E, when w (k) and v (k) are mutually independent, E { w (k) v ' (k) } 0, β (k) is the kth output layer weight parameter;

wherein,

the weight parameter beta estimated value of the output layer of the (k +1) th is obtained;

step (4) updating hidden layer weight parameters and hidden layer bias parameters through feature function filtering;

in the feature function filtering, when the observation equation is nonlinear with respect to the state variable, if the following two requirements are satisfied:

{ v (k) } mean known, | E (w (k)) > Y<+∞；

The method comprises the following steps: h (-) is a known Bohr measurable and smooth nonlinear function;

then a filter of the form:

wherein A (k) is a state transition matrix,

is an estimate of the kth state quantity,

predicted value for the k +1 th observation, U (k) e R^n×lFor a gain matrix to be designed, the acquisition of u (k) is the core and key of the whole filter design;

order to

The available estimation error equation is

The performance index is

Wherein,

the weighting function K (t) is chosen to ensure J₀It is real and bounded, it is a given positive definite weight matrix, in order to constrain the gain matrix, minimize this performance index can get the filter gain matrix;

the gain matrix K (K +1) solving process is given below:

if p is₁(k) And p₂(k) Are respectively as

The performance index can be rewritten as

To obtain the gain matrix K (K +1), let

Obtaining the extreme point of the performance index

Due to the fact that

Therefore, the solution obtained by the formula (13) is the extreme point of the minimum performance index;

for the single hidden layer neural network with the determined hidden layer weight parameters, bias parameters and output layer weight parameters, when a new picture sample is input, the following three steps are sequentially executed:

step (4-1) hidden layer weight parameter updating

Assuming that the hidden layer bias parameter and the output layer weight parameter are unchanged, the optimal estimation value of the hidden layer weight parameter for the (k +1) th sample is

In this step, for the update of the hidden layer weight parameter, ω ═ ω is used₁,ω₂,…,ω_n]，ω_i＝[ω_i1,ω_i2,…,ω_il]^TIs the component of the hidden layer weight parameter, the hidden layer weight parameter is the n-dimensional vector, and then it is necessary to the omega_iI is 1, …, n is respectively updated by modeling solution; for each time omega_iEstimate of (c), assume ω_jJ is 1, …, i-1, i +1, …, n is constant; then by ω_iEstablishing the following state equation and observation equation for the state variable in the feature function filtering:

ω_i(k+1)＝A·ω_i(k)+w(k) (16)

wherein, the model noise w (k) only needs to have a distribution function, and the measurement error v (k) only needs to have a mean value;

step (4-2) hidden layer bias parameter a updating

Establishing a state equation and an observation equation similar to those in the step (3-1) by taking the hidden layer bias parameter a as a state variable, and solving to obtain an optimal estimation value of the kth hidden layer bias parameter a (k);

step (4-3) updating the output layer weight parameter beta

Updating the weight parameters of the output layer by using a linear Kalman filtering method, and assuming that the weight parameters of the hidden layer and the bias parameters of the hidden layer are unchanged, for the (k +1) th sample, the optimal estimation value of the weight parameters of the output layer is

2. The method of claim 1, wherein: the neural network model in step (1) is represented as:

where g (-) is an activation function, ω_i＝[ω_i1,ω_i2,…,ω_il]^TA is the weight parameter and the bias parameter of the hidden layer respectively, l is the number of nodes of the single hidden layer, and beta is the weight parameter of the output layer;

a loss function of

Wherein,

the image classification result after the model is established is in a specific form

3. The method of claim 1, wherein: the step (2) is specifically as follows:

y(k)＝H(k)β (3)

wherein

H(k)＝[H₁(k) H₂(k) … H_l(k)]^T

Then since the hidden layer weight parameters and hidden layer bias parameters are determined so that h (k) is known, the problem at this point translates to solving the output layer weight parameter β by equation (3), and the objective function is also transformed by equation (2) to the form:

here, a least squares method is used, then the solution is

4. The method of claim 1, wherein: the kth ω in the model composed of the formula (16) and the formula (17) in the step (4)_i(k) The solving process of the optimal estimated value is as follows:

(a) calculating p₁(k)

Solving the composition matrix p of the gain array U (k) as shown in equation (10)₁(k) (ii) a Wherein K (t) is a weighting function,

for a given function of the target feature in question,

is the characteristic function of s (k) ═ a (k) e (k),

a characteristic function of q (k +1) ═ G (k +1) w (k + 1);

(b) calculating p₃(k)

y (k) classify the class for the kth picture in the sample output set,

is an estimate of y (k), and has

(c) Calculating gain array U (k)