CN112542161B - BP neural network voice recognition method based on double-layer PID optimization - Google Patents
BP neural network voice recognition method based on double-layer PID optimization
- Publication number
- CN112542161B (application CN202011455918.7A)
- Authority
- CN
- China
- Prior art keywords
- layer
- neural network
- output
- error
- pid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a BP neural network voice recognition method based on double-layer PID optimization, which takes an FPGA as the platform for voice-signal input and adjusts the weight thresholds and the learning rate with a PID algorithm. The three parameters K_P, K_I and K_D of the double-layer PID algorithm are adjusted automatically according to the system and training-result error E_g(k), so that the weight thresholds of the hidden layer and the output layer converge more stably and the data fluctuation of the system is reduced. The outer-layer PID algorithm synchronizes the learning-rate update with the training process of the neural network: it applies a larger update intensity in the early stage of training so that the network iterates quickly, and reduces the update intensity in the later stage to prevent the data from deviating from the correct value, giving a higher voice recognition accuracy.
Description
The technical field is as follows:
the invention relates to an artificial intelligence algorithm, in particular to a BP neural network speech recognition method optimized by double-layer PID.
Background art:
as an important application direction in the field of artificial intelligence, speech recognition has been studied by many researchers, for example speech recognition with deep learning and speech recognition with support vector machines. Among learning algorithms, the BP algorithm has long been known for its strong nonlinear mapping ability and simple structure and has been used in speech recognition for a long time, but it also has defects: the weight thresholds and the learning rate cannot be determined at network initialization, convergence fluctuates strongly if they are set too large, and convergence is slow if they are set too small. Existing weight-threshold update formulas rely almost entirely on the negative-gradient algorithm; although the negative gradient accelerates convergence of the weight thresholds to a certain degree, the numerical fluctuation it causes is too large and interferes with normal convergence. Meanwhile, the learning rate is usually set manually through repeated experiments, and the existing variable-learning-rate algorithms merely decrease the learning rate linearly; this weakens the influence of the learning rate in the later stage and avoids output deviating from the correct result through fluctuation caused by an overly strong update, but it does not improve the accuracy, and a generic BP neural network structure can show a low recognition accuracy on music data.
CN103639211A discloses a roll-gap control method and system based on BP neural network and PID parameter optimization, in which a neural network optimizes the PID structure; the method makes the PID parameters, and hence the algorithm, more stable.
CN110488600A discloses an LQR-optimized neural network PID controller for brushless DC motor speed regulation, in which an LQR algorithm optimizes the neural-network-tuned PID so that the DC motor control is more stable.
CN104834215A discloses a BP neural network PID control algorithm optimized by mutation particle swarm, which makes the PID algorithm output more stable.
The invention content is as follows:
the invention aims to provide a BP neural network speech recognition method based on double-layer PID optimization that addresses the defects of the prior art.
The idea of the invention is as follows: an FPGA with a voice recognition function serves as the platform for voice-signal input. The double-layer PID is divided into an inner layer and an outer layer. The inner-layer PID algorithm reduces the data fluctuation caused by the negative-gradient algorithm, so that the weight-threshold update process converges smoothly and the oscillation produced during convergence is reduced. Meanwhile, the outer-layer PID algorithm synchronizes the learning-rate update with the training process of the neural network: it applies a larger update intensity in the early stage of training so that the network iterates quickly, and reduces the update intensity in the later stage to prevent data from deviating from the correct value, giving a higher voice recognition accuracy.
The purpose of the invention is realized by the following technical scheme:
firstly, a three-layer BP neural network is built, consisting of an input layer, a hidden layer and an output layer. The weight coefficients W_ij(k) (between input layer and hidden layer) and W_jg(k) (between hidden layer and output layer) and the activation-function parameters a_j(k) and b_g(k) between the layers are generated randomly, a learning rate η(k) is selected, and k is set to 1;
secondly, the FPGA platform extracts voice data from the voice to be recognized; the BP neural network analyzes the extracted data X_i(k) and computes the output O_g(k) of the output layer, the error E_g(k) is computed against the expected output Y_g(k), and η(k) is adjusted using the error E_g(k) together with the proportional parameter K_P, integral parameter K_I and differential parameter K_D of the Proportion-Integration-Differentiation (PID) algorithm;
then the error E_g(k) and the adjusted η(k) are used to correct the weight coefficients W_ij(k) and W_jg(k) of the BP neural network and the activation-function parameters a_j(k) and b_g(k) between the layers, until the output error of the output layer meets the requirement; finally it is judged whether the input voice is the set voice signal;
the method comprises the following steps:
A. carry out training initialization preparation, initialize the neural network structure, and acquire a sample set for training;
B. carry out feature extraction on the sample set to obtain a feature set;
C. train the neural network with the feature set as the training set; during training obtain the error E_g(k) from the expected output Y_g(k) and the actual output O_g(k), adjust the learning rate η(k) with the error E_g(k) as a parameter of the outer-layer PID algorithm, and then use the error E_g(k) and the adjusted learning rate η(k+1) as parameters of the inner-layer PID algorithm to adjust the input-layer weight W_ij(k), the threshold a_j(k), the output-layer weight W_jg(k) and the threshold b_g(k);
D. test the adjusted neural network: extract the features of the test sample according to step B, recognize the test sample with the neural network of step C, and obtain the error E_g(k) of step C; when the error E_g(k) is smaller than a certain threshold, the neural network training is finished and it is judged whether the input voice signal is the set voice signal.
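Steps A–D can be sketched as a generic training loop (an illustrative sketch only, not the patented implementation; the callables `error_of` and `update`, the threshold `tol` and the iteration cap `max_iter` are assumed names):

```python
def train_until(error_of, update, tol=0.05, max_iter=1000):
    """Skeleton of steps C-D: iterate, measure the output error E_g(k),
    apply the PID-based updates, and stop once the error falls below tol."""
    k, err = 0, error_of()
    while err >= tol and k < max_iter:
        update(err)       # outer PID adjusts eta; inner PID adjusts weights/thresholds
        err = error_of()  # re-evaluate the network on the training set
        k += 1
    return k, err
```

The loop returns the iteration count together with the final error, mirroring the stopping criterion of step D.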
In said step C, the error E_g(k) is used as the outer-layer PID algorithm parameter to adjust the learning rate; the outer-layer PID adjustment formula is as follows:
where s = 1, 2, …, k; k is the current iteration number; N_3 is the number of output-layer neurons, N_3 = 3; and E_gD(k) = E_g(k) − E_g(k−1).
In said step C, the error E_g(k) and the adjusted learning rate η(k+1) are used as inner-layer PID algorithm parameters to adjust the weight W_ij(k) between input layer and hidden layer, the hidden-layer threshold a_j(k), the weight W_jg(k) between hidden layer and output layer, and the output-layer threshold b_g(k); the inner-layer PID adjustment formulas are as follows,
the updating formula of the weight between the input layer and the hidden layer is as follows:
the updating formula of the hidden-layer threshold is as follows:
where N_3 is the number of output-layer neurons, N_3 = 3,
The updating formula of the weight between the hidden layer and the output layer is as follows:
the update formula of the output layer threshold is as follows:
where s = 1, 2, …, k; k is the current iteration number; N_3 is the number of output-layer neurons, N_3 = 3; and E_gD(k) = E_g(k) − E_g(k−1).
Advantageous effects: the weight thresholds and the learning rate are adjusted with a PID algorithm; the three parameters K_P, K_I and K_D of the double-layer PID algorithm are adjusted automatically according to the system and training-result error E_g(k), so that the weight thresholds of the hidden layer and the output layer converge more stably and the data fluctuation of the system is reduced; the learning rate provides a larger update intensity in the early stage to help the system update quickly and a reduced update intensity in the later stage to prevent the data from deviating from the correct value, giving a higher voice recognition accuracy.
Description of the drawings:
FIG. 1 is a running diagram of a speech recognition method based on an FPGA platform
FIG. 2 is a flowchart of an algorithm for updating learning rate of weight threshold by two-level PID
FIG. 3 is a diagram of a BP neural network structure
FIG. 4 is a flow chart of neural network training
FIG. 5 is a simulation display diagram of four music characteristic signals
FIG. 6 is a graph of convergence weight threshold for training of general neural network structure
FIG. 7 is a graph of convergence of training weight threshold of neural network structure after double-layer PID optimization
FIG. 8 is a graph of the convergence of learning rate in training of general neural network structure
FIG. 9 is a neural network structure training learning rate convergence diagram after double-layer PID optimization
FIG. 10 is a graph of the accuracy of the training structure of the general neural network structure for four music characteristic signals
FIG. 11 is a diagram of the accuracy of a neural network structure training structure under the double-layer PID result optimization of four music characteristic signals.
The specific implementation mode is as follows:
the invention is described in further detail below with reference to the following figures and examples:
A BP neural network voice recognition method with double-layer PID optimization:
firstly, a three-layer BP neural network is built, consisting of an input layer, a hidden layer and an output layer. The weight coefficients W_ij(k) (between input layer and hidden layer) and W_jg(k) (between hidden layer and output layer) and the activation-function parameters a_j(k) and b_g(k) between the layers are generated randomly, a learning rate η(k) is selected, and k is set to 1;
secondly, the FPGA platform extracts voice data from the voice to be recognized; the BP neural network analyzes the extracted data X_i(k) and computes the output O_g(k) of the output layer, the error E_g(k) is computed against the expected output Y_g(k), and η(k) is adjusted using the error E_g(k) together with the proportional parameter K_P, integral parameter K_I and differential parameter K_D of the Proportion-Integration-Differentiation (PID) algorithm;
then the error E_g(k) and the adjusted η(k) are used to correct the weight coefficients W_ij(k) and W_jg(k) of the BP neural network and the activation-function parameters a_j(k) and b_g(k) between the layers, until the output error of the output layer meets the requirement; finally it is judged whether the input voice is the set voice signal;
the method comprises the following steps:
a: initializing an FPGA platform based on a double-layer PID optimization BP neural network, and taking three input layer neurons, six hidden layer neurons and three output layer neurons as shown in FIG. 3. Denote the ith input layer neuron input data as X i (k) (ii) a Recording the weight value between the ith input layer neuron and the jth hidden layer neuron as W ij (k) (ii) a Let the hidden layer threshold for the jth hidden layer neuron be a j (k) (ii) a Recording the weight value between the jth hidden layer neuron and the g output layer neuron as W jg (k) (ii) a Denote the g-th output layer neuron threshold as b g (k) (ii) a Let the g output layer neuron output value be O g (k) The expected output value and the output value error value are recorded as Y g (k) And E g (k) (ii) a The learning rate is noted as η (k). Wherein i is the number of neurons in the input layer, j is the number of neurons in the hidden layer, g is the number of neurons in the output layer, and k is the current iteration number. Generating proportional parameter K in inner PID and outer PID algorithm simultaneously P Integral parameter K I Differential parameter K D 。
B: The FPGA platform extracts voice data from the voice signal by A/D conversion and then performs feature extraction on the extracted voice data with the MFCC method to obtain a feature-vector set. The input-layer vector is recorded as Z(k) = (X_1(k), X_2(k), X_3(k)), where X_i (i = 1, 2, 3) is the input of each of the three input-layer neurons; in the first iteration the input layer takes Z(1) = (X_1(1), X_2(1), X_3(1)) as input, the first input of the first neuron being X_1(1). After Z(k) is input, data weighting towards the hidden layer is performed first, and the result is recorded as net_j = Σ_{i=1}^{N_1} W_ij·X_i,
where N_1 is the number of input-layer neurons, N_1 = 3;
Continue to process the weight value processing result net j Processing hidden layer threshold value data, recorded as H j =f(net j -a j ) Wherein f is an activation function, a j To hide the layer threshold, H j Is the output layer input value.
Output layer input value H j (j is 1,2,3,4,5,6), data weighting processing is performed on the output layer, and the processing result is expressed as:
where N_2 is the number of hidden-layer neurons, N_2 = 6;
Then, the threshold processing is continuously carried out on the output layer weight processing result and is recorded as O g =G(net g -b g ) G is an activation function, b g As output layer threshold, O g And outputting the output value of the output layer, namely the final output of the neural network. Error recording E g (k)=Y g (k)-O g (k) Wherein the first output neuron outputs an error E 1 (1)=Y 1 (1)-O 1 (1) Second output neuron output error E 2 (1)=Y 2 (1)-O 2 (1) Third output neuron output error E 3 (1)=Y 3 (1)-O 3 (1)。
C: The BP neural network carries out reverse error propagation and updates the learning rate η(k), the hidden-layer weights W_ij(k) and thresholds a_j(k), and the output-layer weights W_jg(k) and thresholds b_g(k). The outer-layer PID update formula of the learning rate η(k) is as follows:
where s = 1, 2, …, k; k is the current iteration number; N_3 is the number of output-layer neurons, N_3 = 3;
The purpose of summing and averaging in the formula is to merge multiple adjustment values into one, so that η responds to the change of the whole neural network; this term acts as the proportional adjustment parameter and lets η be adjusted quickly. The second term acts as the integral adjustment parameter: it accumulates the error over time, maintains a steady adjustment, reduces the inertia during parameter updating and reduces system oscillation; taking the second iteration of the first output neuron as an example, it is E_1(2) + E_1(1). The term E_gD(k) = E_g(k) − E_g(k−1) acts as the differential adjustment parameter: it reflects the change of the network's output error in advance and introduces an effective early correction signal into the system, accelerating the system's response and shortening the adjustment time; for the second iteration of the first output neuron, E_1D(2) = E_1(2) − E_1(1).
Take the inner-layer PID update formula of the second iteration of the first output neuron as an example:
The learning rate updated with this PID algorithm provides a larger update intensity in the early stage of network operation, so that the system iterates and corrects its parameters quickly, and reduces the update intensity in the later stage to prevent the result from deviating from the correct value through data oscillation;
After the learning rate has been updated, the updated learning rate η(k) and the output error E_g(k) are used as parameters in the inner-layer PID update formulas, and the weights between input layer and hidden layer, the hidden-layer thresholds, the weights between hidden layer and output layer, and the output-layer thresholds are updated;
the updating formula of the weight between the input layer and the hidden layer is as follows:
the formula for updating the weight between the input layer and the hidden layer is exemplified by the second iteration:
the updating formula of the hidden layer threshold value is as follows:
where N_3 is the number of output-layer neurons, N_3 = 3.
The hidden layer threshold formula is exemplified by the second iteration:
the updating formula of the weight between the hidden layer and the output layer is as follows:
the weight formula between the hidden layer and the output layer is exemplified by the second iteration:
the updating formula of the output layer threshold value is as follows:
the update formula of the output layer threshold takes the second iteration as an example:
The negative-gradient algorithm considers only the current state of the neural network, not its past or future states, so the weight-threshold adjustment data fluctuate strongly during iteration and easily deviate from the correct output. Updating the weight thresholds with the inner-layer PID algorithm according to the returned error lets the weight thresholds iterate smoothly, giving stronger system stability and a more accurate output result.
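The contrast with the plain negative-gradient step can be illustrated as follows (a sketch under assumptions: the patent's formulas are images, so the P/I/D terms here act on a per-weight gradient history, and `eta`, `Kp`, `Ki`, `Kd` are illustrative gains):

```python
def negative_gradient_step(w, grad, eta):
    """Plain negative-gradient update: uses only the current state."""
    return w - eta * grad

def pid_weight_step(w, grads, eta, Kp, Ki, Kd):
    """PID-smoothed update: the integral term remembers past gradients and
    the differential term anticipates the trend, damping oscillation."""
    g = grads[-1]
    I = sum(grads)                                   # past states
    D = g - (grads[-2] if len(grads) > 1 else 0.0)   # trend of the gradient
    return w - eta * (Kp * g + Ki * I + Kd * D)
```

With Kp = 1 and Ki = Kd = 0 the PID step reduces to the plain negative-gradient step, which makes the smoothing terms easy to isolate in experiments.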
D: Test the adjusted neural network algorithm: extract the features of the test sample according to step B and then test according to step C; if the error is below a certain threshold, training is finished and the result is output.
According to the simulation experiment result, the same data is processed by using a common neural network and a neural network with a double-layer PID structure optimized, four types of characteristic signals are shown in FIG. 5, and it can be seen that the second type of music characteristic signal is very similar to the third type of music characteristic signal, so that the second type of music characteristic signal or the third type of music characteristic signal may not be ideal in final identification accuracy.
The weight threshold of the hidden layer and the weight threshold of the output layer of the common neural network are shown in fig. 6, the convergence of the neural network with a double-layer PID structure is shown in fig. 7, and it can be seen that the output convergence of the weight threshold optimized by using the double-layer PID is more stable.
The learning rate after the training of the common neural network structure is shown in fig. 8, and the learning rate of the neural network training under the optimization of the double-layer PID structure is shown in fig. 9, so that the obvious convergence can be seen, and compared with the fixed learning rate, the learning rate is obviously more scientific by continuously converging according to the feedback result.
The operation result of the neural network with the common structure is shown in fig. 10, and the output result of the neural network with the optimized double-layer PID structure is shown in fig. 11, so that the accuracy of the neural network with the optimized double-layer PID structure for the third music identification is far higher than that of the neural network with the common structure, and meanwhile, the neural network with the optimized double-layer PID structure has almost no influence on the other three types of music identification, and the overall accuracy is greatly improved.
Claims (3)
1. A BP neural network speech recognition method of double-deck PID optimization, characterized by that:
firstly, a three-layer BP neural network is built, consisting of an input layer, a hidden layer and an output layer; the weight coefficients W_ij(k) and W_jg(k) and the activation-function parameters a_j(k) and b_g(k) between the layers are generated randomly, a learning rate η(k) is selected, and k is set to 1;
secondly, the FPGA platform extracts voice data from the voice to be recognized; the BP neural network analyzes the extracted data X_i(k) and computes the output O_g(k) of the output layer, the error E_g(k) is computed against the expected output Y_g(k), and η(k) is PID-adjusted using the error E_g(k) with the proportional parameter K_P, integral parameter K_I and differential parameter K_D;
after the learning rate is updated, the updated learning rate η(k) and the output error E_g(k) are used as parameters in the inner-layer PID update formulas, and the weight between input layer and hidden layer, the hidden-layer threshold, the weight between hidden layer and output layer, and the output-layer threshold are updated until the output error of the BP neural network output layer meets the requirement; finally it is judged whether the input voice is the set voice signal;
the method comprises the following steps:
A. carry out training initialization preparation, initialize the neural network structure, and acquire a sample set for training;
B. carry out feature extraction on the sample set to obtain a feature set;
C. train the neural network with the feature set as the training set; during training obtain the error E_g(k) from the expected output Y_g(k) and the actual output O_g(k), adjust the learning rate η(k) with the error E_g(k) as a parameter of the outer-layer PID algorithm, and then use the error E_g(k) and the adjusted learning rate η(k+1) as parameters of the inner-layer PID algorithm to adjust the input-layer weight W_ij(k), the threshold a_j(k), the output-layer weight W_jg(k) and the threshold b_g(k);
D. test the adjusted neural network: extract the features of the test sample according to step B, recognize the test sample with the neural network of step C, and obtain the error E_g(k) of step C; when the error E_g(k) is smaller than a certain threshold, the neural network training is finished and it is judged whether the input voice signal is the set voice signal.
2. The method according to claim 1, wherein in step C the error E_g(k) is used as the outer-layer PID algorithm parameter to adjust the learning rate, the outer-layer PID adjustment formula being as follows:
where s = 1, 2, …, k; k is the current iteration number; N_3 is the number of output-layer neurons, N_3 = 3; and E_gD(k) = E_g(k) − E_g(k−1).
3. The method according to claim 1, wherein in step C the error E_g(k) and the adjusted learning rate η(k+1) are used as inner-layer PID algorithm parameters to adjust the weight W_ij(k) between input layer and hidden layer, the hidden-layer threshold a_j(k), the weight W_jg(k) between hidden layer and output layer, and the output-layer threshold b_g(k), the inner-layer PID adjustment formulas being as follows,
the updating formula of the weight between the input layer and the hidden layer is as follows:
the updating formula of the hidden-layer threshold is as follows:
where N_3 is the number of output-layer neurons, N_3 = 3,
The updating formula of the weight between the hidden layer and the output layer is as follows:
the updating formula of the output layer threshold value is as follows:
where s = 1, 2, …, k; k is the current iteration number; N_3 is the number of output-layer neurons, N_3 = 3; and E_gD(k) = E_g(k) − E_g(k−1).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011455918.7A CN112542161B (en) | 2020-12-10 | 2020-12-10 | BP neural network voice recognition method based on double-layer PID optimization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011455918.7A CN112542161B (en) | 2020-12-10 | 2020-12-10 | BP neural network voice recognition method based on double-layer PID optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112542161A CN112542161A (en) | 2021-03-23 |
CN112542161B true CN112542161B (en) | 2022-08-12 |
Family
ID=75018429
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011455918.7A Expired - Fee Related CN112542161B (en) | 2020-12-10 | 2020-12-10 | BP neural network voice recognition method based on double-layer PID optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112542161B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113267993A (en) * | 2021-04-22 | 2021-08-17 | 上海大学 | Network training method and device based on collaborative learning |
CN113411456B (en) * | 2021-06-29 | 2023-05-02 | 中国人民解放军63892部队 | Voice quality assessment method and device based on voice recognition |
CN117012184A (en) * | 2022-10-31 | 2023-11-07 | 腾讯科技(深圳)有限公司 | Voice recognition method and related device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101667012A (en) * | 2008-09-03 | 2010-03-10 | 长春工程学院 | Method for controlling reinforcement learning adaptive proportion integration differentiation-based distribution static synchronous compensator |
US9053431B1 (en) * | 2010-10-26 | 2015-06-09 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
CN108073985A (en) * | 2016-11-14 | 2018-05-25 | 张素菁 | A kind of importing ultra-deep study method for voice recognition of artificial intelligence |
CN108445742A (en) * | 2018-02-07 | 2018-08-24 | 广东工业大学 | A kind of intelligent PID control method of gas suspension platform |
CN109034390A (en) * | 2018-08-07 | 2018-12-18 | 河北工业大学 | Phase angular amplitude PID adaptive approach based on BP neural network Three-Dimensional Magnetic feature measurement |
CN109991842A (en) * | 2019-03-14 | 2019-07-09 | 合肥工业大学 | Piano tone tuning method and system neural network based |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492883A (en) * | 2018-10-18 | 2019-03-19 | 山东工业职业学院 | A kind of efficiency artificial intelligence on-line analysis system and method |
- 2020-12-10: CN application CN202011455918.7A filed; granted as patent CN112542161B (status: not active, Expired - Fee Related)
Non-Patent Citations (2)
Title |
---|
Research on the Application of PID Control with Neural Network and Parameter Adjustment Method of PID Controller; Jiayu Liu, et al.; Association for Computing Machinery; 2018; pp. 72-76 * |
Multivariable Adaptive PID-Type Neural Network Controller and Its Design Method; Cong Shuang, et al.; 《信息与控制》 (Information and Control); Oct. 2006; Vol. 35, No. 5; pp. 568-573 * |
Also Published As
Publication number | Publication date |
---|---|
CN112542161A (en) | 2021-03-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112542161B (en) | BP neural network voice recognition method based on double-layer PID optimization | |
US20180046915A1 (en) | Compression of deep neural networks with proper use of mask | |
CN109671423B (en) | Non-parallel text-to-speech conversion method under limited training data | |
CN107729999A (en) | Consider the deep neural network compression method of matrix correlation | |
CN111477247B (en) | Speech countermeasure sample generation method based on GAN | |
CN107689224A (en) | The deep neural network compression method of reasonable employment mask | |
CN109740481B (en) | CNN and LSTM combined atrial fibrillation signal classification system based on jump connection | |
CN112766399A (en) | Self-adaptive neural network training method for image recognition | |
CN112330487A (en) | Photovoltaic power generation short-term power prediction method | |
CN107273971B (en) | Feed-forward neural network structure self-organization method based on neuron significance | |
CN113177355A (en) | Power load prediction method | |
CN108734116B (en) | Face recognition method based on variable speed learning deep self-coding network | |
CN110858484A (en) | Voice recognition method based on voiceprint recognition technology | |
CN108446506B (en) | Uncertain system modeling method based on interval feedback neural network | |
CN111444787B (en) | Fully intelligent facial expression recognition method and system with gender constraint | |
CN112069876A (en) | Handwriting recognition method based on adaptive differential gradient optimization | |
CN109960307B (en) | MPPT active disturbance rejection control method for photovoltaic off-grid inverter | |
CN113807005B (en) | Bearing residual life prediction method based on improved FPA-DBN | |
Seman et al. | The optimization of artificial neural networks connection weights using genetic algorithms for isolated spoken Malay parliamentary speeches | |
JP3039408B2 (en) | Sound classification method | |
Gouvea et al. | Diversity-based model reference for genetic algorithms in dynamic environment | |
CN114596567A (en) | Handwritten digit recognition method based on dynamic feedforward neural network structure and growth rate function | |
Toprak et al. | Searching Optimal Values of Identification and Controller Design Horizon Lengths, and Regularization Parameters in NARMA Based Online Learning Controller Design | |
Attya et al. | PID Controller Design and Simulation for Aircraft Roll Control Based on Evolutionary Technique Using MATLAB | |
Ghule | Implementation of Optimal Hidden Neurons using a Fuzzy Controller
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20220812 |