CN107507611B

CN107507611B - Voice classification recognition method and device

Info

Publication number: CN107507611B
Application number: CN201710774048.1A
Authority: CN
Inventors: 张莉; 徐志强; 王邦军; 张召; 李凡长
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2017-08-31
Filing date: 2017-08-31
Publication date: 2021-08-24
Anticipated expiration: 2037-08-31
Also published as: CN107507611A

Abstract

The invention discloses a method for speech classification recognition, which comprises the steps of inputting a speech data sample to be distinguished into a pre-established classifier model, and obtaining a classification result of the speech data sample according to an output value of the classifier model; the classifier model is based on the L1 paradigm canonical parameter and Laplace canonical parameter determination and the constraint conditions of the support vector machine to obtain a support vector sample set, so that the obtained classifier model has strong sparsity and interpretability, strong filtering capacity on noise and strong robustness on noise, and a more accurate result on the speech classification can be obtained. The invention also provides a device for speech classification and recognition, which has the beneficial effects.

Description

Voice classification recognition method and device

Technical Field

The invention relates to the field of artificial intelligence application, in particular to a method and a device for speech classification and recognition.

Background

With the development of artificial intelligence, computer technology is widely applied to various fields, wherein speech recognition is one of the directions with application values, and conventional processing technologies for speech and language are quite complex, so that certain burden is brought to the operation of a computer.

At present, a relatively simple voice processing technology adopts a semi-supervised algorithm to create a generating model, and the created generating model is used for processing voice, but the creating process of the currently obtained generating model for processing voice is complex, the filtering capability for noise is not very strong, and stronger robustness is lacked.

Disclosure of Invention

The invention aims to provide a method for speech classification recognition, which solves the problem of low noise filtering capability of a speech processing generation model and improves the accuracy of speech classification recognition.

Another object of the present invention is to provide an apparatus for speech classification recognition.

In order to solve the above technical problem, the present invention provides a method for speech classification and recognition, comprising:

inputting a voice data sample to be distinguished into a pre-established classifier model, and obtaining a classification result of the voice data sample according to an output value of the classifier model; wherein the classifier model creating process is as follows:

inputting a training sample set of voice data into a Laplace support vector machine; obtaining an optimal L1 normal form regular parameter gamma according to the training sample set, the positive definite parameter and the kernel function_AAnd an optimal Laplace regularization parameter γ_IAnd a kernel function matrix; according to the optimal L1 paradigm regularization parameter gamma_AThe optimal Laplace regularization parameter gamma_IObtaining a support vector sample set and an offset by the kernel function and the constraint condition of the support vector machine; and obtaining a classifier model according to the support vector sample set and the offset.

Wherein the training sample set of the input speech data into the laplacian support vector machine comprises:

inputting a training sample set into a Laplace support vector machine:

wherein x_i∈R^D，y_iIs x_iA label of (a) indicates x_iWhen y is_iWhen the E { -1, +1},

is the number of labeled training samples, when y_iWhen the content is equal to 0, the content,

u is the number of unlabeled training samples and D is the dimension of the original space.

Obtaining an optimal L1 normal form canonical parameter gamma according to the training sample set, the positive definite parameter and the kernel function_AAnd lapalalSpline parameter gamma_IAnd the kernel function matrix comprises:

dividing the training sample set into a plurality of parts, and comparing the divided training sample set with an L1 paradigm canonical parameter gamma_AAnd laplacian regularization parameter gamma_ITesting and training in a cross validation mode to obtain an optimal L1 paradigm canonical parameter gamma_AAnd an optimal Laplace regularization parameter γ_I；

Mapping the training sample set to a kernel Hilbert space through a kernel function to obtain a kernel function matrix K, wherein K is_ij＝k(x_i,x_j)。

Wherein the regularization parameter γ according to the optimal L1 paradigm_AThe optimal Laplace regularization parameter gamma_IThe obtaining of the sample set and the offset of the support vector by the kernel function and the constraint condition of the support vector machine comprises:

in that

Under the conditions of (1), solving

Obtaining the coefficient a of the discriminant model as a⁺-a^-＝[α₁,α₂,...,α_l+u]^TAnd offset b ═ beta⁺-β^-Wherein, in the step (A),

delta is a constant coefficient, xi_iIn order to be a function of the relaxation variable,

L-D-W is a laplacian matrix,

is a preset parameter, and D_ii＝∑_jW_ij；

According to the coefficient a of the discriminant model, obtaining a support vector sample set SVs ═ x in a training sample set_i|α_i≠0,i＝1,…,N}。

Wherein the obtaining a classifier model according to the support vector sample set and the offset comprises:

determining the classifier model according to the support vector sample set and the offset:

wherein x is a voice data sample to be judged, wherein x belongs to R^D，x_svIs a support vector, a_svIs the model coefficient of the support vector, and the value of y is the discrimination result of the voice data sample x.

The invention also provides a device for speech classification recognition, which comprises:

the classifier module is used for inputting a voice data sample to be distinguished into a pre-established classifier model and obtaining a classification result of the voice data sample according to an output value of the classifier model; the classifier model is created and obtained by a classifier creating module, and the classifier creating module is used for:

Wherein the classifier creation module comprises:

an input unit, configured to input a training sample set into a laplacian support vector machine:

Wherein the classifier creation module comprises:

a parameter processing unit for dividing the training sample set into several parts and comparing the divided training sample set with an L1 normal form canonical parameter gamma_AAnd laplacian regularization parameter gamma_ITesting and training in a cross validation mode to obtain an optimal L1 paradigm canonical parameter gamma_AAnd laplacian regularization parameter gamma_I(ii) a Mapping the training sample set to a kernel Hilbert space through a kernel function to obtain a kernel function matrix K, wherein K is_ij＝k(x_i,x_j)。

Wherein the classifier creation module comprises:

an arithmetic unit for

Under the conditions of (1), solving

L-D-W is a laplacian matrix,

is a preset parameter, and D_ii＝∑_jW_ij(ii) a According to the coefficient a of the discriminant model, obtaining a support vector sample set SVs ═ x in a training sample set_i|α_i≠0,i＝1,…,N}。

Wherein the classifier creation module comprises:

an obtaining unit, configured to determine, according to the support vector sample set and the offset, the classifier model:

The invention provides a speech classification recognition method, which obtains a discrimination result by inputting a speech data sample to be discriminated into a pre-established classifier model, wherein the classifier model adopts L1 paradigm canonical parameter and Laplace canonical parameter determination and constraint conditions of a support vector machine to obtain a support vector sample set, so that the obtained classifier model, the L1 paradigm canonical parameter and the Laplace canonical parameter enable the classifier model to have sparsity and interpretability, in practical application, a needed model can be obtained by adopting few sample points, which further enhances the noise filtering capability of the classifier, thereby obtaining good robustness, compared with the prior art, the classifier model adopted by the invention has low complexity and stronger interpretability and sparsity, on the basis of improving the recognition rate of the classifier model, and the method also has stronger robustness to noise, so that the result of voice classification is more accurate.

The invention also provides a device for speech classification and recognition, which has the beneficial effects.

Drawings

In order to more clearly illustrate the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.

FIG. 1 is a flow diagram of one embodiment of speech classification recognition provided by the present invention;

fig. 2 is a block diagram of a speech classification recognition apparatus according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 shows a flowchart of a specific embodiment of speech classification recognition, and the method may include:

step S101: a training sample set of speech data is input into a laplacian support vector machine.

Step S102: obtaining an optimal L1 normal form regular parameter gamma according to the training sample set, the positive definite parameter and the kernel function_AAnd an optimal Laplace regularization parameter γ_IAnd a kernel function matrix.

Step S103: according to the optimal L1 paradigm regularization parameter gamma_AThe optimal Laplace regularization parameter gamma_IAnd the kernel function and the constraint condition of the support vector machine are used for obtaining a support vector sample set and an offset.

Step S104: and obtaining a classifier model according to the support vector sample set and the offset.

Step S105: and inputting the voice data sample to be distinguished into the classifier model, and obtaining a classification result of the voice data sample according to the output value of the classifier model.

It should be noted that, in this embodiment, steps S101 to S104 are all processes of creating a classifier model, and the classifier in the present invention may be created in advance according to steps S101 to S104, and only the speech data sample to be determined needs to be input according to step S105, so that the classification result can be obtained.

In addition, the generation model in the prior art needs to take the specific content of the recognized speech and needs to recognize the speech content including noise, thereby resulting in poor robustness, while the classifier model adopted in the present invention is used to recognize whether the content of the speech is what we want without paying attention to the specific content of the speech.

As a simple example, in creating the classifier model, 150 project person recordings were chosen, each reading the alphabet twice for 52 samples, with each sample dimension 617. And (3) dividing the data set aiming at voice content discrimination: the pronunciation of the first 120 persons for the letter a and the letter b is extracted to obtain 480 x 617 of data set, the letter a voice data is divided into positive class, the word b voice data is divided into negative class, 80% is drawn as training set, the rest is used as testing set. In the training set part, ten percent of data are divided into data with labels, and the rest are used as non-label data. And training by using a training set to obtain a classifier model, and inputting a test set to obtain the accuracy of the classifier model based on the training sample.

In the practical application process, the speech of a and b is selected as a training sample to obtain a classifier model capable of distinguishing the speech a and the speech b, and when new speech is input, the specific content of the speech is not concerned as long as whether the speech is a or b is judged.

The classifier model adopted in the speech classification recognition greatly reduces the technical difficulty and saves the cost when the requirements are met in certain scenes. The simplest application is that when the user answers questions with voice, for some questions with fixed answers, the voice data collection can be preprocessed, and then the model is used for judging to obtain results. For example, judging the question, training the learner by using 'right and wrong' voices in advance to obtain a corresponding model, preprocessing the collected voice data, then using the model to obtain a result, and further comparing the result output with the original question answer.

In summary, the classifier model adopted in the invention uses the L1 paradigm regularization, so that the objective function for creating the classifier model has sparsity and interpretability, that is, the required classifier model can be obtained by using few training sample points, thereby well eliminating noise data and enabling the classifier model to have good robustness.

Based on the above embodiments, another specific embodiment of the present invention may include:

the training sample set for inputting the voice data into the laplacian support vector machine specifically comprises:

inputting a training sample set into a Laplace support vector machine:

obtaining an optimal L1 normal form regular parameter gamma according to the training sample set, the positive definite parameter and the kernel function_AAnd laplacian regularization parameter gamma_IAnd the kernel function matrix is specifically:

dividing the training sample set into a plurality of parts, and comparing the divided training sample set with an L1 paradigm canonical parameter gamma_AAnd laplacian regularization parameter gamma_ITesting and training in a cross validation mode to obtain an optimal L1 paradigm canonical parameter gamma_AAnd an optimal Laplace regularization parameter γ_I(ii) a Mapping the training sample set to a kernel Hilbert space through a kernel function to obtain a kernel function matrix K, wherein K is_ij＝k(x_i,x_j)。

Specifically, the cross validation method is a k-fold cross validation method, and the L1 paradigm canonical parameter and the Laplace canonical parameter are obtained through cross validation.

For example, the training set is divided into five equal parts, one part is selected for testing, the other parts are used for training, five accuracy rates are obtained for averaging, the obtained average accuracy rate is the accuracy rate corresponding to the L1 paradigm canonical parameter and the Laplace canonical parameter, and finally the L1 paradigm canonical parameter and the Laplace canonical parameter corresponding to the maximum accuracy rate are selected as final parameters.

the regularization parameter gamma according to the optimal L1 paradigm_AThe optimal Laplace regularization parameter gamma_IThe obtaining of the support vector sample set and the offset by the kernel function and the constraint condition of the support vector machine is specifically as follows:

in that

Under the conditions of (1), solving

L-D-W is a laplacian matrix,

is a preset parameter, and D_ii＝∑_jW_ij；

Specifically, a small normal number is taken as δ to ensure that a unique solution is obtained, and the constant coefficient is obtained.

the obtaining a classifier model according to the support vector sample set and the offset specifically includes:

The following describes a speech classification recognition apparatus according to an embodiment of the present invention, and the speech classification recognition apparatus described below and the speech classification recognition method described above may be referred to correspondingly.

Fig. 2 is a block diagram of a speech classification recognition apparatus according to an embodiment of the present invention, where the speech classification recognition apparatus according to fig. 2 may include:

the classifier module 100 is configured to input a speech data sample to be distinguished into a pre-created classifier model, and obtain a classification result of the speech data sample according to an output value of the classifier model; wherein the classifier model is created by the classifier creation module 200, and the classifier creation module 200 is configured to:

Optionally, the classifier creating module 200 includes:

Optionally, the classifier creating module 200 includes:

parameter processing unit, usingDividing the training sample set into several parts, and comparing the divided training sample set with an L1 paradigm canonical parameter gamma_AAnd laplacian regularization parameter gamma_ITesting and training in a cross validation mode to obtain an optimal L1 paradigm canonical parameter gamma_AAnd laplacian regularization parameter gamma_I(ii) a Mapping the training sample set to a kernel Hilbert space through a kernel function to obtain a kernel function matrix K, wherein K is_ij＝k(x_i,x_j)。

Optionally, the classifier creating module 200 includes:

an arithmetic unit for

Under the conditions of (1), solving

L-D-W is a laplacian matrix,

Optionally, the classifier creating module 200 includes:

The device for classifying and recognizing speech of this embodiment is used to implement the foregoing method for classifying and recognizing speech, and therefore the specific implementation manner of the device for classifying and recognizing speech may be found in the foregoing embodiment portions of the method for classifying and recognizing speech, for example, the classifier module 100, the step S105 of the method for implementing the foregoing method for classifying and recognizing speech, and the classifier creation module 200, the steps S101, S102, S103, and S104 of the method for implementing the foregoing method for classifying and recognizing speech, so that the specific implementation manner thereof may refer to the description of the corresponding embodiments of each portion, and will not be described herein again.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The method and apparatus for speech classification recognition provided by the present invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. A method of speech classification recognition, comprising:

inputting a voice data sample to be distinguished into a pre-established classifier model, and obtaining a classification result of the voice data sample according to an output value of the classifier model;

wherein the classifier model creating process is as follows:

inputting a training sample set of voice data into a Laplace support vector machine;

obtaining an optimal L1 normal form regular parameter gamma according to the training sample set, the positive definite parameter and the kernel function_AAnd an optimal Laplace regularization parameter γ_IAnd a kernel function matrix;

according to the optimal L1 paradigm regularization parameter gamma_AThe optimal Laplace regularization parameter gamma_IObtaining a support vector sample set and an offset by the kernel function and the constraint condition of the support vector machine;

obtaining the classifier model according to the support vector sample set and the offset;

wherein, x is a voice data sample to be judged, and x belongs to R^D，x_svIs a support vector, a_svIs the model coefficient of the support vector, and the value of y is the discrimination result of the voice data sample x.

2. The method of claim 1, wherein the inputting the training sample set of speech data into the laplacian support vector machine comprises:

inputting a training sample set into a Laplace support vector machine:

3. The method according to claim 2, wherein the optimal L1 normal form canonical parameter γ is obtained according to the training sample set and the positive definite parameter and the kernel function_AAnd laplacian regularization parameter gamma_IAnd the kernel function matrix comprises:

dividing the training sample set into several parts byDivided training sample set pair L1 normal form regular parameter gamma_AAnd laplacian regularization parameter gamma_ITesting and training in a cross validation mode to obtain an optimal L1 paradigm canonical parameter gamma_AAnd an optimal Laplace regularization parameter γ_I；

4. The method according to claim 3, wherein the regularization parameter γ according to the optimal L1 paradigm_AThe optimal Laplace regularization parameter gamma_IThe obtaining of the sample set and the offset of the support vector by the kernel function and the constraint condition of the support vector machine comprises:

in that

Under the conditions of (1), solving

Obtaining the coefficient a of the discriminant model as a⁺-a^-＝[α₁,α₂,...,α_l+u]^TAnd offset b ═ beta⁺-β^-In which ξ_i≥0,

L-D-W is a laplacian matrix,

t > 0 is a preset parameter,and D_ii＝∑_jW_ij；

5. An apparatus for speech classification recognition, comprising:

the classifier module is used for inputting a voice data sample to be distinguished into a pre-established classifier model and obtaining a classification result of the voice data sample according to an output value of the classifier model;

the classifier model is created and obtained by a classifier creating module, and the classifier creating module is used for:

obtaining a classifier model according to the support vector sample set and the offset;

wherein the classifier creation module comprises: an obtaining unit, configured to determine, according to the support vector sample set and the offset, the classifier model:

6. The apparatus of claim 5, wherein the classifier creation module comprises:

7. The apparatus of claim 6, wherein the classifier creation module comprises:

8. The apparatus of claim 7, wherein the classifier creation module comprises:

an arithmetic unit for

Under the conditions of (1), solving

L-D-W is a laplacian matrix,

t > 0 is a predetermined parameter, and D_ii＝∑_jW_ij(ii) a According to the coefficient a of the discriminant model, obtaining a support vector sample set SVs ═ x in a training sample set_i|α_i≠0,i＝1,…,N}。