CN107507611B - Voice classification recognition method and device - Google Patents
- Publication number: CN107507611B (application number CN201710774048.1A)
- Authority: CN (China)
- Prior art keywords: sample set; support vector; obtaining; optimal; classifier
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/08—Speech classification or search
- G10L15/083—Recognition networks
- G10L2015/0631—Creating reference templates; Clustering
- G10L2015/085—Methods for reducing search complexity, pruning
Abstract
The invention discloses a speech classification recognition method: a speech data sample to be distinguished is input into a pre-created classifier model, and the classification result of the speech data sample is obtained from the output value of the classifier model. The classifier model obtains its support vector sample set from an L1-norm regularization parameter, a Laplacian regularization parameter and the constraint conditions of a support vector machine, so the resulting classifier model is strongly sparse and interpretable, filters noise well, and is robust to noise, yielding more accurate speech classification results. The invention also provides a corresponding speech classification recognition apparatus, which has the same beneficial effects.
Description
Technical Field
The invention relates to the field of artificial intelligence application, in particular to a method and a device for speech classification and recognition.
Background
With the development of artificial intelligence, computer technology has been widely applied in many fields. Speech recognition is one of the directions with real application value, but conventional speech and language processing technologies are quite complex and place a certain burden on computer operation.
At present, a relatively simple speech processing technique uses a semi-supervised algorithm to create a generative model and processes speech with it. However, the creation process of the generative models currently used for speech processing is complex, their noise-filtering capability is not very strong, and they lack robustness.
Disclosure of Invention
The invention aims to provide a speech classification recognition method that solves the problem of the weak noise-filtering capability of generative models for speech processing and improves the accuracy of speech classification recognition.
Another object of the present invention is to provide an apparatus for speech classification recognition.
In order to solve the above technical problem, the present invention provides a method for speech classification and recognition, comprising:
inputting a speech data sample to be distinguished into a pre-created classifier model, and obtaining the classification result of the speech data sample according to the output value of the classifier model; wherein the classifier model is created as follows:
inputting a training sample set of speech data into a Laplacian support vector machine; obtaining an optimal L1-norm regularization parameter γ_A, an optimal Laplacian regularization parameter γ_I and a kernel matrix according to the training sample set, a positive-definite parameter and a kernel function; obtaining a support vector sample set and an offset according to the optimal γ_A, the optimal γ_I, the kernel function and the constraint conditions of the support vector machine; and obtaining the classifier model according to the support vector sample set and the offset.
Wherein inputting the training sample set of speech data into the Laplacian support vector machine comprises:
inputting a training sample set {(x_i, y_i)}_{i=1}^{l+u} into the Laplacian support vector machine, where x_i ∈ R^D and y_i is the label of x_i: when y_i ∈ {−1, +1}, x_i is a labeled sample and l is the number of labeled training samples; when y_i = 0, x_i is an unlabeled sample and u is the number of unlabeled training samples; D is the dimension of the original space.
Obtaining the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I and the kernel matrix according to the training sample set, the positive-definite parameter and the kernel function comprises:
dividing the training sample set into several parts, and testing and training candidate values of γ_A and γ_I on the divided training sample set by cross-validation to obtain the optimal γ_A and the optimal γ_I;
mapping the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel matrix K, where K_ij = k(x_i, x_j).
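As a concrete sketch of this mapping step (the patent does not fix a particular kernel function; the Gaussian kernel and the NumPy helper below are assumptions for illustration), the kernel matrix K with K_ij = k(x_i, x_j) can be computed as:

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Gram matrix K with K[i, j] = k(x_i, x_j) for the Gaussian kernel
    k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

X = np.random.RandomState(0).randn(5, 3)
K = rbf_kernel_matrix(X)
```

K is symmetric with unit diagonal and positive semi-definite, which is what the downstream optimization over the reproducing kernel Hilbert space requires.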
Wherein obtaining the support vector sample set and the offset according to the optimal γ_A, the optimal γ_I, the kernel function and the constraint conditions of the support vector machine comprises:
solving the L1-norm regularized Laplacian support vector machine optimization problem under its constraint conditions (the slack variables satisfying ξ_i ≥ 0 and the split variables a⁺, a⁻, β⁺, β⁻ being non-negative) to obtain the discriminant-model coefficient a = a⁺ − a⁻ = [α_1, α_2, ..., α_{l+u}]^T and the offset b = β⁺ − β⁻, where δ is a constant coefficient, ξ_i is a slack variable, L = D − W is the graph Laplacian matrix built from the weight matrix W with preset parameter t > 0, and D is the diagonal degree matrix with D_ii = Σ_j W_ij;
obtaining, according to the coefficient a of the discriminant model, the support vector sample set SVs = {x_i | α_i ≠ 0, i = 1, ..., N} within the training sample set.
Wherein obtaining the classifier model according to the support vector sample set and the offset comprises:
determining the classifier model according to the support vector sample set and the offset as y = sgn( Σ_sv a_sv · k(x, x_sv) + b ), where x ∈ R^D is the speech data sample to be judged, x_sv is a support vector, a_sv is the model coefficient of that support vector, and the value of y is the discrimination result for the speech data sample x.
The invention also provides a device for speech classification recognition, which comprises:
the classifier module is used for inputting a speech data sample to be distinguished into a pre-created classifier model and obtaining the classification result of the speech data sample according to the output value of the classifier model; the classifier model is created by a classifier creation module, which is used for:
inputting a training sample set of speech data into a Laplacian support vector machine; obtaining an optimal L1-norm regularization parameter γ_A, an optimal Laplacian regularization parameter γ_I and a kernel matrix according to the training sample set, a positive-definite parameter and a kernel function; obtaining a support vector sample set and an offset according to the optimal γ_A, the optimal γ_I, the kernel function and the constraint conditions of the support vector machine; and obtaining the classifier model according to the support vector sample set and the offset.
Wherein the classifier creation module comprises:
an input unit, configured to input a training sample set {(x_i, y_i)}_{i=1}^{l+u} into the Laplacian support vector machine, where x_i ∈ R^D and y_i is the label of x_i: when y_i ∈ {−1, +1}, x_i is a labeled sample and l is the number of labeled training samples; when y_i = 0, x_i is an unlabeled sample and u is the number of unlabeled training samples; D is the dimension of the original space.
Wherein the classifier creation module comprises:
a parameter processing unit, configured to divide the training sample set into several parts, test and train candidate values of γ_A and γ_I on the divided training sample set by cross-validation to obtain the optimal L1-norm regularization parameter γ_A and the optimal Laplacian regularization parameter γ_I, and map the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel matrix K, where K_ij = k(x_i, x_j).
Wherein the classifier creation module comprises:
an arithmetic unit, configured to solve the L1-norm regularized Laplacian support vector machine optimization problem under its constraint conditions (ξ_i ≥ 0; a⁺, a⁻, β⁺, β⁻ non-negative) to obtain the discriminant-model coefficient a = a⁺ − a⁻ = [α_1, α_2, ..., α_{l+u}]^T and the offset b = β⁺ − β⁻, where δ is a constant coefficient, ξ_i is a slack variable, L = D − W is the graph Laplacian matrix built from the weight matrix W with preset parameter t > 0, and D is the diagonal degree matrix with D_ii = Σ_j W_ij; and to obtain, according to the coefficient a, the support vector sample set SVs = {x_i | α_i ≠ 0, i = 1, ..., N} within the training sample set.
Wherein the classifier creation module comprises:
an obtaining unit, configured to determine the classifier model according to the support vector sample set and the offset as y = sgn( Σ_sv a_sv · k(x, x_sv) + b ), where x ∈ R^D is the speech data sample to be judged, x_sv is a support vector, a_sv is the model coefficient of that support vector, and the value of y is the discrimination result for the speech data sample x.
The invention provides a speech classification recognition method in which a discrimination result is obtained by inputting the speech data sample to be distinguished into a pre-created classifier model. The classifier model obtains its support vector sample set through the L1-norm regularization parameter, the Laplacian regularization parameter and the constraint conditions of the support vector machine. The L1-norm and Laplacian regularization give the classifier model sparsity and interpretability: in practical applications the required model can be obtained from few sample points, which further enhances the classifier's ability to filter noise and thus yields good robustness. Compared with the prior art, the classifier model adopted by the invention has low complexity and stronger interpretability and sparsity; on the basis of an improved recognition rate, it is also more robust to noise, so the speech classification result is more accurate.
The invention also provides a device for speech classification and recognition, which has the beneficial effects.
Drawings
In order to more clearly illustrate the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a flow diagram of one embodiment of speech classification recognition provided by the present invention;
fig. 2 is a block diagram of a speech classification recognition apparatus according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of a specific embodiment of speech classification recognition, and the method may include:
step S101: a training sample set of speech data is input into a laplacian support vector machine.
Step S102: obtaining an optimal L1-norm regularization parameter γ_A, an optimal Laplacian regularization parameter γ_I and a kernel matrix according to the training sample set, the positive-definite parameter and the kernel function.
Step S103: obtaining a support vector sample set and an offset according to the optimal γ_A, the optimal γ_I, the kernel function and the constraint conditions of the support vector machine.
Step S104: and obtaining a classifier model according to the support vector sample set and the offset.
Step S105: and inputting the voice data sample to be distinguished into the classifier model, and obtaining a classification result of the voice data sample according to the output value of the classifier model.
It should be noted that steps S101 to S104 together constitute the classifier-model creation process. The classifier of the present invention can be created in advance according to steps S101 to S104, after which only the speech data sample to be distinguished needs to be input as in step S105 to obtain the classification result.
In addition, the generative model of the prior art must recover the specific content of the recognized speech, including speech content contaminated by noise, which results in poor robustness. The classifier model adopted in the present invention, by contrast, only decides whether the speech belongs to the desired class, without attending to its specific content.
As a simple example, when the classifier model was created, recordings of 150 speakers were chosen, each reading the alphabet twice, giving 52 samples per speaker with 617 dimensions each. The data set is partitioned for speech-content discrimination: the pronunciations of the letters a and b by the first 120 speakers are extracted, giving a 480 × 617 data set; the letter-a speech data form the positive class and the letter-b speech data the negative class. 80% is drawn as the training set and the rest serves as the test set. Within the training set, ten percent of the data keep their labels and the rest are treated as unlabeled. The classifier model is obtained by training on the training set, and the test set is then input to measure the accuracy of the classifier model on held-out samples.
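The data split described above can be sketched as follows; synthetic random features stand in for the actual recordings, and all variable names are illustrative:

```python
import numpy as np

rng = np.random.RandomState(0)
n, dim = 480, 617                      # 240 "a" + 240 "b" utterances, 617 features each
X = rng.randn(n, dim)                  # synthetic stand-in for the real recordings
y = np.concatenate([np.ones(240), -np.ones(240)])  # letter a -> +1, letter b -> -1

perm = rng.permutation(n)
n_train = int(0.8 * n)                 # 80% training set, 20% test set
train_idx, test_idx = perm[:n_train], perm[n_train:]

y_semi = np.zeros(n_train)             # training labels, initially all "unlabeled" (0)
n_labeled = int(0.1 * n_train)         # keep labels for ten percent of the training set
y_semi[:n_labeled] = y[train_idx[:n_labeled]]
```

With 480 samples this yields 384 training samples, of which 38 keep their labels, and 96 test samples.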
In practical application, the a and b speech samples are selected as training samples to obtain a classifier model that distinguishes speech a from speech b; when new speech is input, its specific content is of no concern, and only whether it is a or b is judged.
The classifier model adopted in this speech classification recognition greatly reduces technical difficulty and saves cost in scenarios whose requirements it meets. The simplest application is a user answering questions by voice: for questions with fixed answers, the collected speech data can be preprocessed and then judged with the model. For true-or-false questions, for example, "right" and "wrong" speech samples are used in advance to train a corresponding model; the collected speech data are preprocessed, the model produces a result, and the output is compared with the original answer.
In summary, the classifier model adopted in the invention uses L1-norm regularization, so the objective function used to create the model is sparse and interpretable; that is, the required classifier model can be obtained from few training sample points, which eliminates noisy data well and gives the classifier model good robustness.
Based on the above embodiments, another specific embodiment of the present invention may include:
the training sample set for inputting the voice data into the laplacian support vector machine specifically comprises:
inputting a training sample set into a Laplace support vector machine:wherein xi∈RD,yiIs xiA label of (a) indicates xiWhen y isiWhen the E { -1, +1}, is the number of labeled training samples, when yiWhen the content is equal to 0, the content,u is the number of unlabeled training samples and D is the dimension of the original space.
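A minimal illustration of this labeling convention, using hypothetical toy data (y_i ∈ {−1, +1} for labeled samples, y_i = 0 for unlabeled ones):

```python
import numpy as np

# Training set {(x_i, y_i)}: y_i in {-1, +1} marks a labeled sample,
# y_i = 0 marks an unlabeled sample; D is the original-space dimension.
X = np.array([[0.0, 1.0],
              [1.0, 0.5],
              [0.2, 0.8],
              [0.9, 0.1],
              [0.5, 0.5]])
y = np.array([+1, -1, 0, 0, 0])

l = int(np.sum(y != 0))   # number of labeled training samples
u = int(np.sum(y == 0))   # number of unlabeled training samples
D = X.shape[1]            # dimension of the original space
```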
Based on the above embodiments, another specific embodiment of the present invention may include:
Obtaining the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I and the kernel matrix according to the training sample set, the positive-definite parameter and the kernel function is specifically:
dividing the training sample set into several parts, and testing and training candidate values of γ_A and γ_I on the divided training sample set by cross-validation to obtain the optimal γ_A and the optimal γ_I; mapping the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel matrix K, where K_ij = k(x_i, x_j).
Specifically, the cross-validation is k-fold cross-validation, through which the L1-norm regularization parameter and the Laplacian regularization parameter are obtained.
For example, the training set is divided into five equal parts; one part is selected for testing and the remaining parts are used for training, yielding five accuracy values that are averaged. The resulting average accuracy is the accuracy corresponding to that pair of L1-norm and Laplacian regularization parameters, and finally the pair with the maximum average accuracy is selected as the final parameters.
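The five-fold averaging procedure can be sketched as below; `train_and_score` is a placeholder for training the Laplacian SVM with one candidate (γ_A, γ_I) pair and returning its held-out accuracy, stubbed out here with a constant:

```python
import numpy as np

def five_fold_accuracy(n_samples, train_and_score, k=5):
    """Average the k accuracies obtained by holding out each fold in turn."""
    folds = np.array_split(np.arange(n_samples), k)
    accs = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        accs.append(train_and_score(train_idx, test_idx))
    return float(np.mean(accs))

# Candidate (gamma_A, gamma_I) pairs; the grid values are illustrative only.
grid = [(ga, gi) for ga in (0.01, 0.1, 1.0) for gi in (0.01, 0.1, 1.0)]
scores = {pair: five_fold_accuracy(100, lambda tr, te: 0.5) for pair in grid}
best_pair = max(scores, key=scores.get)
```

The pair with the maximum average accuracy (`best_pair`) is kept as the final (γ_A, γ_I).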
Based on the above embodiments, another specific embodiment of the present invention may include:
Obtaining the support vector sample set and the offset according to the optimal γ_A, the optimal γ_I, the kernel function and the constraint conditions of the support vector machine is specifically:
solving the L1-norm regularized Laplacian support vector machine optimization problem under its constraint conditions (the slack variables satisfying ξ_i ≥ 0 and the split variables a⁺, a⁻, β⁺, β⁻ being non-negative) to obtain the discriminant-model coefficient a = a⁺ − a⁻ = [α_1, α_2, ..., α_{l+u}]^T and the offset b = β⁺ − β⁻, where δ is a constant coefficient, ξ_i is a slack variable, L = D − W is the graph Laplacian matrix built from the weight matrix W with preset parameter t > 0, and D is the diagonal degree matrix with D_ii = Σ_j W_ij;
obtaining, according to the coefficient a of the discriminant model, the support vector sample set SVs = {x_i | α_i ≠ 0, i = 1, ..., N} within the training sample set.
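Extracting the support vector sample set SVs = {x_i | α_i ≠ 0} from the fitted coefficients reduces to a mask over the training samples; the coefficient values and training matrix below are hypothetical:

```python
import numpy as np

# Hypothetical fitted coefficients a = [alpha_1, ..., alpha_N] and training samples
alpha = np.array([0.0, 0.73, 0.0, -0.41, 0.0, 0.18])
X_train = np.arange(12, dtype=float).reshape(6, 2)

sv_mask = np.abs(alpha) > 1e-8    # alpha_i != 0, up to numerical tolerance
support_vectors = X_train[sv_mask]
sv_coeffs = alpha[sv_mask]
```

The L1-norm regularization is what drives most α_i to exactly zero, so only a few rows survive the mask; this is the sparsity the patent relies on.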
Specifically, the constant coefficient δ is taken as a small positive number to ensure that a unique solution is obtained.
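A sketch of building the graph Laplacian L = D − W used in the optimization above, with D_ii = Σ_j W_ij; the Gaussian form of the weights W_ij = exp(−||x_i − x_j||² / t) is an assumption, since the patent only states that t > 0 is a preset parameter of W:

```python
import numpy as np

def graph_laplacian(X, t=1.0):
    """Graph Laplacian L = D - W over the training samples, with assumed
    Gaussian weights W_ij = exp(-||x_i - x_j||^2 / t) and D_ii = sum_j W_ij."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / t)
    D = np.diag(W.sum(axis=1))
    return D - W

Lap = graph_laplacian(np.random.RandomState(1).randn(6, 2))
```

By construction every row of L sums to zero and L is symmetric, which makes the Laplacian regularization term a valid smoothness penalty over the sample graph.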
Based on the above embodiments, another specific embodiment of the present invention may include:
Obtaining the classifier model according to the support vector sample set and the offset is specifically:
determining the classifier model according to the support vector sample set and the offset as y = sgn( Σ_sv a_sv · k(x, x_sv) + b ), where x ∈ R^D is the speech data sample to be judged, x_sv is a support vector, a_sv is the model coefficient of that support vector, and the value of y is the discrimination result for the speech data sample x.
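A minimal sketch of evaluating this decision function, again assuming a Gaussian kernel; the two support vectors and their coefficients are hypothetical:

```python
import numpy as np

def predict(x, support_vectors, sv_coeffs, b, sigma=1.0):
    """y = sgn( sum_sv a_sv * k(x, x_sv) + b ), with an assumed Gaussian kernel."""
    d2 = np.sum((support_vectors - x)**2, axis=1)
    k = np.exp(-d2 / (2.0 * sigma**2))
    return 1 if float(sv_coeffs @ k) + b >= 0.0 else -1

# Hypothetical two-support-vector model: one positive and one negative prototype
svs = np.array([[0.0, 0.0], [2.0, 2.0]])
coef = np.array([1.0, -1.0])
label = predict(np.array([0.1, 0.0]), svs, coef, b=0.0)  # sample near the positive prototype
```

Only the support vectors enter the sum, so prediction cost scales with the (small) number of nonzero α_i rather than with the full training set.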
The following describes a speech classification recognition apparatus according to an embodiment of the present invention, and the speech classification recognition apparatus described below and the speech classification recognition method described above may be referred to correspondingly.
Fig. 2 is a block diagram of a speech classification recognition apparatus according to an embodiment of the present invention, where the speech classification recognition apparatus according to fig. 2 may include:
the classifier module 100 is configured to input a speech data sample to be distinguished into a pre-created classifier model, and obtain a classification result of the speech data sample according to an output value of the classifier model; wherein the classifier model is created by the classifier creation module 200, and the classifier creation module 200 is configured to:
inputting a training sample set of speech data into a Laplacian support vector machine; obtaining an optimal L1-norm regularization parameter γ_A, an optimal Laplacian regularization parameter γ_I and a kernel matrix according to the training sample set, a positive-definite parameter and a kernel function; obtaining a support vector sample set and an offset according to the optimal γ_A, the optimal γ_I, the kernel function and the constraint conditions of the support vector machine; and obtaining the classifier model according to the support vector sample set and the offset.
Optionally, the classifier creating module 200 includes:
an input unit, configured to input a training sample set {(x_i, y_i)}_{i=1}^{l+u} into the Laplacian support vector machine, where x_i ∈ R^D and y_i is the label of x_i: when y_i ∈ {−1, +1}, x_i is a labeled sample and l is the number of labeled training samples; when y_i = 0, x_i is an unlabeled sample and u is the number of unlabeled training samples; D is the dimension of the original space.
Optionally, the classifier creating module 200 includes:
a parameter processing unit, configured to divide the training sample set into several parts, test and train candidate values of γ_A and γ_I on the divided training sample set by cross-validation to obtain the optimal L1-norm regularization parameter γ_A and the optimal Laplacian regularization parameter γ_I, and map the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel matrix K, where K_ij = k(x_i, x_j).
Optionally, the classifier creating module 200 includes:
an arithmetic unit, configured to solve the L1-norm regularized Laplacian support vector machine optimization problem under its constraint conditions (ξ_i ≥ 0; a⁺, a⁻, β⁺, β⁻ non-negative) to obtain the discriminant-model coefficient a = a⁺ − a⁻ = [α_1, α_2, ..., α_{l+u}]^T and the offset b = β⁺ − β⁻, where δ is a constant coefficient, ξ_i is a slack variable, L = D − W is the graph Laplacian matrix built from the weight matrix W with preset parameter t > 0, and D is the diagonal degree matrix with D_ii = Σ_j W_ij; and to obtain, according to the coefficient a, the support vector sample set SVs = {x_i | α_i ≠ 0, i = 1, ..., N} within the training sample set.
Optionally, the classifier creating module 200 includes:
an obtaining unit, configured to determine the classifier model according to the support vector sample set and the offset as y = sgn( Σ_sv a_sv · k(x, x_sv) + b ), where x ∈ R^D is the speech data sample to be judged, x_sv is a support vector, a_sv is the model coefficient of that support vector, and the value of y is the discrimination result for the speech data sample x.
The speech classification recognition apparatus of this embodiment implements the foregoing speech classification recognition method, so its specific implementation can be found in the method embodiments above: the classifier module 100 implements step S105, and the classifier creation module 200 implements steps S101, S102, S103 and S104. Their specific implementations may therefore refer to the descriptions of the corresponding embodiments and are not repeated here.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method and apparatus for speech classification recognition provided by the present invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
Claims (8)
1. A method of speech classification recognition, comprising:
inputting a voice data sample to be distinguished into a pre-established classifier model, and obtaining a classification result of the voice data sample according to an output value of the classifier model;
wherein the classifier model creating process is as follows:
inputting a training sample set of voice data into a Laplace support vector machine;
obtaining an optimal L1-norm regularization parameter γ_A, an optimal Laplacian regularization parameter γ_I and a kernel matrix according to the training sample set, a positive-definite parameter and a kernel function;
obtaining a support vector sample set and an offset according to the optimal γ_A, the optimal γ_I, the kernel function and the constraint conditions of the support vector machine;
obtaining the classifier model according to the support vector sample set and the offset;
wherein the obtaining a classifier model according to the support vector sample set and the offset comprises:
determining the classifier model according to the support vector sample set and the offset as y = sgn( Σ_sv a_sv · k(x, x_sv) + b ), where x ∈ R^D is the speech data sample to be judged, x_sv is a support vector, a_sv is the model coefficient of that support vector, and the value of y is the discrimination result for the speech data sample x.
2. The method of claim 1, wherein the inputting the training sample set of speech data into the laplacian support vector machine comprises:
inputting a training sample set {(x_i, y_i)}_{i=1}^{l+u} into the Laplacian support vector machine, where x_i ∈ R^D and y_i is the label of x_i: when y_i ∈ {−1, +1}, x_i is a labeled sample and l is the number of labeled training samples; when y_i = 0, x_i is an unlabeled sample and u is the number of unlabeled training samples; D is the dimension of the original space.
3. The method according to claim 2, wherein the optimal L1 normal form canonical parameter γ is obtained according to the training sample set and the positive definite parameter and the kernel functionAAnd laplacian regularization parameter gammaIAnd the kernel function matrix comprises:
dividing the training sample set into several parts byDivided training sample set pair L1 normal form regular parameter gammaAAnd laplacian regularization parameter gammaITesting and training in a cross validation mode to obtain an optimal L1 paradigm canonical parameter gammaAAnd an optimal Laplace regularization parameter γI;
mapping the training sample set to a reproducing kernel Hilbert space through the kernel function to obtain a kernel function matrix K, wherein K_ij = k(x_i, x_j).
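Computing K_ij = k(x_i, x_j) over all sample pairs can be sketched as below; the Gaussian kernel is an illustrative assumption (the claim does not fix a particular kernel).

```python
import numpy as np

def kernel_matrix(X, t=1.0):
    # K[i, j] = k(x_i, x_j); a Gaussian kernel exp(-||x_i - x_j||^2 / t) is assumed
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / t)

K = kernel_matrix(np.array([[0.0], [1.0]]))
```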
4. The method according to claim 3, wherein obtaining the support vector sample set and the offset according to the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I, the kernel function, and the constraint conditions of the support vector machine comprises:
solving, under the given constraint conditions, the optimization problem of the Laplacian support vector machine to obtain the coefficient of the discriminant model a = a⁺ − a⁻ = [α_1, α_2, …, α_{l+u}]^T and the offset b = β⁺ − β⁻, wherein ξ_i ≥ 0 is a relaxation variable, Δ is a constant coefficient, L = D − W is the Laplacian matrix, W_ij = exp(−‖x_i − x_j‖²/t) with t > 0 a preset parameter, and D_ii = Σ_j W_ij;
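The graph Laplacian L = D − W used in this step can be sketched directly from the definitions W_ij = exp(−‖x_i − x_j‖²/t) and D_ii = Σ_j W_ij; the fully connected weight graph and the function name `graph_laplacian` are assumptions of this sketch.

```python
import numpy as np

def graph_laplacian(X, t=1.0):
    # W_ij = exp(-||x_i - x_j||^2 / t), D_ii = sum_j W_ij, L = D - W
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / t)
    D = np.diag(W.sum(axis=1))
    return D - W

L = graph_laplacian(np.array([[0.0], [1.0], [2.0]]))
```

By construction every row of L sums to zero and L is symmetric, which is what makes it usable as the manifold regularizer in the LapSVM objective.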
obtaining, according to the coefficient a of the discriminant model, the support vector sample set SVs = {x_i | α_i ≠ 0, i = 1, …, N} from the training sample set.
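Extracting SVs = {x_i | α_i ≠ 0} can be sketched as below; the numerical tolerance (instead of an exact zero test) and the name `support_vector_set` are practical assumptions of this sketch.

```python
import numpy as np

def support_vector_set(X, alpha, tol=1e-8):
    # keep the training samples whose coefficient alpha_i is (numerically) nonzero
    idx = np.flatnonzero(np.abs(alpha) > tol)
    return X[idx], idx

svs, idx = support_vector_set(np.array([[0.0], [1.0], [2.0]]),
                              np.array([0.5, 0.0, -0.3]))
```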
5. An apparatus for speech classification recognition, comprising:
the classifier module is used for inputting a voice data sample to be distinguished into a pre-established classifier model and obtaining a classification result of the voice data sample according to an output value of the classifier model;
the classifier model is created by a classifier creation module, and the classifier creation module is configured to:
input a training sample set of voice data into a Laplacian support vector machine;
obtain an optimal L1-norm regularization parameter γ_A, an optimal Laplacian regularization parameter γ_I, and a kernel function matrix according to the training sample set, the positive definite parameter, and the kernel function;
obtain a support vector sample set and an offset according to the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I, the kernel function, and the constraint conditions of the support vector machine;
obtaining a classifier model according to the support vector sample set and the offset;
wherein the classifier creation module comprises: an obtaining unit, configured to determine, according to the support vector sample set and the offset, the classifier model y = sgn(Σ_sv a_sv·k(x_sv, x) + b), wherein x is the voice data sample to be judged, x_sv is a support vector, a_sv is the model coefficient of the support vector, and b is the offset.
6. The apparatus of claim 5, wherein the classifier creation module comprises:
an input unit, configured to input the training sample set {(x_i, y_i)}, i = 1, …, l+u, into the Laplacian support vector machine, wherein x_i ∈ R^D and y_i is the label of x_i; when y_i ∈ {−1, +1}, x_i is a labeled sample and l is the number of labeled training samples; when y_i = 0, x_i is an unlabeled sample and u is the number of unlabeled training samples; and D is the dimension of the original space.
7. The apparatus of claim 6, wherein the classifier creation module comprises:
a parameter processing unit, configured to divide the training sample set into several parts, and to test and train the L1-norm regularization parameter γ_A and the Laplacian regularization parameter γ_I on the divided training sample set by cross validation to obtain the optimal L1-norm regularization parameter γ_A and the optimal Laplacian regularization parameter γ_I; and to map the training sample set to a reproducing kernel Hilbert space through the kernel function to obtain the kernel function matrix K, wherein K_ij = k(x_i, x_j).
8. The apparatus of claim 7, wherein the classifier creation module comprises:
an arithmetic unit, configured to solve, under the given constraint conditions, the optimization problem of the Laplacian support vector machine to obtain the coefficient of the discriminant model a = a⁺ − a⁻ = [α_1, α_2, …, α_{l+u}]^T and the offset b = β⁺ − β⁻, wherein ξ_i ≥ 0 is a relaxation variable, Δ is a constant coefficient, L = D − W is the Laplacian matrix, W_ij = exp(−‖x_i − x_j‖²/t) with t > 0 a preset parameter, and D_ii = Σ_j W_ij; and to obtain, according to the coefficient a of the discriminant model, the support vector sample set SVs = {x_i | α_i ≠ 0, i = 1, …, N} from the training sample set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710774048.1A CN107507611B (en) | 2017-08-31 | 2017-08-31 | Voice classification recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107507611A CN107507611A (en) | 2017-12-22 |
CN107507611B true CN107507611B (en) | 2021-08-24 |
Family
ID=60693417
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710774048.1A Active CN107507611B (en) | 2017-08-31 | 2017-08-31 | Voice classification recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107507611B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065027B (en) * | 2018-06-04 | 2023-05-02 | Ping An Technology (Shenzhen) Co., Ltd. | Voice discrimination model training method and device, computer equipment and storage medium |
CN114582366A (en) * | 2022-03-02 | 2022-06-03 | Inspur Cloud Information Technology Co., Ltd. | Method for realizing audio segmentation labeling based on LapSVM |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1787075A (en) * | 2005-12-13 | 2006-06-14 | Zhejiang University | Method for distinguishing speakers by support vector machine model based on embedded GMM kernel |
CN1975856A (en) * | 2006-10-30 | 2007-06-06 | Zou Cairong | Speech emotion identification method based on support vector machine |
CN101640043A (en) * | 2009-09-01 | 2010-02-03 | Tsinghua University | Speaker recognition method based on multi-coordinate sequence kernel and system thereof |
CN103258532A (en) * | 2012-11-28 | 2013-08-21 | Hohai University, Changzhou Campus | Method for recognizing Chinese speech emotions based on fuzzy support vector machine |
CN103605711A (en) * | 2013-11-12 | 2014-02-26 | China University of Petroleum (Beijing) | Construction method and device, classification method and device of support vector machine |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105224984B (en) * | 2014-05-31 | 2018-03-13 | Huawei Technologies Co., Ltd. | Data category recognition method and device based on deep neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR20180125905A (en) | Method and apparatus for classifying a class to which a sentence belongs by using deep neural network | |
Kamaruddin et al. | Cultural dependency analysis for understanding speech emotion | |
CN110310647B (en) | Voice identity feature extractor, classifier training method and related equipment | |
EP3588381A1 (en) | Method and apparatus for training classification model, method and apparatus for classifying | |
CN111081279A (en) | Voice emotion fluctuation analysis method and device | |
Provost | Identifying salient sub-utterance emotion dynamics using flexible units and estimates of affective flow | |
CN114627102B (en) | Image anomaly detection method, device and system and readable storage medium | |
CN109086794B (en) | Driving behavior pattern recognition method based on T-LDA topic model | |
CN111292851A (en) | Data classification method and device, computer equipment and storage medium | |
CN107507611B (en) | Voice classification recognition method and device | |
CN105609116A (en) | Speech emotional dimensions region automatic recognition method | |
EP2115737B1 (en) | Method and system to improve automated emotional recognition | |
CN111653274A (en) | Method, device and storage medium for awakening word recognition | |
Shah et al. | Speech emotion recognition based on SVM using MATLAB | |
CN113453065A (en) | Video segmentation method, system, terminal and medium based on deep learning | |
Dubey et al. | Robust speaker clustering using mixtures of von mises-fisher distributions for naturalistic audio streams | |
CN113610080B (en) | Cross-modal perception-based sensitive image identification method, device, equipment and medium | |
CN114818900A (en) | Semi-supervised feature extraction method and user credit risk assessment method | |
Gosztolya et al. | A feature selection-based speaker clustering method for paralinguistic tasks | |
CN112632229A (en) | Text clustering method and device | |
Grigore et al. | Self-organizing maps for identifying impaired speech | |
CN115083437B (en) | Method and device for determining uncertainty of learner pronunciation | |
CN117932073B (en) | Weak supervision text classification method and system based on prompt engineering | |
CN114912502B (en) | Double-mode deep semi-supervised emotion classification method based on expressions and voices | |
CN118135642A (en) | Facial expression analysis method and device, electronic equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||