CN107507611B - Voice classification recognition method and device - Google Patents

Voice classification recognition method and device

Info

Publication number
CN107507611B
Authority
CN
China
Prior art keywords
sample set
support vector
obtaining
optimal
classifier
Prior art date
Legal status
Active
Application number
CN201710774048.1A
Other languages
Chinese (zh)
Other versions
CN107507611A (en)
Inventor
张莉
徐志强
王邦军
张召
李凡长
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201710774048.1A
Publication of CN107507611A
Application granted
Publication of CN107507611B
Status: Active


Classifications

    • G: Physics
    • G10: Musical instruments; acoustics
    • G10L: Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
    • G10L 15/00: Speech recognition
    • G10L 15/06: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063: Training
    • G10L 15/08: Speech classification or search
    • G10L 15/083: Recognition networks
    • G10L 2015/0631: Creating reference templates; clustering
    • G10L 2015/085: Methods for reducing search complexity, pruning

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a speech classification recognition method: a speech data sample to be classified is input into a pre-created classifier model, and the classification result of the sample is obtained from the output value of the model. The classifier model obtains its support vector sample set from an optimal L1-norm regularization parameter, an optimal Laplacian regularization parameter, and the constraint conditions of a support vector machine, so the resulting model has strong sparsity and interpretability, filters noise well, and is robust to it, yielding more accurate speech classification results. The invention also provides a speech classification recognition apparatus with the same beneficial effects.

Description

Voice classification recognition method and device
Technical Field
The invention relates to the field of artificial intelligence application, in particular to a method and a device for speech classification and recognition.
Background
With the development of artificial intelligence, computer technology is widely applied in many fields, and speech recognition is one of the directions with application value. Conventional speech and language processing techniques are quite complex and therefore place a certain burden on the operation of a computer.
At present, a comparatively simple speech processing technique uses a semi-supervised algorithm to create a generative model and processes speech with it, but the creation process of current generative models for speech processing is complex, their noise-filtering capability is weak, and they lack robustness.
Disclosure of Invention
The object of the invention is to provide a speech classification recognition method that overcomes the weak noise-filtering capability of generative speech-processing models and improves the accuracy of speech classification recognition.
Another object of the present invention is to provide an apparatus for speech classification recognition.
In order to solve the above technical problem, the present invention provides a speech classification recognition method, comprising:
inputting a speech data sample to be classified into a pre-created classifier model, and obtaining the classification result of the sample from the output value of the model; wherein the classifier model is created as follows:
inputting a training sample set of speech data into a Laplacian support vector machine; obtaining the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I, and the kernel matrix according to the training sample set, the positive definite parameter, and the kernel function; obtaining the support vector sample set and the offset according to the optimal γ_A, the optimal γ_I, the kernel function, and the constraint conditions of the support vector machine; and obtaining the classifier model according to the support vector sample set and the offset.
Wherein inputting the training sample set of speech data into the Laplacian support vector machine comprises:
inputting the training sample set into the Laplacian support vector machine:

T = {(x_i, y_i)}_{i=1}^{l+u}

where x_i ∈ R^D and y_i is the label of x_i; for i = 1, …, l, y_i ∈ {−1, +1} and l is the number of labeled training samples; for i = l+1, …, l+u, y_i = 0 and u is the number of unlabeled training samples; D is the dimension of the original space.
Obtaining the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I, and the kernel matrix according to the training sample set, the positive definite parameter, and the kernel function comprises:
dividing the training sample set into several parts, and testing and training candidate values of the L1-norm regularization parameter γ_A and the Laplacian regularization parameter γ_I on the divided training sample set by cross-validation to obtain the optimal γ_A and the optimal γ_I;
mapping the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel matrix K, where K_ij = k(x_i, x_j).
Wherein obtaining the support vector sample set and the offset according to the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I, the kernel function, and the constraint conditions of the support vector machine comprises:
subject to the conditions

y_i ( Σ_{j=1}^{l+u} K_ij (α_j^+ − α_j^−) + β^+ − β^− ) ≥ 1 − ξ_i,  ξ_i ≥ 0,  α_j^+, α_j^− ≥ 0,  β^+, β^− ≥ 0,  i = 1, …, l,

solving

min  γ_A Σ_{j=1}^{l+u} (α_j^+ + α_j^−) + (1/l) Σ_{i=1}^{l} ξ_i + γ_I (a^+ − a^−)^T K L K (a^+ − a^−) + δ (a^+ − a^−)^T (a^+ − a^−)

to obtain the coefficient a = a^+ − a^− = [α_1, α_2, …, α_{l+u}]^T of the discriminant model and the offset b = β^+ − β^−, where δ is a constant coefficient, the ξ_i are slack variables, W is a preset parameter matrix, L = D − W is the Laplacian matrix, and D_ii = Σ_j W_ij;
obtaining, according to the coefficient a of the discriminant model, the support vector sample set SVs = {x_i : α_i ≠ 0, i = 1, …, N} from the training sample set.
Wherein obtaining the classifier model according to the support vector sample set and the offset comprises:
determining the classifier model according to the support vector sample set and the offset:

y = sgn( Σ_{x_sv ∈ SVs} a_sv k(x, x_sv) + b )

where x ∈ R^D is the speech data sample to be classified, x_sv is a support vector, a_sv is the model coefficient of the support vector, and the value of y is the classification result for the speech data sample x.
The invention also provides a speech classification recognition apparatus, comprising:
a classifier module, configured to input a speech data sample to be classified into a pre-created classifier model and obtain the classification result of the sample from the output value of the model; wherein the classifier model is created by a classifier creation module, the classifier creation module being configured to:
input a training sample set of speech data into a Laplacian support vector machine; obtain the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I, and the kernel matrix according to the training sample set, the positive definite parameter, and the kernel function; obtain the support vector sample set and the offset according to the optimal γ_A, the optimal γ_I, the kernel function, and the constraint conditions of the support vector machine; and obtain the classifier model according to the support vector sample set and the offset.
Wherein the classifier creation module comprises:
an input unit, configured to input the training sample set into the Laplacian support vector machine:

T = {(x_i, y_i)}_{i=1}^{l+u}

where x_i ∈ R^D and y_i is the label of x_i; for i = 1, …, l, y_i ∈ {−1, +1} and l is the number of labeled training samples; for i = l+1, …, l+u, y_i = 0 and u is the number of unlabeled training samples; D is the dimension of the original space.
Wherein the classifier creation module comprises:
a parameter processing unit, configured to divide the training sample set into several parts, test and train candidate values of the L1-norm regularization parameter γ_A and the Laplacian regularization parameter γ_I on the divided training sample set by cross-validation to obtain the optimal γ_A and the optimal γ_I, and map the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel matrix K, where K_ij = k(x_i, x_j).
Wherein the classifier creation module comprises:
an arithmetic unit, configured to, subject to the conditions

y_i ( Σ_{j=1}^{l+u} K_ij (α_j^+ − α_j^−) + β^+ − β^− ) ≥ 1 − ξ_i,  ξ_i ≥ 0,  α_j^+, α_j^− ≥ 0,  β^+, β^− ≥ 0,  i = 1, …, l,

solve

min  γ_A Σ_{j=1}^{l+u} (α_j^+ + α_j^−) + (1/l) Σ_{i=1}^{l} ξ_i + γ_I (a^+ − a^−)^T K L K (a^+ − a^−) + δ (a^+ − a^−)^T (a^+ − a^−)

to obtain the coefficient a = a^+ − a^− = [α_1, α_2, …, α_{l+u}]^T of the discriminant model and the offset b = β^+ − β^−, where δ is a constant coefficient, the ξ_i are slack variables, W is a preset parameter matrix, L = D − W is the Laplacian matrix, and D_ii = Σ_j W_ij; and to obtain, according to the coefficient a of the discriminant model, the support vector sample set SVs = {x_i : α_i ≠ 0, i = 1, …, N} from the training sample set.
Wherein the classifier creation module comprises:
an obtaining unit, configured to determine the classifier model according to the support vector sample set and the offset:

y = sgn( Σ_{x_sv ∈ SVs} a_sv k(x, x_sv) + b )

where x ∈ R^D is the speech data sample to be classified, x_sv is a support vector, a_sv is the model coefficient of the support vector, and the value of y is the classification result for the speech data sample x.
The invention provides a speech classification recognition method in which a speech data sample to be classified is input into a pre-created classifier model to obtain the classification result. The classifier model obtains its support vector sample set from the L1-norm regularization parameter, the Laplacian regularization parameter, and the constraint conditions of the support vector machine; the L1-norm and Laplacian regularization give the model sparsity and interpretability, so that in practical applications the required model can be obtained from few sample points. This further strengthens the classifier's noise-filtering capability and yields good robustness. Compared with the prior art, the classifier model adopted by the invention has low complexity and stronger interpretability and sparsity; on the basis of improving the recognition rate, it is also more robust to noise, so the speech classification result is more accurate.
The invention also provides a device for speech classification and recognition, which has the beneficial effects.
Drawings
In order to more clearly illustrate the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a flow diagram of one embodiment of speech classification recognition provided by the present invention;
fig. 2 is a block diagram of a speech classification recognition apparatus according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of a specific embodiment of speech classification recognition, and the method may include:
step S101: a training sample set of speech data is input into a laplacian support vector machine.
Step S102: obtaining the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I, and the kernel matrix according to the training sample set, the positive definite parameter, and the kernel function.
Step S103: obtaining the support vector sample set and the offset according to the optimal γ_A, the optimal γ_I, the kernel function, and the constraint conditions of the support vector machine.
Step S104: and obtaining a classifier model according to the support vector sample set and the offset.
Step S105: and inputting the voice data sample to be distinguished into the classifier model, and obtaining a classification result of the voice data sample according to the output value of the classifier model.
It should be noted that steps S101 to S104 constitute the classifier-model creation process; the classifier of the present invention may be created in advance according to steps S101 to S104, after which the classification result is obtained simply by inputting the speech data sample to be classified according to step S105.
In addition, the prior-art generative model must recover the specific content of the recognized speech, including speech containing noise, which results in poor robustness, whereas the classifier model adopted in the present invention only decides whether the speech is the content of interest, without attending to its specific content.
As a simple example, in creating the classifier model, recordings of 150 speakers were selected, each reading the alphabet twice, giving 52 samples per speaker with a dimension of 617 per sample. The data set was divided for speech-content discrimination: the pronunciations of the letters a and b by the first 120 speakers were extracted to obtain a 480 × 617 data set, the letter-a speech data forming the positive class and the letter-b speech data the negative class; 80% was drawn as the training set and the rest used as the test set. Within the training set, ten percent of the data was labeled and the rest left unlabeled. A classifier model was trained on the training set, and the test set was input to obtain the accuracy of the model based on the training samples.
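The split described above can be sketched as follows; the feature matrix here is a synthetic stand-in (random numbers, not the actual recordings), and the variable names are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 480 x 617 letter-a / letter-b feature matrix:
# 240 "a" samples (positive class, +1) and 240 "b" samples (negative class, -1).
X = rng.standard_normal((480, 617))
y = np.concatenate([np.ones(240), -np.ones(240)])

# Draw 80% as the training set and keep the rest as the test set.
idx = rng.permutation(480)
train_idx, test_idx = idx[:384], idx[384:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Within the training set, keep labels for only 10% of the samples;
# mark the rest as unlabeled with y = 0, following the patent's convention.
y_semi = y_train.copy()
n_labeled = int(0.1 * len(y_semi))
y_semi[n_labeled:] = 0
```

Any semi-supervised trainer built on this convention then sees 38 labeled and 346 unlabeled training samples.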
In practical application, the speech of a and b is selected as training samples to obtain a classifier model that can distinguish speech a from speech b; when new speech is input, only whether it is a or b is judged, without concern for its specific content.
The classifier model adopted in this speech classification recognition greatly reduces technical difficulty and saves cost while meeting the requirements of certain scenarios. The simplest application is answering questions by voice: for questions with fixed answers, the collected speech data can be preprocessed and then judged with the model to obtain the result. For example, for true-or-false questions, "right" and "wrong" utterances are used in advance to train the corresponding model; the collected speech data is preprocessed, the model produces a result, and the output is compared with the answer to the original question.
In summary, the classifier model adopted in the invention uses L1-norm regularization, so the objective function for creating the model is sparse and interpretable: the required classifier model can be obtained from few training sample points, which eliminates noisy data well and gives the classifier model good robustness.
Based on the above embodiments, another specific embodiment of the present invention may include:
the training sample set for inputting the voice data into the laplacian support vector machine specifically comprises:
inputting a training sample set into a Laplace support vector machine:
Figure BDA0001395578500000071
wherein xi∈RD,yiIs xiA label of (a) indicates xiWhen y isiWhen the E { -1, +1},
Figure BDA0001395578500000072
Figure BDA0001395578500000073
is the number of labeled training samples, when yiWhen the content is equal to 0, the content,
Figure BDA0001395578500000074
u is the number of unlabeled training samples and D is the dimension of the original space.
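The labeling convention above (y_i ∈ {−1, +1} for the l labeled samples, y_i = 0 for the u unlabeled ones) can be sketched as follows; the function name and the toy data are illustrative assumptions, not from the patent:

```python
import numpy as np

def split_counts(y):
    """Given labels in {-1, 0, +1}, return (l, u): the number of
    labeled samples (y != 0) and unlabeled samples (y == 0)."""
    y = np.asarray(y)
    l = int(np.count_nonzero(y))
    u = int(y.size - l)
    return l, u

# Toy training set: 4 labeled samples followed by 3 unlabeled ones.
y = np.array([+1, -1, +1, -1, 0, 0, 0])
l, u = split_counts(y)  # l = 4, u = 3
```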
Based on the above embodiments, another specific embodiment of the present invention may include:
obtaining an optimal L1 normal form regular parameter gamma according to the training sample set, the positive definite parameter and the kernel functionAAnd laplacian regularization parameter gammaIAnd the kernel function matrix is specifically:
dividing the training sample set into a plurality of parts, and comparing the divided training sample set with an L1 paradigm canonical parameter gammaAAnd laplacian regularization parameter gammaITesting and training in a cross validation mode to obtain an optimal L1 paradigm canonical parameter gammaAAnd an optimal Laplace regularization parameter γI(ii) a Mapping the training sample set to a kernel Hilbert space through a kernel function to obtain a kernel function matrix K, wherein K isij=k(xi,xj)。
Specifically, the cross-validation method is k-fold cross-validation, through which the L1-norm regularization parameter and the Laplacian regularization parameter are obtained.
For example, the training set is divided into five equal parts; one part is selected for testing and the others for training, yielding five accuracies that are averaged. The average accuracy is the accuracy corresponding to the candidate L1-norm and Laplacian regularization parameters, and finally the parameter pair corresponding to the highest accuracy is selected as the final parameters.
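The five-fold selection described above can be sketched generically as follows. The candidate grids and the `train_and_score` callable (and its signature) are assumptions for illustration; the patent does not specify how the per-fold model is trained and scored:

```python
import numpy as np

def five_fold_select(X, y, gammaA_grid, gammaI_grid, train_and_score,
                     n_folds=5, seed=0):
    """Pick the (gamma_A, gamma_I) pair with the highest mean accuracy
    over n_folds splits. train_and_score(X_tr, y_tr, X_te, y_te, gA, gI)
    must return an accuracy in [0, 1]."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    best = (-1.0, None, None)  # (mean accuracy, gamma_A, gamma_I)
    for gA in gammaA_grid:
        for gI in gammaI_grid:
            accs = []
            for k in range(n_folds):
                te = folds[k]
                tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
                accs.append(train_and_score(X[tr], y[tr], X[te], y[te], gA, gI))
            mean_acc = float(np.mean(accs))
            if mean_acc > best[0]:
                best = (mean_acc, gA, gI)
    return best

# Demo with a scoring stub that simply favors larger parameters (illustrative only).
best = five_fold_select(np.zeros((10, 2)), np.zeros(10), [0.1, 0.5], [0.1, 0.5],
                        lambda Xtr, ytr, Xte, yte, gA, gI: gA * gI)
```

With a real trainer plugged in, `best[1]` and `best[2]` are the optimal γ_A and γ_I described in the text.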
Based on the above embodiments, another specific embodiment of the present invention may include:
the regularization parameter gamma according to the optimal L1 paradigmAThe optimal Laplace regularization parameter gammaIThe obtaining of the support vector sample set and the offset by the kernel function and the constraint condition of the support vector machine is specifically as follows:
in that
Figure BDA0001395578500000081
Under the conditions of (1), solving
Figure BDA0001395578500000082
Obtaining the coefficient a of the discriminant model as a+-a-=[α12,...,αl+u]TAnd offset b ═ beta+-Wherein, in the step (A),
Figure BDA0001395578500000083
delta is a constant coefficient, xiiIn order to be a function of the relaxation variable,
Figure BDA0001395578500000084
L-D-W is a laplacian matrix,
Figure BDA0001395578500000085
is a preset parameter, and Dii=∑jWij
According to the coefficient a of the discriminant model, obtaining a support vector sample set SVs ═ x in a training sample setii≠0,i=1,…,N}。
Specifically, δ is taken as a small positive constant to ensure that a unique solution is obtained.
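The graph Laplacian L = D − W used in the optimization above can be sketched as follows. A Gaussian-weighted adjacency W is assumed here for illustration; the patent only states that W is a preset parameter matrix with D_ii = Σ_j W_ij:

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Build W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with zero diagonal,
    D_ii = sum_j W_ij, and return L = D - W."""
    sq = np.sum(X**2, axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b.
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)          # no self-loops
    D = np.diag(W.sum(axis=1))
    return D - W

L = graph_laplacian(np.array([[0.0], [1.0], [2.0]]))
# Each row of a graph Laplacian sums to zero, and L is symmetric.
```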
Based on the above embodiments, another specific embodiment of the present invention may include:
the obtaining a classifier model according to the support vector sample set and the offset specifically includes:
determining the classifier model according to the support vector sample set and the offset:
Figure BDA0001395578500000086
wherein x is a voice data sample to be judged, wherein x belongs to RD,xsvIs a support vector, asvIs the model coefficient of the support vector, and the value of y is the discrimination result of the voice data sample x.
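The decision rule above can be sketched directly. A Gaussian kernel is assumed here for concreteness, and the toy support vectors and coefficients are illustrative; the patent leaves the kernel k and the fitted values unspecified:

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    """Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma**2))

def classify(x, support_vectors, coeffs, b, kernel=rbf):
    """y = sgn( sum_sv a_sv * k(x, x_sv) + b ), the classifier-model form."""
    s = sum(a * kernel(x, sv) for a, sv in zip(coeffs, support_vectors))
    return 1 if s + b >= 0 else -1

# Toy model: one positive and one negative support vector.
svs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
coeffs = [1.0, -1.0]
print(classify(np.array([0.9, 0.0]), svs, coeffs, b=0.0))  # prints 1
```

Because the sum runs only over the support vector sample set SVs (the samples with α_i ≠ 0), prediction cost grows with the number of support vectors, which is why the sparsity induced by the L1 norm matters in practice.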
The following describes a speech classification recognition apparatus according to an embodiment of the present invention, and the speech classification recognition apparatus described below and the speech classification recognition method described above may be referred to correspondingly.
Fig. 2 is a block diagram of a speech classification recognition apparatus according to an embodiment of the present invention, where the speech classification recognition apparatus according to fig. 2 may include:
the classifier module 100 is configured to input a speech data sample to be distinguished into a pre-created classifier model, and obtain a classification result of the speech data sample according to an output value of the classifier model; wherein the classifier model is created by the classifier creation module 200, and the classifier creation module 200 is configured to:
inputting a training sample set of voice data into a Laplace support vector machine; obtaining an optimal L1 normal form regular parameter gamma according to the training sample set, the positive definite parameter and the kernel functionAAnd an optimal Laplace regularization parameter γIAnd a kernel function matrix; according to the optimal L1 paradigm regularization parameter gammaAThe optimal Laplace regularization parameter gammaIObtaining a support vector sample set and an offset by the kernel function and the constraint condition of the support vector machine; and obtaining a classifier model according to the support vector sample set and the offset.
Optionally, the classifier creation module 200 comprises:
an input unit, configured to input the training sample set into the Laplacian support vector machine:

T = {(x_i, y_i)}_{i=1}^{l+u}

where x_i ∈ R^D and y_i is the label of x_i; for i = 1, …, l, y_i ∈ {−1, +1} and l is the number of labeled training samples; for i = l+1, …, l+u, y_i = 0 and u is the number of unlabeled training samples; D is the dimension of the original space.
Optionally, the classifier creation module 200 comprises:
a parameter processing unit, configured to divide the training sample set into several parts, test and train candidate values of the L1-norm regularization parameter γ_A and the Laplacian regularization parameter γ_I on the divided training sample set by cross-validation to obtain the optimal γ_A and the optimal γ_I, and map the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel matrix K, where K_ij = k(x_i, x_j).
Optionally, the classifier creation module 200 comprises:
an arithmetic unit, configured to, subject to the conditions

y_i ( Σ_{j=1}^{l+u} K_ij (α_j^+ − α_j^−) + β^+ − β^− ) ≥ 1 − ξ_i,  ξ_i ≥ 0,  α_j^+, α_j^− ≥ 0,  β^+, β^− ≥ 0,  i = 1, …, l,

solve

min  γ_A Σ_{j=1}^{l+u} (α_j^+ + α_j^−) + (1/l) Σ_{i=1}^{l} ξ_i + γ_I (a^+ − a^−)^T K L K (a^+ − a^−) + δ (a^+ − a^−)^T (a^+ − a^−)

to obtain the coefficient a = a^+ − a^− = [α_1, α_2, …, α_{l+u}]^T of the discriminant model and the offset b = β^+ − β^−, where δ is a constant coefficient, the ξ_i are slack variables, W is a preset parameter matrix, L = D − W is the Laplacian matrix, and D_ii = Σ_j W_ij; and to obtain, according to the coefficient a of the discriminant model, the support vector sample set SVs = {x_i : α_i ≠ 0, i = 1, …, N} from the training sample set.
Optionally, the classifier creation module 200 comprises:
an obtaining unit, configured to determine the classifier model according to the support vector sample set and the offset:

y = sgn( Σ_{x_sv ∈ SVs} a_sv k(x, x_sv) + b )

where x ∈ R^D is the speech data sample to be classified, x_sv is a support vector, a_sv is the model coefficient of the support vector, and the value of y is the classification result for the speech data sample x.
The speech classification recognition apparatus of this embodiment is used to implement the foregoing speech classification recognition method; its specific implementation can therefore be found in the method embodiments above. For example, the classifier module 100 implements step S105 of the method, and the classifier creation module 200 implements steps S101, S102, S103, and S104; reference may be made to the descriptions of the corresponding embodiments of each part, which are not repeated here.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method and apparatus for speech classification recognition provided by the present invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (8)

1. A method of speech classification recognition, comprising:
inputting a speech data sample to be classified into a pre-created classifier model, and obtaining the classification result of the speech data sample according to the output value of the classifier model;
wherein the classifier model is created as follows:
inputting a training sample set of speech data into a Laplacian support vector machine;
obtaining the optimal L1-norm regularization parameter γ_A, the optimal Laplacian regularization parameter γ_I, and the kernel matrix according to the training sample set, the positive definite parameter, and the kernel function;
obtaining the support vector sample set and the offset according to the optimal γ_A, the optimal γ_I, the kernel function, and the constraint conditions of the support vector machine;
obtaining the classifier model according to the support vector sample set and the offset;
wherein obtaining the classifier model according to the support vector sample set and the offset comprises:
determining the classifier model according to the support vector sample set and the offset:

y = sgn( Σ_{x_sv ∈ SVs} a_sv k(x, x_sv) + b )

where x ∈ R^D is the speech data sample to be classified, x_sv is a support vector, a_sv is the model coefficient of the support vector, and the value of y is the classification result for the speech data sample x.
2. The method of claim 1, wherein the inputting a training sample set of voice data into the Laplacian support vector machine comprises:

inputting the training sample set into the Laplacian support vector machine:

T = { (x_i, y_i) }, i = 1, …, l + u

wherein x_i ∈ R^D and y_i is the label of x_i: when y_i ∈ {−1, +1}, x_i is a labeled training sample and l is the number of labeled training samples; when y_i = 0, x_i is an unlabeled training sample and u is the number of unlabeled training samples; D is the dimension of the original space.
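The labeling convention of claim 2 (y_i ∈ {−1, +1} for the l labeled samples, y_i = 0 for the u unlabeled ones) can be illustrated with a toy sample set; the helper name `split_labeled` and the toy data are ours, not the patent's:

```python
import numpy as np

# Toy training set: D = 2 features, l = 2 labeled samples, u = 2 unlabeled ones
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [2.0, 2.0]])
y = np.array([+1, -1, 0, 0])  # label 0 marks an unlabeled sample

def split_labeled(X, y):
    # Separate the l labeled samples (y != 0) from the u unlabeled ones (y == 0)
    labeled = y != 0
    return X[labeled], y[labeled], X[~labeled]

X_l, y_l, X_u = split_labeled(X, y)
l, u = len(X_l), len(X_u)  # l + u samples in total
```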
3. The method according to claim 2, wherein the obtaining an optimal L1-norm regularization parameter γA, an optimal Laplacian regularization parameter γI, and a kernel function matrix according to the training sample set, the positive definite parameter, and the kernel function comprises:

dividing the training sample set into several parts, and training and testing over the L1-norm regularization parameter γA and the Laplacian regularization parameter γI on the divided training sample set by cross validation, to obtain the optimal L1-norm regularization parameter γA and the optimal Laplacian regularization parameter γI;

mapping the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel function matrix K, wherein K_ij = k(x_i, x_j).
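The kernel matrix K_ij = k(x_i, x_j) and the cross-validation search over (γA, γI) of claim 3 might be sketched as below. The Gaussian kernel, the grid values, and the stub scorer (standing in for the k-fold accuracy of a trained LapSVM) are illustrative assumptions:

```python
import itertools
import numpy as np

def kernel_matrix(X, gamma=0.5):
    # K_ij = k(x_i, x_j) with a Gaussian kernel (assumed kernel choice)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

def grid_search(grid_A, grid_I, cv_score):
    # Pick the (gamma_A, gamma_I) pair maximizing a cross-validation score
    return max(itertools.product(grid_A, grid_I), key=lambda p: cv_score(*p))

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
K = kernel_matrix(X)  # symmetric, with K_ii = k(x_i, x_i) = 1

# Stub scorer: in practice this would train a LapSVM on the training folds
# and return validation accuracy; here it simply peaks at (0.1, 0.01).
best = grid_search([0.01, 0.1, 1.0], [0.01, 0.1, 1.0],
                   cv_score=lambda gA, gI: -abs(gA - 0.1) - abs(gI - 0.01))
```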
4. The method according to claim 3, wherein the obtaining a support vector sample set and an offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function, and the constraint conditions of the support vector machine comprises:

solving the optimization problem of the Laplacian support vector machine under its constraint conditions (objective and constraints given as images in the original), obtaining the coefficient of the discriminant model a = a⁺ − a⁻ = [α1, α2, …, α_{l+u}]^T and the offset b = β⁺ − β⁻, wherein ξ_i ≥ 0 is a slack variable, δ is a constant coefficient, L = D − W is the graph Laplacian matrix, W_ij is the edge weight between samples x_i and x_j (weight formula given as an image), t > 0 is a preset parameter, and D_ii = Σ_j W_ij;

obtaining the support vector sample set SVs = { x_i | α_i ≠ 0, i = 1, …, N } from the training sample set according to the coefficient a of the discriminant model.
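The graph Laplacian L = D − W and the support-vector extraction SVs = {x_i | α_i ≠ 0} of claim 4 can be sketched as follows. Since the patent's weight formula is given only as an image, the heat-kernel weight W_ij = exp(−‖x_i − x_j‖²/t), a common choice in manifold regularization, is assumed here:

```python
import numpy as np

def graph_laplacian(X, t=1.0):
    # Assumed heat-kernel weights W_ij = exp(-||x_i - x_j||^2 / t),
    # degree D_ii = sum_j W_ij, and Laplacian L = D - W
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / t)
    np.fill_diagonal(W, 0.0)      # no self-loops
    D = np.diag(W.sum(axis=1))
    return D - W

def support_vectors(X, alpha, tol=1e-8):
    # SVs = {x_i | alpha_i != 0}: keep samples with nonzero model coefficient
    return X[np.abs(alpha) > tol]

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
L = graph_laplacian(X, t=1.0)
svs = support_vectors(X, np.array([0.7, 0.0, -0.2]))  # two nonzero coefficients
```

By construction L is symmetric and each of its rows sums to zero, the defining properties of a graph Laplacian.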
5. An apparatus for speech classification recognition, comprising:

a classifier module, configured to input a voice data sample to be discriminated into a pre-established classifier model and obtain a classification result for the voice data sample according to the output value of the classifier model;

wherein the classifier model is created by a classifier creating module configured to:

input a training sample set of voice data into a Laplacian support vector machine;

obtain an optimal L1-norm regularization parameter γA, an optimal Laplacian regularization parameter γI, and a kernel function matrix according to the training sample set, the positive definite parameter, and the kernel function;

obtain a support vector sample set and an offset according to the optimal L1-norm regularization parameter γA, the optimal Laplacian regularization parameter γI, the kernel function, and the constraint conditions of the support vector machine;

obtain the classifier model according to the support vector sample set and the offset;

wherein the classifier creating module comprises an obtaining unit, configured to determine the classifier model according to the support vector sample set and the offset:

y = sgn( Σ_sv a_sv · k(x, x_sv) + b )

wherein x is the voice data sample to be discriminated, x ∈ R^D, x_sv is a support vector, a_sv is the model coefficient of the support vector x_sv, b is the offset, k(·, ·) is the kernel function, and the value of y is the discrimination result for the voice data sample x.
6. The apparatus of claim 5, wherein the classifier creating module comprises:

an input unit, configured to input the training sample set into the Laplacian support vector machine:

T = { (x_i, y_i) }, i = 1, …, l + u

wherein x_i ∈ R^D and y_i is the label of x_i: when y_i ∈ {−1, +1}, x_i is a labeled training sample and l is the number of labeled training samples; when y_i = 0, x_i is an unlabeled training sample and u is the number of unlabeled training samples; D is the dimension of the original space.
7. The apparatus of claim 6, wherein the classifier creating module comprises:

a parameter processing unit, configured to divide the training sample set into several parts, and to train and test over the L1-norm regularization parameter γA and the Laplacian regularization parameter γI on the divided training sample set by cross validation to obtain the optimal L1-norm regularization parameter γA and the optimal Laplacian regularization parameter γI; and to map the training sample set into a reproducing kernel Hilbert space through the kernel function to obtain the kernel function matrix K, wherein K_ij = k(x_i, x_j).
8. The apparatus of claim 7, wherein the classifier creating module comprises:

an arithmetic unit, configured to solve the optimization problem of the Laplacian support vector machine under its constraint conditions (objective and constraints given as images in the original), obtaining the coefficient of the discriminant model a = a⁺ − a⁻ = [α1, α2, …, α_{l+u}]^T and the offset b = β⁺ − β⁻, wherein ξ_i ≥ 0 is a slack variable, δ is a constant coefficient, L = D − W is the graph Laplacian matrix, W_ij is the edge weight between samples x_i and x_j (weight formula given as an image), t > 0 is a preset parameter, and D_ii = Σ_j W_ij; and to obtain the support vector sample set SVs = { x_i | α_i ≠ 0, i = 1, …, N } from the training sample set according to the coefficient a of the discriminant model.
CN201710774048.1A 2017-08-31 2017-08-31 Voice classification recognition method and device Active CN107507611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710774048.1A CN107507611B (en) 2017-08-31 2017-08-31 Voice classification recognition method and device

Publications (2)

Publication Number Publication Date
CN107507611A CN107507611A (en) 2017-12-22
CN107507611B (en) 2021-08-24

Family

ID=60693417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710774048.1A Active CN107507611B (en) 2017-08-31 2017-08-31 Voice classification recognition method and device

Country Status (1)

Country Link
CN (1) CN107507611B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065027B (en) * 2018-06-04 2023-05-02 平安科技(深圳)有限公司 Voice distinguishing model training method and device, computer equipment and storage medium
CN114582366A (en) * 2022-03-02 2022-06-03 浪潮云信息技术股份公司 Method for realizing audio segmentation labeling based on LapSVM

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1787075A (en) * 2005-12-13 2006-06-14 Zhejiang University Method for distinguishing a speaker by a support vector machine model based on an embedded GMM kernel
CN1975856A (en) * 2006-10-30 2007-06-06 邹采荣 Speech emotion identifying method based on support vector machine
CN101640043A (en) * 2009-09-01 2010-02-03 Tsinghua University Speaker recognition method based on multi-coordinate sequence kernel and system thereof
CN103258532A (en) * 2012-11-28 2013-08-21 Hohai University Changzhou Campus Method for recognizing Chinese speech emotions based on fuzzy support vector machine
CN103605711A (en) * 2013-11-12 2014-02-26 China University of Petroleum (Beijing) Construction method and device, classification method and device of support vector machine

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224984B (en) * 2014-05-31 2018-03-13 华为技术有限公司 A kind of data category recognition methods and device based on deep neural network

Similar Documents

Publication Publication Date Title
KR20180125905A (en) Method and apparatus for classifying a class to which a sentence belongs by using deep neural network
Kamaruddin et al. Cultural dependency analysis for understanding speech emotion
CN110310647B (en) Voice identity feature extractor, classifier training method and related equipment
EP3588381A1 (en) Method and apparatus for training classification model, method and apparatus for classifying
CN111081279A (en) Voice emotion fluctuation analysis method and device
Provost Identifying salient sub-utterance emotion dynamics using flexible units and estimates of affective flow
CN114627102B (en) Image anomaly detection method, device and system and readable storage medium
CN109086794B (en) Driving behavior pattern recognition method based on T-LDA topic model
CN111292851A (en) Data classification method and device, computer equipment and storage medium
CN107507611B (en) Voice classification recognition method and device
CN105609116A (en) Speech emotional dimensions region automatic recognition method
EP2115737B1 (en) Method and system to improve automated emotional recognition
CN111653274A (en) Method, device and storage medium for awakening word recognition
Shah et al. Speech emotion recognition based on SVM using MATLAB
CN113453065A (en) Video segmentation method, system, terminal and medium based on deep learning
Dubey et al. Robust speaker clustering using mixtures of von mises-fisher distributions for naturalistic audio streams
CN113610080B (en) Cross-modal perception-based sensitive image identification method, device, equipment and medium
CN114818900A (en) Semi-supervised feature extraction method and user credit risk assessment method
Gosztolya et al. A feature selection-based speaker clustering method for paralinguistic tasks
CN112632229A (en) Text clustering method and device
Grigore et al. Self-organizing maps for identifying impaired speech
CN115083437B (en) Method and device for determining uncertainty of learner pronunciation
CN117932073B (en) Weak supervision text classification method and system based on prompt engineering
CN114912502B (en) Double-mode deep semi-supervised emotion classification method based on expressions and voices
CN118135642A (en) Facial expression analysis method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant