CN109523994A - A multitask speech classification method based on a capsule neural network - Google Patents
A multitask speech classification method based on a capsule neural network (Download PDF)
- Publication number
- CN109523994A (application number CN201811346110.8A)
- Authority
- CN
- China
- Prior art keywords
- capsule
- neural network
- multitask
- voice
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a multitask speech classification method based on a capsule neural network. It relates to the technical fields of speech signal analysis and artificial intelligence, and addresses the multitask classification problem in speech recognition. The method first extracts a feature representation of the speech, drawing primary (low-level) features from multiple perspectives such as the frequency domain and the time domain. On the basis of these preprocessed primary features, a convolutional neural network and a capsule neural network then perform deeper abstraction and learning of speech features. Finally, multiple classifiers are designed on top of the high-level features according to the multitask requirements, and their loss functions are merged so that a unified multitask speech classification model is trained, improving classification accuracy on multiple tasks simultaneously.
Description
Technical field
The invention relates to a multitask speech classification method based on a capsule neural network, belongs to the technical fields of speech signal analysis and processing and artificial intelligence, and addresses the multitask speech recognition problem.
Background technique
Sound is one of the most convenient means of daily human communication and carries rich information. As an important form of big data, voice is an indispensable part of its composition and has enormous research prospects in the current era of artificial intelligence. Human-computer interaction emphasizes a comfortable and natural user experience, and voice, as the most natural mode of interaction, is of self-evident importance. Intelligent voice products such as music recommendation, simultaneous speech translation, and voice communication have greatly eased everyday life. Research on intelligent speech technology now touches many areas: speech recognition, speech classification, semantic analysis, and so on, among which speech classification is the foundation for studying voice data. Many categories of speech classification, such as accent recognition, speaker identification, and speech emotion recognition, already have successful applications. A computer's ability to classify and recognize speech is an important component of computer speech processing and a key precondition for realizing a natural human-computer interface, and thus has great research and application value.
Speech classification tasks are often treated as independent, but in practice a single utterance conveys much information, such as gender, word content, and mood, and different tasks are interrelated. For example, accent recognition and speaker identification are generally regarded as two separate classification tasks, yet for the same voice data, once the accent is confirmed it also constrains who the speaker is. This work takes the actual environment into account and aims to extract richer information from speech audio, classifying multiple different semantic tasks under a unified model.
Current artificial intelligence technology spans several broad directions: traditional deep neural networks, generative adversarial networks, reinforcement learning, and capsule networks. This work studies capsule networks and applies them to the multitask speech classification problem, so that the final recognition performance is improved across multiple tasks.
Summary of the invention
The present invention provides a multitask speech classification method based on a capsule neural network, analyzes the correlation between multiple speech tasks, solves the multitask speech classification problem, realizes abstraction and learning of speech features, and obtains more accurate speech classification results on multiple tasks.
To achieve the above goals, the technical scheme adopted by the invention is as follows:
A multitask speech classification method based on a capsule neural network, characterized in that a deep convolutional neural network and a capsule neural network are used to learn more abstract, higher-level speech features, comprising the following steps:
(1) preprocess the original speech signal and, using speech feature extraction algorithms, extract a low-level feature representation of the speech;
(2) use a deep convolutional neural network to extract a mid-level feature representation of the speech signal;
(3) use a capsule neural network to further extract a highly abstract feature representation of the speech;
(4) design multiple different classifiers and loss functions to realize end-to-end training of the whole multitask speech classification model.
Further, step (1) comprises the following steps:
(11) the primitive feature representation of speech is a one-dimensional, high-dimensional feature; in the speech preprocessing model, different traditional feature extraction algorithms are applied to the original audio to extract time-domain and frequency-domain features, which are finally fused into a single representation and fed to the deep neural network model;
(12) the time-domain speech feature extraction algorithm uses linear predictive cepstral coefficients (LPCC), a homomorphic signal processing method, while the frequency-domain algorithm uses Mel-frequency cepstral coefficients (MFCC), which analyze the speech signal via the Fourier transform; by fusing these primary speech features of different characteristics, the final input of the deep neural network model is formed.
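As a minimal illustration of the feature fusion in step (12), the sketch below concatenates hypothetical per-frame MFCC and LPCC matrices into one primary feature matrix. The array shapes (13 MFCC and 12 LPCC coefficients over 100 frames) and the random values are assumptions for illustration, not the patent's actual extractor output:

```python
import numpy as np

# Hypothetical per-frame features: 13 MFCC (frequency-domain) and 12 LPCC
# (time-domain) coefficients per frame, over 100 frames. Random stand-ins;
# real values would come from an MFCC/LPCC extractor.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))
lpcc = rng.standard_normal((100, 12))

# Fusion by concatenation along the feature axis: each frame becomes one
# 25-dimensional primary feature vector fed to the deep network.
fused = np.concatenate([mfcc, lpcc], axis=1)
```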
Further, step (2) comprises the following steps:
(21) the convolution operation of the deep convolutional neural network extracts higher-level features from the input features, which can be expressed by the following formula:

h = f(W ∗ x)

where x denotes the input of the convolutional layer, W denotes the learnable weights of the convolution kernel, ∗ denotes convolution, and f is the nonlinear mapping (activation) function applied after the convolution;
(22) the pooling operation of the deep convolutional neural network further condenses the features, which can be expressed by the following formula:

p = pool(h)

where h denotes the input of the pooling layer; since the pooling layer has no learnable parameters, the common pooling operation pool(·) simply takes the maximum, the minimum, or the average over each region.
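The convolution and pooling steps above can be sketched in a few lines of NumPy. The kernel, input values, and window size below are toy assumptions chosen only to show the data flow h = f(W ∗ x) followed by p = pool(h):

```python
import numpy as np

def conv1d(x, W):
    """Valid 1-D convolution (cross-correlation) of input x with kernel W."""
    k = len(W)
    return np.array([np.dot(x[i:i + k], W) for i in range(len(x) - k + 1)])

def relu(h):
    """Nonlinear mapping f applied after the convolution."""
    return np.maximum(h, 0.0)

def max_pool(h, size=2):
    """Parameter-free pooling: maximum over non-overlapping windows."""
    n = len(h) // size
    return h[:n * size].reshape(n, size).max(axis=1)

x = np.array([1.0, -2.0, 3.0, 0.5, -1.0, 2.0])  # toy "speech feature" vector
W = np.array([0.5, -0.5])                       # toy learnable kernel
p = max_pool(relu(conv1d(x, W)))                # p == [1.5, 1.25]
```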
Further, step (3) comprises the following steps:
(31) a capsule neural network differs from a conventional deep neural network in that its minimum computational unit is a group of neurons; moreover, a capsule network contains weights in two different roles, used respectively for making predictions and for weighting those predictions;
(32) first, in the prediction layer of the capsule network, similarly to a traditional feedforward computation, the prediction result is obtained by matrix multiplication between an input capsule and a prediction weight matrix:

û_{j|i} = W_{ij} u_i

where u_i is the output of low-level capsule i and û_{j|i} is its prediction for high-level capsule j; note that both u_i and û_{j|i} are representations of a group of neurons (vectors);
(33) unlike a traditional convolutional neural network, which simply learns the high-level feature representation predicted by the lower layer, the capsule network additionally learns the weight each low-level part contributes to the same prediction:

s_j = Σ_i c_{ij} û_{j|i}

where û_{j|i} is the prediction of low-level capsule i for high-level capsule j and c_{ij} is the prediction (coupling) weight; the high-level capsule finally obtains its net input s_j as the weighted sum of all predictions; it is worth noting that, unlike parameter updates in a traditional neural network, which use gradient descent, the coefficients c_{ij} here are updated by the dynamic routing algorithm;
(34) finally, the summed prediction must pass through a nonlinear mapping; since the minimum computational unit in a capsule neural network is a group of neurons, the activation function is changed to the squash function:

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)

where the activated prediction v_j carries meaning in two respects: its direction encodes the attributes of the category, and its length represents the probability that the category exists.
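Steps (32) through (34) can be sketched as follows. The two toy low-level capsules and the identity prediction matrices are assumed values; the routing update follows the agreement-based scheme described in the text (not gradient descent), iterated a few times:

```python
import numpy as np

def squash(s):
    """Capsule activation: keeps the direction of s, maps its length into [0, 1)."""
    norm2 = np.dot(s, s)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + 1e-9)

def route(u_hat, n_iter=3):
    """Dynamic routing over predictions u_hat[i] (one per low-level capsule).

    Coupling coefficients c_i are updated by agreement between each
    prediction and the current high-level output, not by gradient descent.
    """
    b = np.zeros(len(u_hat))                  # routing logits
    v = squash(sum(u_hat) / len(u_hat))
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum()       # softmax over low-level capsules
        s = sum(ci * ui for ci, ui in zip(c, u_hat))  # weighted sum s_j
        v = squash(s)                         # high-level capsule output v_j
        b = b + np.array([np.dot(ui, v) for ui in u_hat])  # agreement update
    return v

# Two toy low-level capsules u_i, one prediction matrix W_ij each (assumed values).
u = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
W = [np.eye(2), np.eye(2)]
u_hat = [Wi @ ui for Wi, ui in zip(W, u)]     # predictions û_{j|i} = W_ij u_i
v = route(u_hat)                              # length of v < 1, acts as probability
```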
Further, step (4) comprises the following steps:
(41) determine the speech multitask classification content and digitize the corresponding multitask labels;
(42) according to the classification content of the different task types, define the corresponding number of classifiers;
(43) for each classifier, design a corresponding loss function, specifically:

L_t = −(1/N) Σ_{i=1}^{N} y_i log(p_i)

where y_i is the true sample label for a class of speech, p_i is the probability value after the classifier's softmax, and N denotes the total number of samples; by accumulating the losses of the N samples on a given task, the average loss of all samples on that task is obtained;
the above designs a loss function only for a single classification result among the multiple tasks; for the multitask speech classification problem, the final loss function is defined as:

L = Σ_{t=1}^{T} L_t

where L_t denotes the above single-task loss function over the whole sample set and T denotes the actual number of tasks; the total loss L of the final multitask speech recognition problem is expressed as the sum of all the single-task losses;
(44) with the network structure designed above, construct the data set and design the loss functions, and finally train the entire end-to-end capsule neural network using the backpropagation algorithm.
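The per-task cross-entropy and its summation into a total multitask loss can be sketched as follows. The logits, labels, and task sizes (a 3-class task A and a 2-class task B over 2 samples) are assumed toy values:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def task_loss(logits, labels):
    """Average cross-entropy L_t = -(1/N) * sum_i log p_i[y_i] for one task."""
    losses = [-np.log(softmax(z)[y]) for z, y in zip(logits, labels)]
    return float(np.mean(losses))

# Toy setup (assumed): 2 samples, task A has 3 classes, task B has 2 classes.
logits_a = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels_a = [0, 1]
logits_b = np.array([[1.0, -1.0], [0.5, 0.5]])
labels_b = [0, 1]

# Total multitask loss: sum of the per-task losses, L = L_A + L_B.
total = task_loss(logits_a, labels_a) + task_loss(logits_b, labels_b)
```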
Compared with the prior art, the advantages of the present invention are as follows:
One, the preprocessing stage skillfully fuses the various primitive features of the speech; compared with the raw voice data this reduces the data dimensionality, and compared with a single primary feature representation it enriches the speech input information;
Two, on top of the deep convolutional neural network, a state-of-the-art capsule neural network is further designed to learn higher-level speech feature representations;
Three, through the multitask loss function, the correlation between tasks is learned, so that the network is trained better.
Detailed description of the drawings
Fig. 1 is the model diagram of the multitask speech classification based on the capsule neural network in the present invention;
Fig. 2 is the flow chart of the multitask speech classification based on the capsule neural network in the present invention;
Fig. 3 is the topology diagram of a capsule in the present invention.
Specific embodiments
The present invention is further illustrated with reference to the accompanying drawings and examples.
Referring to Fig. 1, the core model of the multitask speech recognition method based on a capsule neural network is a capsule neural network model. The model receives as input a combination of different primitive speech features, uses the basic convolutional structure to perform feature learning on the input features, then uses the deeper structure of the capsule network to carry out further feature extraction on the primary features, and, taking the multitask learning objective into account, designs a new loss function, thereby effectively improving speech recognition accuracy on multiple tasks.
Referring to Fig. 2, the overall data flow of the multitask speech classification method based on a capsule neural network is as follows:
(1) Audio preprocessing: the extraction of speech features involves several classic algorithms; in MFCC, the Mel coefficient is computed as:

Mel(f) = 2595 · log10(1 + f / 700)

where f denotes the actual frequency of the speech; this formula describes the relationship between the Mel frequency and the actual frequency, and the growth of the Mel frequency is consistent with that of the frequencies audible to the human ear.
The LPCC involved mainly computes the linear prediction cepstral coefficients of the speech; the underlying p-th order linear prediction is:

ŝ(n) = Σ_{i=1}^{p} a_i s(n − i)

where a_i are the coefficients of the p-th order linear prediction function. By mixing the above primary speech features, the final model input features are obtained.
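The Mel conversion above, and its inverse (used when placing Mel filter-bank edges back on the Hz axis), can be written directly; the constants 2595 and 700 are the usual Mel-scale parameters:

```python
import math

def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f / 700): maps physical frequency to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping: recovers the physical frequency from a Mel value."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

For example, 1000 Hz maps to roughly 1000 Mel, a common calibration point of the scale.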
(2) Convolution and pooling: the convolution operation of the deep convolutional neural network extracts higher-level features from the input features:

h = f(W ∗ x)

where x denotes the input of the convolutional layer, W denotes the learnable weights of the convolution kernel, ∗ denotes convolution, and f is the nonlinear mapping (activation) function applied after the convolution.
The pooling operation of the deep convolutional neural network further condenses the features:

p = pool(h)

where h denotes the input of the pooling layer, and the common pooling operation pool(·) takes the maximum, the minimum, or the average.
(3) Capsule neural network: the basic computational unit of a capsule network is a group of neurons, each vector representing one such group; the computation between two capsule layers requires two steps: prediction, and weighted summation of predictions. The intermediate prediction result is obtained by matrix multiplication between the input capsule and the prediction weights:

û_{j|i} = W_{ij} u_i

where u_i is the output of low-level capsule i and û_{j|i} is its prediction.
While learning the high-level feature representation predicted by the lower layer, the capsule network additionally learns the weight of each low-level part for the same prediction:

s_j = Σ_i c_{ij} û_{j|i}

where û_{j|i} is the prediction of low-level capsule i for high-level capsule j and c_{ij} is the prediction weight; the high-level capsule finally obtains its net input as the weighted sum of all predictions.
Finally, the summed prediction must pass through a nonlinear mapping; since the minimum computational unit in a capsule neural network is a group of neurons, the activation function is changed to the squash function:

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)

where v_j denotes the final capsule output.
(4) Total loss function: first determine the content of each task and design multiple classifiers corresponding to the multitask learning objectives. For the learning objective of a single task, the corresponding loss function is designed as:

L_t = −(1/N) Σ_{i=1}^{N} y_i log(p_i)

where y_i is the true sample label for a class of speech, p_i is the probability value after the classifier's softmax, and N denotes the total number of samples; by accumulating the losses of the N samples on a given task, the average loss of all samples on that task is obtained.
Since this model addresses the multitask speech classification problem, a rule is needed to merge the individual loss functions; the total loss function of the multitask speech classification model is thus:

L = Σ_{t=1}^{T} L_t

where L_t denotes the above single-task loss function over the whole sample set and T denotes the actual number of tasks; the total loss L of the final multitask speech recognition problem is expressed as the sum of all the single-task losses.
Referring to Fig. 3, which shows the computation topology between any two layers of the capsule neural network: u_i denotes the feature representation learned by a low-level capsule, and through the weights W_{ij} a prediction of a high-level representation is made from the input; the diagram conceals the prediction-layer results û_{j|i} and the individual prediction weights c_{ij}; finally, the next layer's high-level capsule representation is obtained as:

v_j = squash(Σ_i c_{ij} W_{ij} u_i).
Claims (5)
1. A multitask speech classification method based on a capsule neural network, characterized in that a capsule neural network is used to extract highly abstract speech features while multiple classifiers complete the multitask classification of the speech, comprising the following steps:
(1) preprocess the original speech signal and, using speech feature extraction algorithms, extract a low-level feature representation of the speech;
(2) use a deep convolutional neural network to extract a mid-level feature representation of the speech signal;
(3) use a capsule neural network to further extract a highly abstract feature representation of the speech;
(4) design multiple different classifiers and loss functions to realize end-to-end training of the whole multitask speech classification model.
2. The multitask speech classification method based on a capsule neural network according to claim 1, wherein step (1) comprises the following steps:
(11) the primitive feature representation of speech is a one-dimensional, high-dimensional feature; in the speech preprocessing model, different traditional feature extraction algorithms are applied to the original audio to extract time-domain and frequency-domain features, which are finally fused into a single representation and fed to the deep neural network model;
(12) the time-domain speech feature extraction algorithm uses linear predictive cepstral coefficients (LPCC), a homomorphic signal processing method, while the frequency-domain algorithm uses Mel-frequency cepstral coefficients (MFCC), which analyze the speech signal via the Fourier transform; by fusing these primary speech features of different characteristics, the final input of the deep neural network model is formed.
3. The multitask speech classification method based on a capsule neural network according to claim 1, wherein step (2) comprises the following steps:
(21) the convolution operation of the deep convolutional neural network extracts higher-level features from the input features, which can be expressed by the following formula:

h = f(W ∗ x)

where x denotes the input of the convolutional layer, W denotes the learnable weights of the convolution kernel, ∗ denotes convolution, and f is the nonlinear mapping (activation) function;
(22) the pooling operation of the deep convolutional neural network further condenses the features, which can be expressed by the following formula:

p = pool(h)

where h denotes the input of the pooling layer; since the pooling layer has no learnable parameters, the common pooling operation pool(·) takes the maximum, the minimum, or the average.
4. The multitask speech classification method based on a capsule neural network according to claim 1, wherein step (3) comprises the following steps:
(31) a capsule neural network differs from a conventional deep neural network in that its minimum computational unit is a group of neurons; moreover, a capsule network contains weights in two different roles, used respectively for making predictions and for weighting those predictions;
(32) first, in the prediction layer of the capsule network, similarly to a traditional feedforward computation, the prediction result is obtained by matrix multiplication between an input capsule and a prediction weight matrix:

û_{j|i} = W_{ij} u_i

where u_i is the output of low-level capsule i and û_{j|i} is its prediction; note that both u_i and û_{j|i} are representations of a group of neurons;
(33) unlike a traditional convolutional neural network, which simply learns the high-level feature representation predicted by the lower layer, the capsule network additionally learns the weight each low-level part contributes to the same prediction:

s_j = Σ_i c_{ij} û_{j|i}

where û_{j|i} is the prediction of low-level capsule i for high-level capsule j and c_{ij} is the prediction weight; the high-level capsule finally obtains its net input as the weighted sum of all predictions; it is worth noting that, unlike parameter updates in a traditional neural network, which use gradient descent, the coefficients c_{ij} here are updated by the dynamic routing algorithm;
(34) finally, the summed prediction must pass through a nonlinear mapping; since the minimum computational unit in a capsule neural network is a group of neurons, the activation function is changed to the squash function:

v_j = (‖s_j‖² / (1 + ‖s_j‖²)) · (s_j / ‖s_j‖)

where the activated prediction v_j carries meaning in two respects: its direction encodes the attributes of the category, and its length represents the probability that the category exists.
5. The multitask speech classification method based on a capsule neural network according to claim 1, wherein step (4) comprises the following steps:
(41) determine the speech multitask classification content and digitize the corresponding multitask labels;
(42) according to the classification content of the different task types, define the corresponding number of classifiers;
(43) for each classifier, design a corresponding loss function, specifically:

L_t = −(1/N) Σ_{i=1}^{N} y_i log(p_i)

where y_i is the true sample label for a class of speech, p_i is the probability value after the classifier's softmax, and N denotes the total number of samples; by accumulating the losses of the N samples on a given task, the average loss of all samples on that task is obtained;
the above designs a loss function only for a single classification result among the multiple tasks; for the multitask speech classification problem, the final loss function is defined as:

L = Σ_{t=1}^{T} L_t

where L_t denotes the above single-task loss function over the whole sample set and T denotes the actual number of tasks; the total loss L of the final multitask speech recognition problem is expressed as the sum of all the single-task losses;
(44) with the network structure designed above, construct the data set and design the loss functions, and finally train the entire end-to-end capsule neural network using the backpropagation algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811346110.8A CN109523994A (en) | 2018-11-13 | 2018-11-13 | A multitask speech classification method based on a capsule neural network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811346110.8A CN109523994A (en) | 2018-11-13 | 2018-11-13 | A multitask speech classification method based on a capsule neural network
Publications (1)
Publication Number | Publication Date |
---|---|
CN109523994A true CN109523994A (en) | 2019-03-26 |
Family
ID=65776175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811346110.8A Pending CN109523994A (en) | A multitask speech classification method based on a capsule neural network | 2018-11-13 | 2018-11-13
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109523994A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110120224A (en) * | 2019-05-10 | 2019-08-13 | 平安科技(深圳)有限公司 | Construction method, device, computer equipment and the storage medium of bird sound identification model |
CN110428843A (en) * | 2019-03-11 | 2019-11-08 | 杭州雄迈信息技术有限公司 | A kind of voice gender identification deep learning method |
CN110931046A (en) * | 2019-11-29 | 2020-03-27 | 福州大学 | Audio high-level semantic feature extraction method and system for overlapped sound event detection |
CN110968729A (en) * | 2019-11-21 | 2020-04-07 | 浙江树人学院(浙江树人大学) | Family activity sound event classification method based on additive interval capsule network |
CN111179961A (en) * | 2020-01-02 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Audio signal processing method, audio signal processing device, electronic equipment and storage medium |
CN111357051A (en) * | 2019-12-24 | 2020-06-30 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, intelligent device and computer readable storage medium |
CN111584010A (en) * | 2020-04-01 | 2020-08-25 | 昆明理工大学 | Key protein identification method based on capsule neural network and ensemble learning |
CN111862949A (en) * | 2020-07-30 | 2020-10-30 | 北京小米松果电子有限公司 | Natural language processing method and device, electronic equipment and storage medium |
CN112562725A (en) * | 2020-12-09 | 2021-03-26 | 山西财经大学 | Mixed voice emotion classification method based on spectrogram and capsule network |
CN112599134A (en) * | 2020-12-02 | 2021-04-02 | 国网安徽省电力有限公司 | Transformer sound event detection method based on voiceprint recognition |
CN112992191A (en) * | 2021-05-12 | 2021-06-18 | 北京世纪好未来教育科技有限公司 | Voice endpoint detection method and device, electronic equipment and readable storage medium |
CN113314119A (en) * | 2021-07-27 | 2021-08-27 | 深圳百昱达科技有限公司 | Voice recognition intelligent household control method and device |
CN113343924A (en) * | 2021-07-01 | 2021-09-03 | 齐鲁工业大学 | Modulation signal identification method based on multi-scale cyclic spectrum feature and self-attention generation countermeasure network |
CN113362857A (en) * | 2021-06-15 | 2021-09-07 | 厦门大学 | Real-time speech emotion recognition method based on CapcNN and application device |
CN113378984A (en) * | 2021-07-05 | 2021-09-10 | 国药(武汉)医学实验室有限公司 | Medical image classification method, system, terminal and storage medium |
CN113782000A (en) * | 2021-09-29 | 2021-12-10 | 北京中科智加科技有限公司 | Language identification method based on multiple tasks |
CN115376518A (en) * | 2022-10-26 | 2022-11-22 | 广州声博士声学技术有限公司 | Voiceprint recognition method, system, device and medium for real-time noise big data |
US11735168B2 (en) | 2020-07-20 | 2023-08-22 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for recognizing voice |
WO2023222088A1 (en) * | 2022-05-20 | 2023-11-23 | 青岛海尔电冰箱有限公司 | Voice recognition and classification method and apparatus |
CN117275461A (en) * | 2023-11-23 | 2023-12-22 | 上海蜜度科技股份有限公司 | Multitasking audio processing method, system, storage medium and electronic equipment |
Application event: 2018-11-13 — application CN201811346110.8A filed in China (CN), published as CN109523994A; legal status: active, Pending
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH06295196A (en) * | 1993-04-08 | 1994-10-21 | Casio Comput Co Ltd | Speech recognition device and signal recognition device |
WO2005059811A1 (en) * | 2003-12-16 | 2005-06-30 | Canon Kabushiki Kaisha | Pattern identification method, apparatus, and program |
US20160284346A1 (en) * | 2015-03-27 | 2016-09-29 | Qualcomm Incorporated | Deep neural net based filter prediction for audio event classification and extraction |
US20170148431A1 (en) * | 2015-11-25 | 2017-05-25 | Baidu Usa Llc | End-to-end speech recognition |
US20180068675A1 (en) * | 2016-09-07 | 2018-03-08 | Google Inc. | Enhanced multi-channel acoustic models |
CN106601235A (en) * | 2016-12-02 | 2017-04-26 | 厦门理工学院 | Semi-supervision multitask characteristic selecting speech recognition method |
CN107578775A (en) * | 2017-09-07 | 2018-01-12 | 四川大学 | A kind of multitask method of speech classification based on deep neural network |
CN107610692A (en) * | 2017-09-22 | 2018-01-19 | 杭州电子科技大学 | The sound identification method of self-encoding encoder multiple features fusion is stacked based on neutral net |
CN108010514A (en) * | 2017-11-20 | 2018-05-08 | 四川大学 | A kind of method of speech classification based on deep neural network |
GB201807225D0 (en) * | 2018-03-14 | 2018-06-13 | Papercup Tech Limited | A speech processing system and a method of processing a speech signal |
CN108766461A (en) * | 2018-07-17 | 2018-11-06 | 厦门美图之家科技有限公司 | Audio feature extraction methods and device |
Non-Patent Citations (6)
Title |
---|
LE, D. et al.: "Discretized continuous speech emotion recognition with multi-task deep recurrent neural network", 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017) * |
NAM KYUN KIM et al.: "Speech emotion recognition based on multi-task learning using a convolutional neural network", 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) * |
YU Chengbo et al.: "Research on finger vein recognition based on capsule network", Application of Electronic Technique * |
ZHU Yingzhao et al.: "Research on capsule network technology and its development trends", Guangdong Communication Technology * |
HU Wenping: "Spoken pronunciation detection and error analysis based on deep neural networks", China Doctoral Dissertations Full-text Database, Information Science and Technology series * |
GUO Junwen: "Research on CAPSNET-based wearable ECG acquisition and arrhythmia detection ***", China Masters' Theses Full-text Database, Medicine and Health Sciences series * |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428843A (en) * | 2019-03-11 | 2019-11-08 | 杭州雄迈信息技术有限公司 | A kind of voice gender identification deep learning method |
CN110428843B (en) * | 2019-03-11 | 2021-09-07 | 杭州巨峰科技有限公司 | Voice gender recognition deep learning method |
CN110120224A (en) * | 2019-05-10 | 2019-08-13 | 平安科技(深圳)有限公司 | Construction method, device, computer equipment and the storage medium of bird sound identification model |
CN110120224B (en) * | 2019-05-10 | 2023-01-20 | 平安科技(深圳)有限公司 | Method and device for constructing bird sound recognition model, computer equipment and storage medium |
CN110968729A (en) * | 2019-11-21 | 2020-04-07 | 浙江树人学院(浙江树人大学) | Family activity sound event classification method based on additive interval capsule network |
CN110968729B (en) * | 2019-11-21 | 2022-05-17 | 浙江树人学院(浙江树人大学) | Family activity sound event classification method based on additive interval capsule network |
CN110931046A (en) * | 2019-11-29 | 2020-03-27 | 福州大学 | Audio high-level semantic feature extraction method and system for overlapped sound event detection |
WO2021127982A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, smart device, and computer-readable storage medium |
CN111357051A (en) * | 2019-12-24 | 2020-06-30 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, intelligent device and computer readable storage medium |
CN111357051B (en) * | 2019-12-24 | 2024-02-02 | 深圳市优必选科技股份有限公司 | Speech emotion recognition method, intelligent device and computer readable storage medium |
CN111179961A (en) * | 2020-01-02 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Audio signal processing method, audio signal processing device, electronic equipment and storage medium |
CN111584010A (en) * | 2020-04-01 | 2020-08-25 | 昆明理工大学 | Key protein identification method based on capsule neural network and ensemble learning |
CN111584010B (en) * | 2020-04-01 | 2022-05-27 | 昆明理工大学 | Key protein identification method based on capsule neural network and ensemble learning |
US11735168B2 (en) | 2020-07-20 | 2023-08-22 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and apparatus for recognizing voice |
CN111862949A (en) * | 2020-07-30 | 2020-10-30 | 北京小米松果电子有限公司 | Natural language processing method and device, electronic equipment and storage medium |
CN111862949B (en) * | 2020-07-30 | 2024-04-02 | 北京小米松果电子有限公司 | Natural language processing method and device, electronic equipment and storage medium |
CN112599134A (en) * | 2020-12-02 | 2021-04-02 | 国网安徽省电力有限公司 | Transformer sound event detection method based on voiceprint recognition |
CN112562725A (en) * | 2020-12-09 | 2021-03-26 | 山西财经大学 | Mixed voice emotion classification method based on spectrogram and capsule network |
CN112992191B (en) * | 2021-05-12 | 2021-11-05 | 北京世纪好未来教育科技有限公司 | Voice endpoint detection method and device, electronic equipment and readable storage medium |
CN112992191A (en) * | 2021-05-12 | 2021-06-18 | 北京世纪好未来教育科技有限公司 | Voice endpoint detection method and device, electronic equipment and readable storage medium |
CN113362857A (en) * | 2021-06-15 | 2021-09-07 | 厦门大学 | Real-time speech emotion recognition method based on CapcNN and application device |
CN113343924A (en) * | 2021-07-01 | 2021-09-03 | 齐鲁工业大学 | Modulation signal identification method based on multi-scale cyclic spectrum feature and self-attention generation countermeasure network |
CN113378984A (en) * | 2021-07-05 | 2021-09-10 | 国药(武汉)医学实验室有限公司 | Medical image classification method, system, terminal and storage medium |
CN113378984B (en) * | 2021-07-05 | 2023-05-02 | 国药(武汉)医学实验室有限公司 | Medical image classification method, system, terminal and storage medium |
CN113314119B (en) * | 2021-07-27 | 2021-12-03 | 深圳百昱达科技有限公司 | Voice recognition intelligent household control method and device |
CN113314119A (en) * | 2021-07-27 | 2021-08-27 | 深圳百昱达科技有限公司 | Voice recognition intelligent household control method and device |
CN113782000A (en) * | 2021-09-29 | 2021-12-10 | 北京中科智加科技有限公司 | Language identification method based on multiple tasks |
WO2023222088A1 (en) * | 2022-05-20 | 2023-11-23 | 青岛海尔电冰箱有限公司 | Voice recognition and classification method and apparatus |
CN115376518A (en) * | 2022-10-26 | 2022-11-22 | 广州声博士声学技术有限公司 | Voiceprint recognition method, system, device and medium for real-time noise big data |
CN117275461A (en) * | 2023-11-23 | 2023-12-22 | 上海蜜度科技股份有限公司 | Multitasking audio processing method, system, storage medium and electronic equipment |
CN117275461B (en) * | 2023-11-23 | 2024-03-15 | 上海蜜度科技股份有限公司 | Multitasking audio processing method, system, storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109523994A (en) | A kind of multitask method of speech classification based on capsule neural network | |
Kim et al. | Towards speech emotion recognition "in the wild" using aggregated corpora and deep multi-task learning | |
CN106228977B (en) | Multi-mode fusion song emotion recognition method based on deep learning | |
CN110534132A (en) | A kind of speech-emotion recognition method of the parallel-convolution Recognition with Recurrent Neural Network based on chromatogram characteristic | |
CN108597539A (en) | Speech-emotion recognition method based on parameter migration and sound spectrograph | |
CN110289003A (en) | A kind of method of Application on Voiceprint Recognition, the method for model training and server | |
CN106503805A (en) | A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method | |
CN109241524A (en) | Semantic analysis method and device, computer readable storage medium, electronic equipment | |
CN109285562A (en) | Speech-emotion recognition method based on attention mechanism | |
CN108763326A (en) | A kind of sentiment analysis model building method of the diversified convolutional neural networks of feature based | |
CN104538027B (en) | The mood of voice social media propagates quantization method and system | |
Shahriar et al. | Classifying maqams of Qur’anic recitations using deep learning | |
Chen et al. | Distilled binary neural network for monaural speech separation | |
CN109767789A (en) | A kind of new feature extracting method for speech emotion recognition | |
CN111899766B (en) | Speech emotion recognition method based on optimization fusion of depth features and acoustic features | |
Cardona et al. | Online phoneme recognition using multi-layer perceptron networks combined with recurrent non-linear autoregressive neural networks with exogenous inputs | |
Cao et al. | Speaker-independent speech emotion recognition based on random forest feature selection algorithm | |
CN113077823A (en) | Subdomain self-adaptive cross-library speech emotion recognition method based on depth self-encoder | |
Bergler et al. | Deep representation learning for orca call type classification | |
CN115393933A (en) | Video face emotion recognition method based on frame attention mechanism | |
CN112487237A (en) | Music classification method based on self-adaptive CNN and semi-supervised self-training model | |
Chen et al. | Construction of affective education in mobile learning: The study based on learner’s interest and emotion recognition | |
CN110532380A (en) | A kind of text sentiment classification method based on memory network | |
Singh et al. | Speaker Recognition Assessment in a Continuous System for Speaker Identification | |
Tashakori et al. | Designing the Intelligent System Detecting a Sense of Wonder in English Speech Signal Using Fuzzy-Nervous Inference-Adaptive system (ANFIS) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190326 ||