CN109523994A - Multi-task speech classification method based on a capsule neural network - Google Patents

Multi-task speech classification method based on a capsule neural network

Info

Publication number
CN109523994A
CN109523994A (application number CN201811346110.8A)
Authority
CN
China
Prior art keywords
capsule
neural network
multitask
voice
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811346110.8A
Other languages
Chinese (zh)
Inventor
陈盈科
毛华
吴雨
何涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201811346110.8A priority Critical patent/CN109523994A/en
Publication of CN109523994A publication Critical patent/CN109523994A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-task speech classification method based on a capsule neural network, relating to the technical fields of speech signal analysis and artificial intelligence, and addressing the multi-task classification problem in speech recognition. The method mainly comprises: extracting feature representations of speech, including low-level speech features drawn from multiple views such as the frequency domain and the time domain; using a convolutional neural network and a capsule neural network to further abstract and learn deep speech features on the basis of the preprocessed low-level features; and designing multiple classifiers according to the multi-task requirements on top of the high-level features, fusing the loss functions of the classifiers, and training a unified multi-task speech classification model, so that classification accuracy is improved on multiple tasks simultaneously.

Description

Multi-task speech classification method based on a capsule neural network
Technical field
The multi-task speech classification method based on a capsule neural network relates to the technical fields of speech signal analysis and processing and artificial intelligence, and addresses the multi-task speech recognition problem.
Background technique
Sound is one of the most convenient means of daily human communication and carries rich information. Speech, as an important form of big data, is an indispensable part of it and has enormous research prospects in the current era of artificial intelligence. Human-computer interaction emphasizes a comfortable and natural user experience, and speech, as the most natural mode of interaction, is of self-evident importance. Intelligent speech products such as music recommendation, simultaneous speech translation, and voice communication have greatly facilitated everyday life. Research on intelligent speech technology now touches many areas, including speech recognition, speech classification, and semantic analysis, among which speech classification is the foundation for studying speech data. Speech classification of different categories, such as accent recognition, speaker identification, and speech emotion recognition, already has many successful applications. A computer's ability to classify and recognize speech is an important component of computer speech processing and a key precondition for natural human-computer interaction interfaces, and it therefore has great research and application value.
Speech classification tasks are usually treated independently, but in practice a single utterance conveys much information, such as gender, spoken content, and emotion, and the different tasks are clearly interrelated. For example, accent recognition and speaker identification are generally regarded as two separate classification tasks, yet for the same speech data, once the speaker is confirmed, the accent is largely determined as well. This work therefore considers the practical setting and aims to extract richer information from speech audio, classifying multiple semantically different tasks under a unified model.
Current artificial intelligence technology spans several broad directions: traditional deep neural networks, generative adversarial networks, reinforcement learning, and capsule networks. This work focuses on studying capsule networks for the multi-task speech classification problem, so that recognition performance is ultimately improved across multiple tasks.
Summary of the invention
The present invention provides a multi-task speech classification method based on a capsule neural network, which analyzes the correlation between multiple speech tasks, solves the multi-task speech classification problem, realizes abstract learning of speech features, and obtains more accurate speech classification results across multiple tasks.
To achieve the above objects, the technical scheme adopted by the invention is as follows:
A multi-task speech classification method based on a capsule neural network, characterized in that a deep convolutional neural network and a capsule neural network are used to learn more abstract, higher-level speech features, comprising the following steps:
(1) preprocess the original speech signal and extract low-level speech feature representations using speech feature extraction algorithms;
(2) extract mid-level feature representations of the speech signal using a deep convolutional neural network;
(3) further extract higher-level, more abstract feature representations of the speech using a capsule neural network;
(4) design multiple different classifiers and loss functions to realize end-to-end training of the whole multi-task speech classification model.
Further, step (1) comprises the following sub-steps:
(11) the raw feature representation of speech is a one-dimensional, high-dimensional feature; in the speech preprocessing model, different traditional feature extraction algorithms are applied to the original audio to extract time-domain and frequency-domain features, and the various features are finally fused into one representation and fed into the deep neural network model;
(12) the time-domain speech feature extraction algorithm uses linear predictive coding coefficients (LPCC), while the frequency-domain algorithm uses Mel-frequency cepstral coefficients (MFCC), a homomorphic signal processing method that analyzes the speech signal via the Fourier transform; by fusing low-level speech features of these different characteristics, the final input of the deep neural network model is formed.
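As an illustration of this preprocessing stage, the following is a minimal sketch of MFCC and LPC extraction followed by feature fusion, assuming the librosa library is available; the coefficient orders, frame settings, and the simple concatenation-based fusion are illustrative assumptions rather than values specified in the patent.

```python
# Minimal sketch of step (1): extract frequency-domain (MFCC) and time-domain
# (LPC-based) features and fuse them into one input matrix.
import numpy as np
import librosa

def extract_fused_features(wav_path, sr=16000, n_mfcc=13, lpc_order=12,
                           frame_len=512, hop=160):
    y, sr = librosa.load(wav_path, sr=sr)

    # Frequency-domain view: Mel-frequency cepstral coefficients per frame.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=frame_len, hop_length=hop)    # (n_mfcc, T1)

    # Time-domain view: linear prediction coefficients per frame
    # (silent frames may need guarding in practice).
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop)
    lpc = np.stack([librosa.lpc(frames[:, i], order=lpc_order)[1:]
                    for i in range(frames.shape[1])], axis=1)       # (lpc_order, T2)

    # Fuse both views along the feature axis, truncated to a common length.
    n = min(mfcc.shape[1], lpc.shape[1])
    return np.concatenate([mfcc[:, :n], lpc[:, :n]], axis=0)        # (n_mfcc + lpc_order, n)
```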
Further, step (2) comprises the following sub-steps:
(21) in step (2), higher-level features of the input are extracted using the convolution operation of the deep convolutional neural network, which can be expressed by the following formula:
$$h = f(W \ast x)$$
where $x$ denotes the input of the convolutional layer, $W$ denotes the learnable weights of the convolution kernel, $\ast$ is the convolution operation, and $f$ acts as a nonlinear mapping function;
(22) in step (2), higher-level features of the input are also extracted using the pooling operation of the deep convolutional neural network, which can be expressed by the following formula:
$$h = g(x)$$
where $x$ denotes the input of the pooling layer; since the pooling layer has no learnable parameters, there are no weights here, and the common pooling functions $g$ take the maximum, the minimum, or the average value.
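A minimal PyTorch sketch of this convolution-and-pooling front end is shown below; the channel counts, kernel size, and the choice of ReLU and max pooling are illustrative assumptions.

```python
# Minimal sketch of step (2): convolution h = f(W * x) followed by pooling g(.)
# over the fused feature map. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    def __init__(self, in_channels=25, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, hidden, kernel_size=9, padding=4)  # W * x
        self.act = nn.ReLU()                                                  # nonlinear map f
        self.pool = nn.MaxPool1d(kernel_size=2)                               # pooling g = max

    def forward(self, x):              # x: (batch, in_channels, n_frames)
        return self.pool(self.act(self.conv(x)))
```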
Further, step (3) comprises the following sub-steps:
(31) the capsule neural network differs from conventional deep neural networks in that its minimal computational unit is a group of neurons, and the capsule network contains weights with two different roles, used respectively for making predictions and for weighting those predictions;
(32) first, in the prediction stage of the capsule network, the computation is similar to a traditional feed-forward layer: the prediction result is obtained by matrix multiplication between the input capsule and the prediction weights, calculated by the following formula:
$$\hat{u}_{j|i} = W_{ij}\, u_i$$
where $u_i$ is a low-level capsule and $\hat{u}_{j|i}$ denotes the prediction result; note that both $u_i$ and $\hat{u}_{j|i}$ are representations of a group of neurons;
(33) unlike a traditional convolutional neural network, when learning the high-level feature representations predicted by the lower layer, the capsule neural network additionally learns the weight of each low-level part for the same prediction, calculated by the following formula:
$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i}$$
where $\hat{u}_{j|i}$ denotes the prediction of low-level capsule $i$ for high-level capsule $j$, $c_{ij}$ denotes the prediction weight, and the final high-level capsule obtains its net input as the weighted sum over all predictions; it is worth noting that, unlike parameter updates in traditional neural networks, which use gradient descent, $c_{ij}$ here is updated by the dynamic routing algorithm;
(34) finally, the summed prediction must pass through a nonlinear mapping; since the minimal computational unit in a capsule neural network is a group of neurons, the activation function is modified accordingly, mainly expressed as follows:
$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$
where the activated prediction $v_j$ carries meaning in two respects: its direction describes the attributes of the class, and its magnitude represents the probability that the class exists.
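The following is a minimal sketch of the capsule computation described in sub-steps (32) to (34): prediction by matrix multiplication, routing-weighted summation, and the squash nonlinearity, with the coupling weights updated by dynamic routing rather than gradient descent. Capsule dimensions and the number of routing iterations are illustrative assumptions.

```python
# Minimal sketch of step (3): u_hat_{j|i} = W_ij u_i, s_j = sum_i c_ij u_hat_{j|i},
# and v_j = squash(s_j); c_ij is obtained by iterative dynamic routing.
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # v = (|s|^2 / (1 + |s|^2)) * s / |s|
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

def capsule_layer(u, W, n_iter=3):
    # u: (batch, n_in, d_in) low-level capsules; W: (n_in, n_out, d_out, d_in)
    u_hat = torch.einsum('iokd,bid->biok', W, u)          # predictions u_hat_{j|i}
    b = torch.zeros(u.size(0), u.size(1), W.size(1), device=u.device)
    for _ in range(n_iter):                               # dynamic routing, not SGD
        c = F.softmax(b, dim=2)                           # coupling weights c_ij
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)          # s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)                                     # output capsules v_j
        b = b + torch.einsum('biok,bok->bio', u_hat, v)   # agreement update
    return v
```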
Further, step (4) comprises the following sub-steps:
(41) determine the speech multi-task classification content and digitize the corresponding multi-task labels;
(42) define a corresponding number of classifiers according to the classification content of the different task types;
(43) design a corresponding loss function for each classifier; the specific function is designed as follows:
$$L_t = -\frac{1}{N}\sum_{n=1}^{N} y^{(n)} \log \hat{y}^{(n)}$$
where $y^{(n)}$ is the true sample label of the corresponding speech class, $\hat{y}^{(n)}$ is the probability value after the classifier's softmax, and $N$ denotes the total number of samples; by accumulating the $N$ samples in the loss function of a given task, the average loss over all samples of that task is obtained;
the above only designs the loss function for a single classification result within the multi-task setting; for the multi-task speech classification problem, the final loss function is defined as follows:
$$L = \sum_{t=1}^{T} L_t$$
where $L_t$ denotes the above single-task loss function over the whole sample set, $T$ denotes the number of tasks in practice, and the total loss function $L$ of the final multi-task speech recognition problem is expressed as the sum of all the single-task loss functions;
(44) with the network structure designed above and the steps of constructing the data set, designing the loss functions, and so on, the entire end-to-end capsule neural network is finally trained using the back-propagation algorithm.
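A minimal sketch of the multi-task classifier heads and the summed loss of sub-step (43) is given below; the task names, class counts, and shared feature size are illustrative assumptions.

```python
# Minimal sketch of step (4): one softmax classifier head per task and a total
# loss L = sum_t L_t, where each L_t is the average cross-entropy over the batch.
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, feat_dim, n_classes_per_task):   # e.g. {'accent': 5, 'speaker': 100}
        super().__init__()
        self.heads = nn.ModuleDict({t: nn.Linear(feat_dim, k)
                                    for t, k in n_classes_per_task.items()})

    def forward(self, shared_feat):
        return {t: head(shared_feat) for t, head in self.heads.items()}

def total_loss(logits, labels):
    # logits/labels: dicts keyed by task name; cross_entropy averages over samples.
    ce = nn.functional.cross_entropy
    return sum(ce(logits[t], labels[t]) for t in logits)
```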
Compared with the prior art, the advantages of the present invention are as follows:
One, the preprocessing stage skillfully fuses the various raw speech features; compared with the original speech data this reduces the data dimension, and compared with a single low-level feature representation it enriches the speech input information;
Two, on the basis of the deep convolutional neural network, a state-of-the-art capsule neural network is further designed to learn higher-level feature representations of speech;
Three, the correlation between tasks is learned through the multi-task loss function, so that the network is trained better.
Brief description of the drawings
Fig. 1 is the model diagram of the multi-task speech classification based on a capsule neural network in the present invention;
Fig. 2 is the flow chart of the multi-task speech classification based on a capsule neural network in the present invention;
Fig. 3 is the topology diagram of a capsule in the present invention.
Specific embodiment
The present invention is further illustrated with reference to the accompanying drawings and examples.
Referring to Fig. 1, the core model of the multi-task speech recognition method based on a capsule neural network is a capsule neural network model. The model receives as input the combination of different low-level speech features, uses the basic structure of convolution to perform feature learning on the input, then uses the deeper structure of the capsule network to further extract features from the low-level features, and at the same time considers the multi-task learning objective and designs a new loss function, thereby effectively improving recognition accuracy across multiple tasks.
Referring to Fig. 2, the overall data flow of the multi-task speech classification method based on a capsule neural network proceeds in the following specific steps:
(1) Audio preprocessing: speech feature extraction involves several classic algorithms; the Mel coefficients in MFCC are computed from the mapping
$$\mathrm{Mel}(f) = 2595\,\log_{10}\!\left(1 + \frac{f}{700}\right)$$
where $f$ denotes the actual speech frequency; the formula describes the relationship between Mel frequency and actual frequency used in the algorithm, and the growth of the Mel scale is consistent with the frequency sensitivity of the human ear.
The LPCC features involved mainly compute the linear prediction coefficients of the speech, whose calculation is based on the $p$-th order linear predictor
$$\hat{s}(n) = \sum_{i=1}^{p} a_i\, s(n-i)$$
where $\hat{s}(n)$ is the $p$-th order linear prediction of the signal $s(n)$ and $a_i$ are the prediction coefficients. The final model input features are obtained by mixing the above low-level speech features.
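As a quick numeric check of the Mel mapping above (reconstructed here in its common form with the constants 2595 and 700, which the patent text does not spell out), the following snippet evaluates it at a few frequencies; 1000 Hz maps to roughly 1000 mel, and the scale grows more slowly at higher frequencies, matching the description of the ear's sensitivity.

```python
# Direct implementation of the assumed Mel mapping Mel(f) = 2595 * log10(1 + f/700).
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

for f in (100, 500, 1000, 4000):
    print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel")
```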
(2) Convolution and pooling: higher-level features of the input are extracted using the convolution operation of the deep convolutional neural network, which can be expressed by the following formula:
$$h = f(W \ast x)$$
where $x$ denotes the input of the convolutional layer, $W$ the learnable weights of the convolution kernel, $\ast$ the convolution operation, and $f$ a nonlinear mapping function.
Higher-level features of the input are also extracted using the pooling operation of the deep convolutional neural network, which can be expressed by the following formula:
$$h = g(x)$$
where $x$ denotes the input of the pooling layer, and the common pooling functions $g$ take the maximum, the minimum, or the average value.
(3) Capsule neural network: the basic computational element of the capsule network is a group of neurons, with each vector representing one group of neurons, and the computation between two capsule layers proceeds in two steps: prediction and prediction summation. The intermediate prediction result is obtained by matrix multiplication between the input capsule and the prediction weights, calculated by the following formula:
$$\hat{u}_{j|i} = W_{ij}\, u_i$$
where $u_i$ is a low-level capsule and $\hat{u}_{j|i}$ denotes the prediction result.
When learning the high-level feature representations predicted by the lower layer, the capsule neural network additionally learns the weight of each low-level part for the same prediction, calculated by the following formula:
$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i}$$
where $\hat{u}_{j|i}$ denotes the prediction of low-level capsule $i$ for high-level capsule $j$, $c_{ij}$ denotes the prediction weight, and the final high-level capsule obtains its net input as the weighted sum over all predictions.
Finally, the summed prediction must pass through a nonlinear mapping; since the minimal computational unit in a capsule neural network is a group of neurons, the activation function is modified accordingly, mainly expressed as follows:
$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$
where $v_j$ denotes the final capsule output.
(4) Total loss function: the content of the tasks is determined first, and multiple classifiers are designed, corresponding to the multi-task learning objectives. For the learning objective of a given single task, the corresponding loss function is designed as follows:
$$L_t = -\frac{1}{N}\sum_{n=1}^{N} y^{(n)} \log \hat{y}^{(n)}$$
where $y^{(n)}$ is the true sample label of the corresponding speech class, $\hat{y}^{(n)}$ is the probability value after the classifier's softmax, and $N$ denotes the total number of samples; by accumulating the $N$ samples in the loss function of a given task, the average loss over all samples of that task is obtained.
Since this model addresses the multi-task speech classification problem, a rule is needed to merge the individual loss functions; the total loss function of the multi-task speech classification model is therefore given as:
$$L = \sum_{t=1}^{T} L_t$$
where $L_t$ denotes the above single-task loss function over the whole sample set, $T$ denotes the number of tasks in practice, and the total loss function $L$ of the final multi-task speech recognition problem is expressed as the sum of all the single-task loss functions.
Referring to Fig. 3, which shows the computation topology between any two layers of the capsule neural network: $u_i$ denotes the feature representation learned by a low-level capsule, and a prediction of one high-level representation is made from the input through the prediction weights $W_{ij}$; the prediction results $\hat{u}_{j|i}$ and the routing weights $c_{ij}$ of the prediction stage are hidden in the diagram, and the high-level capsule representation of the next layer is finally obtained, specifically expressed as:
$$v_j = \mathrm{squash}\!\left(\sum_i c_{ij}\, W_{ij}\, u_i\right)$$
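Putting the pieces together, the following end-to-end sketch mirrors Fig. 1 and Fig. 2: fused features pass through the convolutional front end, a capsule layer with dynamic routing, and one classifier head per task, and the summed cross-entropy loss is minimized by back-propagation. It reuses the ConvFrontEnd, squash/capsule_layer, MultiTaskHeads, and total_loss sketches given earlier; all layer sizes, task names, and optimizer settings are illustrative assumptions.

```python
# End-to-end sketch: fused features -> convolution and pooling -> capsule layer
# (dynamic routing) -> one classifier per task -> summed cross-entropy loss,
# trained by back-propagation.
import torch
import torch.nn as nn

class MultiTaskCapsNet(nn.Module):
    def __init__(self, tasks, in_channels=25, caps_in=64, d_in=8, caps_out=16, d_out=16):
        super().__init__()
        self.frontend = ConvFrontEnd(in_channels, hidden=caps_in * d_in)
        self.caps_in, self.d_in = caps_in, d_in
        # Prediction weights W_ij between low-level and high-level capsules.
        self.W = nn.Parameter(0.01 * torch.randn(caps_in, caps_out, d_out, d_in))
        self.heads = MultiTaskHeads(caps_out * d_out, tasks)

    def forward(self, x):                         # x: (batch, in_channels, n_frames)
        h = self.frontend(x)                      # (batch, caps_in * d_in, T')
        h = h.mean(dim=-1)                        # pool over time
        u = h.view(-1, self.caps_in, self.d_in)   # primary (low-level) capsules
        v = capsule_layer(u, self.W)              # high-level capsules via routing
        return self.heads(v.flatten(1))           # one softmax head per task

model = MultiTaskCapsNet(tasks={'accent': 5, 'speaker': 100})
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(4, 25, 200)                               # dummy fused features
labels = {'accent': torch.randint(0, 5, (4,)),
          'speaker': torch.randint(0, 100, (4,))}
loss = total_loss(model(x), labels)                       # L = sum_t L_t
loss.backward()
optim.step()
```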

Claims (5)

1. A multi-task speech classification method based on a capsule neural network, characterized in that a capsule neural network is used to extract higher-level, more abstract features of speech while multiple classifiers are used to complete the multi-task classification of speech, comprising the following steps:
(1) preprocess the original speech signal and extract low-level speech feature representations using speech feature extraction algorithms;
(2) extract mid-level feature representations of the speech signal using a deep convolutional neural network;
(3) further extract higher-level, more abstract feature representations of the speech using a capsule neural network;
(4) design multiple different classifiers and loss functions to realize end-to-end training of the whole multi-task speech classification model.
2. The multi-task speech classification method based on a capsule neural network according to claim 1, wherein step (1) comprises the following sub-steps:
(11) the raw feature representation of speech is a one-dimensional, high-dimensional feature; in the speech preprocessing model, different traditional feature extraction algorithms are applied to the original audio to extract time-domain and frequency-domain features, and the various features are finally fused into one representation and fed into the deep neural network model;
(12) the time-domain speech feature extraction algorithm uses linear predictive coding coefficients (LPCC), while the frequency-domain algorithm uses Mel-frequency cepstral coefficients (MFCC), a homomorphic signal processing method that analyzes the speech signal via the Fourier transform; by fusing low-level speech features of these different characteristics, the final input of the deep neural network model is formed.
3. The multi-task speech classification method based on a capsule neural network according to claim 1, wherein step (2) comprises the following sub-steps:
(21) in step (2), higher-level features of the input are extracted using the convolution operation of the deep convolutional neural network, which can be expressed by the following formula:
$$h = f(W \ast x)$$
where $x$ denotes the input of the convolutional layer, $W$ denotes the learnable weights of the convolution kernel, $\ast$ is the convolution operation, and $f$ acts as a nonlinear mapping function;
(22) in step (2), higher-level features of the input are also extracted using the pooling operation of the deep convolutional neural network, which can be expressed by the following formula:
$$h = g(x)$$
where $x$ denotes the input of the pooling layer; since the pooling layer has no learnable parameters, there are no weights here, and the common pooling functions $g$ take the maximum, the minimum, or the average value.
4. The multi-task speech classification method based on a capsule neural network according to claim 1, wherein step (3) comprises the following sub-steps:
(31) the capsule neural network differs from conventional deep neural networks in that its minimal computational unit is a group of neurons, and the capsule network contains weights with two different roles, used respectively for making predictions and for weighting those predictions;
(32) first, in the prediction stage of the capsule network, the computation is similar to a traditional feed-forward layer: the prediction result is obtained by matrix multiplication between the input capsule and the prediction weights, calculated by the following formula:
$$\hat{u}_{j|i} = W_{ij}\, u_i$$
where $u_i$ is a low-level capsule and $\hat{u}_{j|i}$ denotes the prediction result; note that both $u_i$ and $\hat{u}_{j|i}$ are representations of a group of neurons;
(33) unlike a traditional convolutional neural network, when learning the high-level feature representations predicted by the lower layer, the capsule neural network additionally learns the weight of each low-level part for the same prediction, calculated by the following formula:
$$s_j = \sum_i c_{ij}\, \hat{u}_{j|i}$$
where $\hat{u}_{j|i}$ denotes the prediction of low-level capsule $i$ for high-level capsule $j$, $c_{ij}$ denotes the prediction weight, and the final high-level capsule obtains its net input as the weighted sum over all predictions; it is worth noting that, unlike parameter updates in traditional neural networks, which use gradient descent, $c_{ij}$ here is updated by the dynamic routing algorithm;
(34) finally, the summed prediction must pass through a nonlinear mapping; since the minimal computational unit in a capsule neural network is a group of neurons, the activation function is modified accordingly, mainly expressed as follows:
$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2} \cdot \frac{s_j}{\|s_j\|}$$
where the activated prediction $v_j$ carries meaning in two respects: its direction describes the attributes of the class, and its magnitude represents the probability that the class exists.
5. The multi-task speech classification method based on a capsule neural network according to claim 1, wherein step (4) comprises the following sub-steps:
(41) determine the speech multi-task classification content and digitize the corresponding multi-task labels;
(42) define a corresponding number of classifiers according to the classification content of the different task types;
(43) design a corresponding loss function for each classifier; the specific function is designed as follows:
$$L_t = -\frac{1}{N}\sum_{n=1}^{N} y^{(n)} \log \hat{y}^{(n)}$$
where $y^{(n)}$ is the true sample label of the corresponding speech class, $\hat{y}^{(n)}$ is the probability value after the classifier's softmax, and $N$ denotes the total number of samples; by accumulating the $N$ samples in the loss function of a given task, the average loss over all samples of that task is obtained;
the above only designs the loss function for a single classification result within the multi-task setting; for the multi-task speech classification problem, the final loss function is defined as follows:
$$L = \sum_{t=1}^{T} L_t$$
where $L_t$ denotes the above single-task loss function over the whole sample set, $T$ denotes the number of tasks in practice, and the total loss function $L$ of the final multi-task speech recognition problem is expressed as the sum of all the single-task loss functions;
(44) with the network structure designed above and the steps of constructing the data set, designing the loss functions, and so on, the entire end-to-end capsule neural network is finally trained using the back-propagation algorithm.
CN201811346110.8A 2018-11-13 2018-11-13 Multi-task speech classification method based on a capsule neural network Pending CN109523994A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811346110.8A CN109523994A (en) 2018-11-13 2018-11-13 Multi-task speech classification method based on a capsule neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811346110.8A CN109523994A (en) 2018-11-13 2018-11-13 Multi-task speech classification method based on a capsule neural network

Publications (1)

Publication Number Publication Date
CN109523994A true CN109523994A (en) 2019-03-26

Family

ID=65776175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811346110.8A Pending CN109523994A (en) 2018-11-13 2018-11-13 Multi-task speech classification method based on a capsule neural network

Country Status (1)

Country Link
CN (1) CN109523994A (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06295196A (en) * 1993-04-08 1994-10-21 Casio Comput Co Ltd Speech recognition device and signal recognition device
WO2005059811A1 (en) * 2003-12-16 2005-06-30 Canon Kabushiki Kaisha Pattern identification method, apparatus, and program
US20160284346A1 (en) * 2015-03-27 2016-09-29 Qualcomm Incorporated Deep neural net based filter prediction for audio event classification and extraction
US20170148431A1 (en) * 2015-11-25 2017-05-25 Baidu Usa Llc End-to-end speech recognition
US20180068675A1 (en) * 2016-09-07 2018-03-08 Google Inc. Enhanced multi-channel acoustic models
CN106601235A (en) * 2016-12-02 2017-04-26 厦门理工学院 Semi-supervision multitask characteristic selecting speech recognition method
CN107578775A (en) * 2017-09-07 2018-01-12 四川大学 A kind of multitask method of speech classification based on deep neural network
CN107610692A (en) * 2017-09-22 2018-01-19 杭州电子科技大学 The sound identification method of self-encoding encoder multiple features fusion is stacked based on neutral net
CN108010514A (en) * 2017-11-20 2018-05-08 四川大学 A kind of method of speech classification based on deep neural network
GB201807225D0 (en) * 2018-03-14 2018-06-13 Papercup Tech Limited A speech processing system and a method of processing a speech signal
CN108766461A (en) * 2018-07-17 2018-11-06 厦门美图之家科技有限公司 Audio feature extraction methods and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
LE, D. et al.: "Discretized continuous speech emotion recognition with multi-task deep recurrent neural network", 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017) *
NAM KYUN KIM et al.: "Speech emotion recognition based on multi-task learning using a convolutional neural network", 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) *
余成波 et al.: "Research on finger vein recognition based on capsule networks" (基于胶囊网络的指静脉识别研究), Application of Electronic Technique (电子技术应用) *
朱应钊 et al.: "Research on capsule network technology and its development trends" (胶囊网络技术及发展趋势研究), Guangdong Communication Technology (广东通信技术) *
胡文凭: "Spoken pronunciation detection and error analysis based on deep neural networks" (基于深层神经网络的口语发音检测与错误分析), China Doctoral Dissertations Full-text Database, Information Science and Technology *
郭俊文: "Research on wearable ECG acquisition and arrhythmia detection *** based on CapsNet" (基于CAPSNET的可穿戴心电采集和心律失常检测***研究), China Masters' Theses Full-text Database, Medicine and Health Sciences *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428843A (en) * 2019-03-11 2019-11-08 杭州雄迈信息技术有限公司 A kind of voice gender identification deep learning method
CN110428843B (en) * 2019-03-11 2021-09-07 杭州巨峰科技有限公司 Voice gender recognition deep learning method
CN110120224A (en) * 2019-05-10 2019-08-13 平安科技(深圳)有限公司 Construction method, device, computer equipment and the storage medium of bird sound identification model
CN110120224B (en) * 2019-05-10 2023-01-20 平安科技(深圳)有限公司 Method and device for constructing bird sound recognition model, computer equipment and storage medium
CN110968729A (en) * 2019-11-21 2020-04-07 浙江树人学院(浙江树人大学) Family activity sound event classification method based on additive interval capsule network
CN110968729B (en) * 2019-11-21 2022-05-17 浙江树人学院(浙江树人大学) Family activity sound event classification method based on additive interval capsule network
CN110931046A (en) * 2019-11-29 2020-03-27 福州大学 Audio high-level semantic feature extraction method and system for overlapped sound event detection
WO2021127982A1 (en) * 2019-12-24 2021-07-01 深圳市优必选科技股份有限公司 Speech emotion recognition method, smart device, and computer-readable storage medium
CN111357051A (en) * 2019-12-24 2020-06-30 深圳市优必选科技股份有限公司 Speech emotion recognition method, intelligent device and computer readable storage medium
CN111357051B (en) * 2019-12-24 2024-02-02 深圳市优必选科技股份有限公司 Speech emotion recognition method, intelligent device and computer readable storage medium
CN111179961A (en) * 2020-01-02 2020-05-19 腾讯科技(深圳)有限公司 Audio signal processing method, audio signal processing device, electronic equipment and storage medium
CN111584010A (en) * 2020-04-01 2020-08-25 昆明理工大学 Key protein identification method based on capsule neural network and ensemble learning
CN111584010B (en) * 2020-04-01 2022-05-27 昆明理工大学 Key protein identification method based on capsule neural network and ensemble learning
US11735168B2 (en) 2020-07-20 2023-08-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing voice
CN111862949A (en) * 2020-07-30 2020-10-30 北京小米松果电子有限公司 Natural language processing method and device, electronic equipment and storage medium
CN111862949B (en) * 2020-07-30 2024-04-02 北京小米松果电子有限公司 Natural language processing method and device, electronic equipment and storage medium
CN112599134A (en) * 2020-12-02 2021-04-02 国网安徽省电力有限公司 Transformer sound event detection method based on voiceprint recognition
CN112562725A (en) * 2020-12-09 2021-03-26 山西财经大学 Mixed voice emotion classification method based on spectrogram and capsule network
CN112992191B (en) * 2021-05-12 2021-11-05 北京世纪好未来教育科技有限公司 Voice endpoint detection method and device, electronic equipment and readable storage medium
CN112992191A (en) * 2021-05-12 2021-06-18 北京世纪好未来教育科技有限公司 Voice endpoint detection method and device, electronic equipment and readable storage medium
CN113362857A (en) * 2021-06-15 2021-09-07 厦门大学 Real-time speech emotion recognition method based on CapcNN and application device
CN113343924A (en) * 2021-07-01 2021-09-03 齐鲁工业大学 Modulation signal identification method based on multi-scale cyclic spectrum feature and self-attention generation countermeasure network
CN113378984A (en) * 2021-07-05 2021-09-10 国药(武汉)医学实验室有限公司 Medical image classification method, system, terminal and storage medium
CN113378984B (en) * 2021-07-05 2023-05-02 国药(武汉)医学实验室有限公司 Medical image classification method, system, terminal and storage medium
CN113314119B (en) * 2021-07-27 2021-12-03 深圳百昱达科技有限公司 Voice recognition intelligent household control method and device
CN113314119A (en) * 2021-07-27 2021-08-27 深圳百昱达科技有限公司 Voice recognition intelligent household control method and device
CN113782000A (en) * 2021-09-29 2021-12-10 北京中科智加科技有限公司 Language identification method based on multiple tasks
WO2023222088A1 (en) * 2022-05-20 2023-11-23 青岛海尔电冰箱有限公司 Voice recognition and classification method and apparatus
CN115376518A (en) * 2022-10-26 2022-11-22 广州声博士声学技术有限公司 Voiceprint recognition method, system, device and medium for real-time noise big data
CN117275461A (en) * 2023-11-23 2023-12-22 上海蜜度科技股份有限公司 Multitasking audio processing method, system, storage medium and electronic equipment
CN117275461B (en) * 2023-11-23 2024-03-15 上海蜜度科技股份有限公司 Multitasking audio processing method, system, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109523994A (en) Multi-task speech classification method based on a capsule neural network
Kim et al. Towards speech emotion recognition "in the wild" using aggregated corpora and deep multi-task learning
CN106228977B (en) Multi-mode fusion song emotion recognition method based on deep learning
CN110534132A Speech emotion recognition method based on a parallel convolutional recurrent neural network with spectrogram features
CN108597539A Speech emotion recognition method based on parameter transfer and spectrograms
CN110289003A Voiceprint recognition method, model training method, and server
CN106503805A Bimodal person-to-person dialogue sentiment analysis system and method based on machine learning
CN109241524A (en) Semantic analysis method and device, computer readable storage medium, electronic equipment
CN109285562A (en) Speech-emotion recognition method based on attention mechanism
CN108763326A Sentiment analysis model construction method based on feature-diversified convolutional neural networks
CN104538027B Emotion propagation quantification method and system for voice social media
Shahriar et al. Classifying maqams of Qur’anic recitations using deep learning
Chen et al. Distilled binary neural network for monaural speech separation
CN109767789A A new feature extraction method for speech emotion recognition
CN111899766B (en) Speech emotion recognition method based on optimization fusion of depth features and acoustic features
Cardona et al. Online phoneme recognition using multi-layer perceptron networks combined with recurrent non-linear autoregressive neural networks with exogenous inputs
Cao et al. Speaker-independent speech emotion recognition based on random forest feature selection algorithm
CN113077823A Subdomain-adaptive cross-corpus speech emotion recognition method based on a deep autoencoder
Bergler et al. Deep representation learning for orca call type classification
CN115393933A (en) Video face emotion recognition method based on frame attention mechanism
CN112487237A (en) Music classification method based on self-adaptive CNN and semi-supervised self-training model
Chen et al. Construction of affective education in mobile learning: The study based on learner’s interest and emotion recognition
CN110532380A Text sentiment classification method based on memory networks
Singh et al. Speaker Recognition Assessment in a Continuous System for Speaker Identification
Tashakori et al. Designing the Intelligent System Detecting a Sense of Wonder in English Speech Signal Using Fuzzy-Nervous Inference-Adaptive system (ANFIS)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190326