CN110148408A - Chinese speech recognition method based on a deep residual network - Google Patents
Chinese speech recognition method based on a deep residual network
- Publication number
- CN110148408A (application CN201910458947.XA)
- Authority
- CN
- China
- Prior art keywords
- layer
- residual
- characteristic parameter
- speech recognition
- deep residual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a Chinese speech recognition method based on a deep residual network. The method comprises the following steps: 1) obtain raw data containing speech information; 2) extract MFCC characteristic parameters from the raw data, together with their first-order and second-order differences; 3) splice each frame's MFCC parameters with its first-order and second-order differences to obtain the final characteristic parameters, and convert the resulting two-dimensional array into a three-dimensional array; 4) feed all the three-dimensional characteristic parameters from step 3) into the convolutional neural network and train it repeatedly until a satisfactory recognition rate is reached; 5) test the trained convolutional neural network model and output the recognized text. Compared with the prior art, the present invention accelerates model training and improves the speech recognition rate.
Description
Technical field
The present invention relates to the field of speech signal processing and recognition, and in particular to a Chinese speech recognition method based on a deep residual network.
Background art
Speech, as the most convenient and natural form of communication, carries both information and emotional expression. With the progress of speech recognition technology, more and more people wish to communicate with machines by voice, so speech recognition has attracted increasing attention. The structure most widely applied to speech recognition at present is the long short-term memory (LSTM) network, which can model the long-term temporal correlations of speech and thereby improve recognition accuracy. A bidirectional LSTM network can achieve even better performance, but it suffers from high training complexity and high decoding latency.
Summary of the invention
The object of the present invention is to overcome the above drawbacks of the prior art and to provide a Chinese speech recognition method based on a deep residual network.
The purpose of the present invention can be achieved through the following technical solutions:
A Chinese speech recognition method based on a deep residual network, comprising the following steps:
Step (1): obtain raw data containing speech information.
Step (2): extract MFCC characteristic parameters from the raw data, and compute their first-order and second-order differences.
Extracting the MFCC characteristic parameters specifically includes the following steps:
21) pre-process the speech with pre-emphasis, framing and windowing;
22) for each short-time analysis window, obtain the corresponding spectrum by FFT;
23) pass the spectrum obtained in step 22) through a Mel filter bank to obtain the Mel spectrum, which converts the linear natural spectrum into a Mel spectrum reflecting human auditory characteristics;
24) perform cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), which are used as the speech features.
The first-order difference of the MFCC characteristic parameters is the difference between two consecutive frames of the discrete function:
Y(k) = X(k+1) - X(k)
where k is the frame index, X(k) is the MFCC characteristic parameter of the k-th frame, and X(k+1) is that of the (k+1)-th frame.
The second-order difference expresses the relationship between the first-order differences of the (k+1)-th and k-th frames:
Z(k) = Y(k+1) - Y(k) = X(k+2) - 2X(k+1) + X(k)
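The two difference formulas above can be sketched directly in NumPy. This is an illustrative sketch, not code from the patent; zero-padding of the last frame(s) is an assumption, since the text does not specify edge handling:

```python
import numpy as np

def deltas(mfcc):
    """First- and second-order differences of an MFCC sequence.

    mfcc: array of shape (frames, coeffs). Uses the forward
    differences from the text: Y(k) = X(k+1) - X(k) and
    Z(k) = X(k+2) - 2*X(k+1) + X(k). Edge frames are zero-padded
    so all three arrays keep the same length (an assumption).
    """
    first = np.zeros_like(mfcc)
    first[:-1] = mfcc[1:] - mfcc[:-1]
    second = np.zeros_like(mfcc)
    second[:-2] = mfcc[2:] - 2 * mfcc[1:-1] + mfcc[:-2]
    return first, second

X = np.arange(12, dtype=float).reshape(4, 3)  # 4 frames, 3 coefficients
Y, Z = deltas(X)
```

Because the example frames grow linearly, the first-order difference is constant and the second-order difference is zero, which matches the formulas term by term.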
Step (3): splice each frame's MFCC parameters with its first-order and second-order differences to obtain the final characteristic parameters, and add a channel dimension to the resulting two-dimensional array, yielding a three-dimensional array of final characteristic parameters.
Each residual block comprises two convolutional layers and one dropout ("random deactivation") layer; the output of the dropout layer is added directly to the block input after one convolutional layer, giving the final target mapping. The deep residual network consists of several convolutional layers, four residual blocks, two pooling layers, two fully connected layers and a softmax layer. The first fully connected layer has 512 neural units and the second has 1422. All convolution kernels are 3x3. The first and second convolutional layers and the first residual block each have 32 kernels; the stride of the first pooling layer is 2x2; the third convolutional layer and the second residual block have 64 kernels; the fourth convolutional layer and the third residual block have 128 kernels; the fifth convolutional layer and the fourth residual block have 256 kernels; the stride of the second pooling layer is 1x2; and the last convolutional layer has 512 kernels.
Preferably, the kernel size in the residual block is 3x3 and the dropout rate is set to 0.2, so that the dropout layer responds to its input selectively.
Step (4): feed all the three-dimensional characteristic parameters from step (3) into the deep residual network and train it repeatedly until a satisfactory recognition rate is obtained; the recognition rate is measured by the phoneme error rate of speech recognition.
Preferably, if the trained model reaches a phoneme error rate of 15.42%, the training result is judged to have reached the required recognition rate.
Step (5): test the trained deep residual network model and output the recognized text.
To test the trained model, features are extracted from the test speech by the same method used during training, the extracted characteristic parameters are fed into the trained model, and the model output is the recognized text.
Compared with the prior art, the present invention has the following advantages:
1) The method applies a residual block structure within a convolutional neural network. A convolutional neural network generally comprises convolutional layers, pooling layers and fully connected layers. The input of a convolutional layer is the characteristic parameters; the convolution kernel slides with a set stride and learns different local features in the feature map, and the more convolutional layers there are, the richer the extracted features. The pooling layers mainly compress the characteristic parameters by computing the average or maximum of each region, reducing the feature dimensionality and the number of network nodes in the model. The fully connected layers act as a classifier: they map the learned characteristic parameters to the sample label space, perform classification and matching, and predict the class of the input signal. Because a convolutional neural network shares weights, it greatly reduces the number of model parameters and accelerates training, thereby solving the problem of high decoding latency.
2) The invention applies the residual structure within the convolutional neural network. When a convolutional neural network directly learns the target mapping from input data to output labels, training accuracy may stop rising and even decline as the network is deepened. This phenomenon is not caused by overfitting: simply deepening the network makes the network itself hard to train. A residual network instead learns the residual between the target mapping and the original input, and adds this residual to the original input to obtain the final target mapping. This learning mechanism effectively resolves the degradation of network performance and, while deepening the network, alleviates overfitting and improves the speech recognition rate.
Brief description of the drawings
Fig. 1 is a schematic diagram of the residual block structure of the present invention;
Fig. 2 is a flow diagram of the method of the present invention;
Fig. 3 is an overall flow diagram of MFCC feature extraction;
Fig. 4 is a schematic diagram of the overall structure of the deep residual network.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art, based on the embodiments of the present invention and without creative work, shall fall within the scope of protection of the present invention.
The present invention relates to a Chinese speech recognition method based on a deep residual network, comprising the following steps:
Step 1: obtain raw data containing speech information.
Step 2: extract MFCC characteristic parameters from the raw data.
MFCC (Mel-frequency cepstral coefficient) characteristic parameters of shape (500, 13) are extracted from the speech through a bank of Mel filters. The MFCC extraction process mainly comprises:
1) First pre-process the speech with pre-emphasis, framing and windowing, to enhance properties of the speech signal such as signal-to-noise ratio and processing accuracy.
2) For each short-time analysis window, obtain the corresponding spectrum by FFT, yielding the spectra of the different time windows distributed along the time axis.
3) Pass the spectrum through a Mel filter bank to obtain the Mel spectrum, which converts the linear natural spectrum into a Mel spectrum reflecting human auditory characteristics.
4) Perform cepstral analysis on the Mel spectrum (take the logarithm, then an inverse transform; in practice the inverse transform is realized by the DCT, the discrete cosine transform, and the 2nd through 13th DCT coefficients are taken as the MFCC coefficients) to obtain the Mel-frequency cepstral coefficients. These MFCCs are the features of this frame of speech.
At this point the speech can be described by a series of cepstral vectors, each of which is the MFCC feature vector of one frame. Once the MFCC feature vectors are obtained, a speech classifier can be trained and used for recognition.
However, MFCCs are static features of the speech. To capture its dynamic features, the first-order and second-order differences are also computed. The first-order difference is the difference between two consecutive frames of the discrete function:
Y(k) = X(k+1) - X(k)
where k is the frame index, X(k) is the MFCC characteristic parameter of the k-th frame, and X(k+1) is that of the (k+1)-th frame.
The second-order difference expresses the relationship between the first-order differences of the (k+1)-th and k-th frames:
Z(k) = Y(k+1) - Y(k) = X(k+2) - 2X(k+1) + X(k)
Step 3: splice each frame's MFCC parameters with its first-order and second-order differences; the final characteristic parameters have shape (500, 39). A channel dimension is then added to this two-dimensional array, converting it into a three-dimensional array of shape (500, 39, 1).
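The splicing and reshaping of step 3 amounts to a concatenation along the coefficient axis followed by an added channel axis; a minimal NumPy sketch (random data stands in for real MFCCs):

```python
import numpy as np

# Splice MFCC (500, 13) with its first- and second-order
# differences into a (500, 39) feature matrix, then add a
# channel axis to get the (500, 39, 1) array fed to the network.
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 13))

delta1 = np.zeros_like(mfcc)
delta1[:-1] = mfcc[1:] - mfcc[:-1]        # Y(k) = X(k+1) - X(k)
delta2 = np.zeros_like(mfcc)
delta2[:-1] = delta1[1:] - delta1[:-1]    # Z(k) = Y(k+1) - Y(k)

features = np.concatenate([mfcc, delta1, delta2], axis=1)  # (500, 39)
features3d = features[:, :, np.newaxis]                    # (500, 39, 1)
```

The extra axis carries no new information; it only gives the data the single-channel image layout that 2-D convolutional layers expect.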
Step 4: feed all the computed characteristic parameters into the deep residual network and train it repeatedly, reducing the loss of the neural network by backpropagation, until a good recognition rate is obtained.
The residual block consists of two convolutional layers and one dropout layer. The output of the dropout layer is added directly to the block input after one convolutional layer, giving the final target mapping. The kernel size in the residual block is 3x3 and the dropout rate is set to 0.2; the dropout layer responds to its input selectively, which can improve learning accuracy.
The deep residual network of the present invention consists of several convolutional layers, four residual blocks, two pooling layers, two fully connected layers and a softmax layer. The first fully connected layer has 512 neural units and the second has 1422. All convolution kernels are 3x3; the first and second convolutional layers and the first residual block each have 32 kernels; the stride of the first pooling layer is 2x2; the third convolutional layer and the second residual block have 64 kernels; the fourth convolutional layer and the third residual block have 128 kernels; the fifth convolutional layer and the fourth residual block have 256 kernels; the stride of the second pooling layer is 1x2; and the last convolutional layer has 512 kernels.
The input sequence of characteristic parameters (x1, x2, ..., xT) passes through the series of convolutional layers, pooling layers, fully connected layers and the softmax layer and is converted into an output sequence (y1, y2, ..., yT). CTC (Connectionist Temporal Classification) then computes from (y1, y2, ..., yT) the posterior probability p(l1, l2, ..., lm | x1, x2, ..., xT) of the actual label sequence. Training the neural network consists of adjusting the network parameters, given the inputs and the true phoneme sequences, so as to maximize p(l1, l2, ..., lm | x1, x2, ..., xT) over the training set; CTC decoding then finds, for a given input, the label sequence with maximum posterior probability, l* = argmax_l p(l | x1, x2, ..., xT). Here l1, l2, ..., lm is the label sequence, T is the number of frames, and m is the number of labels.
The recognition rate is measured by the phoneme error rate of speech recognition. After many tests, when the loss of the deep residual network almost stops decreasing, i.e. the model reaches a phoneme error rate of 15.42%, the training result is judged to have reached the required recognition rate.
This embodiment was evaluated experimentally on the THCHS30 Chinese dataset. Compared with the BLSTM (bidirectional long short-term memory) framework traditionally used in speech recognition, training with the method of the present invention converges three times faster than the BLSTM network, and the speech recognition rate improves by 3%.
Step 5: test the trained model. Features are extracted from the test speech by the same method used during training, the extracted characteristic parameters are fed into the trained model, and the model output is the recognized text.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person familiar with the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed by the present invention, and such modifications or substitutions shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (8)
1. A Chinese speech recognition method based on a deep residual network, characterized in that the method comprises the following steps:
1) obtaining raw data containing speech information;
2) extracting MFCC characteristic parameters from the raw data, and computing their first-order and second-order differences;
3) splicing each frame's MFCC parameters with its first-order and second-order differences to obtain the final characteristic parameters, adding a channel dimension to the two-dimensional array of these parameters, and obtaining a three-dimensional array of final characteristic parameters;
4) feeding all the three-dimensional characteristic parameters from step 3) into the deep residual network and training it repeatedly until a satisfactory recognition rate is obtained;
5) testing the trained deep residual network model and outputting the recognized text.
2. The Chinese speech recognition method based on a deep residual network according to claim 1, characterized in that in step 2), extracting the MFCC characteristic parameters specifically comprises the following steps:
21) pre-processing the speech with pre-emphasis, framing and windowing;
22) for each short-time analysis window, obtaining the corresponding spectrum by FFT;
23) passing the spectrum obtained in step 22) through a Mel filter bank to obtain the Mel spectrum, which converts the linear natural spectrum into a Mel spectrum reflecting human auditory characteristics;
24) performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), which are used as the speech features.
3. The Chinese speech recognition method based on a deep residual network according to claim 2, characterized in that in step 2), the first-order difference of the MFCC characteristic parameters is the difference between two consecutive frames of the discrete function:
Y(k) = X(k+1) - X(k)
where k is the frame index, X(k) is the MFCC characteristic parameter of the k-th frame, and X(k+1) is that of the (k+1)-th frame.
4. The Chinese speech recognition method based on a deep residual network according to claim 3, characterized in that in step 2), the second-order difference expresses the relationship between the first-order differences of the (k+1)-th and k-th frames:
Z(k) = Y(k+1) - Y(k) = X(k+2) - 2X(k+1) + X(k).
5. The Chinese speech recognition method based on a deep residual network according to claim 1, characterized in that in step 3), the deep residual network consists of several convolutional layers, four residual blocks, two pooling layers, two fully connected layers and a softmax layer; the first fully connected layer has 512 neural units and the second has 1422; all convolution kernels are 3x3; the first and second convolutional layers and the first residual block each have 32 kernels; the stride of the first pooling layer is 2x2; the third convolutional layer and the second residual block have 64 kernels; the fourth convolutional layer and the third residual block have 128 kernels; the fifth convolutional layer and the fourth residual block have 256 kernels; the stride of the second pooling layer is 1x2; and the last convolutional layer has 512 kernels.
6. The Chinese speech recognition method based on a deep residual network according to claim 5, characterized in that in step 3), each residual block comprises two convolutional layers and one dropout layer, and the output of the dropout layer is added directly to the block input after one convolutional layer, giving the final target mapping.
7. The Chinese speech recognition method based on a deep residual network according to claim 6, characterized in that the kernel size in the residual block is 3x3, the dropout rate is set to 0.2, and the dropout layer responds to its input selectively.
8. The Chinese speech recognition method based on a deep residual network according to claim 1, characterized in that the recognition rate is measured by the phoneme error rate of speech recognition, and if the trained model reaches a phoneme error rate of 15.42%, the training result is judged to have reached the satisfactory recognition rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910458947.XA CN110148408A (en) | 2019-05-29 | 2019-05-29 | A kind of Chinese speech recognition method based on depth residual error |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910458947.XA CN110148408A (en) | 2019-05-29 | 2019-05-29 | A kind of Chinese speech recognition method based on depth residual error |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110148408A true CN110148408A (en) | 2019-08-20 |
Family
ID=67592187
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910458947.XA Pending CN110148408A (en) | 2019-05-29 | 2019-05-29 | A kind of Chinese speech recognition method based on depth residual error |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110148408A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909601A (en) * | 2019-10-18 | 2020-03-24 | 武汉虹识技术有限公司 | Beautiful pupil identification method and system based on deep learning |
CN111276125A (en) * | 2020-02-11 | 2020-06-12 | 华南师范大学 | Lightweight speech keyword recognition method facing edge calculation |
CN111401530A (en) * | 2020-04-22 | 2020-07-10 | 上海依图网络科技有限公司 | Recurrent neural network and training method thereof |
CN111402901A (en) * | 2020-03-27 | 2020-07-10 | 广东外语外贸大学 | CNN voiceprint recognition method and system based on RGB mapping characteristics of color image |
CN111798875A (en) * | 2020-07-21 | 2020-10-20 | 杭州芯声智能科技有限公司 | VAD implementation method based on three-value quantization compression |
CN111833886A (en) * | 2020-07-27 | 2020-10-27 | 中国科学院声学研究所 | Fully-connected multi-scale residual error network and voiceprint recognition method thereof |
CN112614483A (en) * | 2019-09-18 | 2021-04-06 | 珠海格力电器股份有限公司 | Modeling method based on residual convolutional network, voice recognition method and electronic equipment |
CN112951277A (en) * | 2019-11-26 | 2021-06-11 | 新东方教育科技集团有限公司 | Method and device for evaluating speech |
CN113361647A (en) * | 2021-07-06 | 2021-09-07 | 青岛洞听智能科技有限公司 | Method for identifying type of missed call |
WO2022237053A1 (en) * | 2021-05-11 | 2022-11-17 | Huawei Technologies Co.,Ltd. | Methods and systems for computing output of neural network layer |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108830287A (en) * | 2018-04-18 | 2018-11-16 | 哈尔滨理工大学 | The Chinese image, semantic of Inception network integration multilayer GRU based on residual error connection describes method |
CN108847223A (en) * | 2018-06-20 | 2018-11-20 | 陕西科技大学 | A kind of audio recognition method based on depth residual error neural network |
CN109272990A (en) * | 2018-09-25 | 2019-01-25 | 江南大学 | Audio recognition method based on convolutional neural networks |
CN109272988A (en) * | 2018-09-30 | 2019-01-25 | 江南大学 | Audio recognition method based on multichannel convolutional neural networks |
CN109460774A (en) * | 2018-09-18 | 2019-03-12 | 华中科技大学 | A kind of birds recognition methods based on improved convolutional neural networks |
US20190130896A1 (en) * | 2017-10-26 | 2019-05-02 | Salesforce.Com, Inc. | Regularization Techniques for End-To-End Speech Recognition |
CN109767759A (en) * | 2019-02-14 | 2019-05-17 | 重庆邮电大学 | End-to-end speech recognition methods based on modified CLDNN structure |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190130896A1 (en) * | 2017-10-26 | 2019-05-02 | Salesforce.Com, Inc. | Regularization Techniques for End-To-End Speech Recognition |
CN108830287A (en) * | 2018-04-18 | 2018-11-16 | 哈尔滨理工大学 | Chinese image semantic description method based on residual-connected Inception network fused with multilayer GRU |
CN108847223A (en) * | 2018-06-20 | 2018-11-20 | 陕西科技大学 | Speech recognition method based on depth residual error neural network |
CN109460774A (en) * | 2018-09-18 | 2019-03-12 | 华中科技大学 | Bird recognition method based on improved convolutional neural network |
CN109272990A (en) * | 2018-09-25 | 2019-01-25 | 江南大学 | Speech recognition method based on convolutional neural networks |
CN109272988A (en) * | 2018-09-30 | 2019-01-25 | 江南大学 | Speech recognition method based on multichannel convolutional neural networks |
CN109767759A (en) * | 2019-02-14 | 2019-05-17 | 重庆邮电大学 | End-to-end speech recognition method based on modified CLDNN structure |
Non-Patent Citations (1)
Title |
---|
JIAN GUO: "Depth dropout: efficient training of residual convolutional neural networks", International Conference on Digital Image Computing: Techniques and Applications * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112614483A (en) * | 2019-09-18 | 2021-04-06 | 珠海格力电器股份有限公司 | Modeling method based on residual convolutional network, voice recognition method and electronic equipment |
CN110909601A (en) * | 2019-10-18 | 2020-03-24 | 武汉虹识技术有限公司 | Cosmetic contact lens recognition method and system based on deep learning |
CN110909601B (en) * | 2019-10-18 | 2022-12-09 | 武汉虹识技术有限公司 | Cosmetic contact lens recognition method and system based on deep learning |
CN112951277A (en) * | 2019-11-26 | 2021-06-11 | 新东方教育科技集团有限公司 | Method and device for evaluating speech |
CN112951277B (en) * | 2019-11-26 | 2023-01-13 | 新东方教育科技集团有限公司 | Method and device for evaluating speech |
CN111276125A (en) * | 2020-02-11 | 2020-06-12 | 华南师范大学 | Lightweight speech keyword recognition method for edge computing |
CN111276125B (en) * | 2020-02-11 | 2023-04-07 | 华南师范大学 | Lightweight speech keyword recognition method for edge computing |
CN111402901A (en) * | 2020-03-27 | 2020-07-10 | 广东外语外贸大学 | CNN voiceprint recognition method and system based on RGB mapping characteristics of color image |
CN111402901B (en) * | 2020-03-27 | 2023-04-18 | 广东外语外贸大学 | CNN voiceprint recognition method and system based on RGB mapping characteristics of color image |
CN111401530A (en) * | 2020-04-22 | 2020-07-10 | 上海依图网络科技有限公司 | Recurrent neural network and training method thereof |
CN111798875A (en) * | 2020-07-21 | 2020-10-20 | 杭州芯声智能科技有限公司 | VAD implementation method based on ternary quantization compression |
CN111833886A (en) * | 2020-07-27 | 2020-10-27 | 中国科学院声学研究所 | Fully-connected multi-scale residual error network and voiceprint recognition method thereof |
WO2022237053A1 (en) * | 2021-05-11 | 2022-11-17 | Huawei Technologies Co.,Ltd. | Methods and systems for computing output of neural network layer |
CN113361647A (en) * | 2021-07-06 | 2021-09-07 | 青岛洞听智能科技有限公司 | Method for identifying type of missed call |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110148408A (en) | A kind of Chinese speech recognition method based on depth residual error | |
CN110827801B (en) | Automatic voice recognition method and system based on artificial intelligence | |
CN107818164A (en) | Intelligent question-answering method and system |
CN110459225B (en) | Speaker recognition system based on CNN fusion characteristics | |
CN108922513A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN107221320A (en) | Method, apparatus, device and computer-readable storage medium for training acoustic feature extraction model |
CN106952643A (en) | Recording device clustering method based on Gaussian mean supervector and spectral clustering |
CN108766419A (en) | Abnormal speech detection method based on deep learning |
CN109119072A (en) | DNN-HMM-based acoustic model construction method for civil aviation air-ground communication |
CN111243602A (en) | Voiceprint recognition method based on gender, nationality and emotional information | |
CN102568476B (en) | Voice conversion method based on self-organizing feature map network cluster and radial basis network | |
CN110111797A (en) | Speaker recognition method based on Gaussian supervector and deep neural network |
CN109272988A (en) | Speech recognition method based on multichannel convolutional neural networks |
CN111724770B (en) | Audio keyword recognition method based on deep convolutional generative adversarial network |
CN108986798B (en) | Voice data processing method, apparatus and device |
CN113539232B (en) | Speech synthesis method based on MOOC speech dataset |
CN113111786B (en) | Underwater target recognition method based on graph convolutional network trained with small samples |
CN109192192A (en) | Language identification method, apparatus, translator, medium and device |
CN109671423A (en) | Non-parallel text compressing method under the limited situation of training data | |
CN107293290A (en) | Method and apparatus for building speech acoustic model |
CN110473571A (en) | Emotion identification method and device based on short video speech | |
CN109036470A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN111341294A (en) | Method for converting text into voice with specified style | |
CN106297769B (en) | Discriminative feature extraction method for language identification |
CN115393933A (en) | Video face emotion recognition method based on frame attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190820 |