CN108269568A - A CTC-based acoustic model training method - Google Patents

A CTC-based acoustic model training method Download PDF

Info

Publication number
CN108269568A
CN108269568A CN201710002096.9A CN201710002096A CN108269568A
Authority
CN
China
Prior art keywords
phoneme
ctc
time
blank
symbol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710002096.9A
Other languages
Chinese (zh)
Other versions
CN108269568B (en)
Inventor
张鹏远
王智超
潘接林
颜永红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Original Assignee
Institute of Acoustics CAS
Beijing Kexin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, Beijing Kexin Technology Co Ltd filed Critical Institute of Acoustics CAS
Priority to CN201710002096.9A priority Critical patent/CN108269568B/en
Publication of CN108269568A publication Critical patent/CN108269568A/en
Application granted granted Critical
Publication of CN108269568B publication Critical patent/CN108269568B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • G10L15/144Training of HMMs
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/02Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The present invention provides a CTC-based acoustic model training method. The method comprises: Step 1, training an initial GMM model and using it to force-align the text transcription of the training data in time, obtaining the time region corresponding to each phoneme; Step 2, inserting after each phoneme a "blank" symbol associated with that phoneme, so that each phoneme has its own dedicated "blank" symbol; Step 3, using a finite-state transducer to construct, from the phoneme label sequence augmented with "blank" symbols, the search graph for the CTC forward-backward computation; Step 4, according to the time alignment result, restricting the time range in which each phoneme may appear and pruning the search graph accordingly, cutting away paths in which a phoneme's position exceeds the time restriction, to obtain the final search graph needed when CTC computes the network error; Step 5, performing acoustic model training with a time-delay neural network (TDNN) structure combined with the CTC method, obtaining the final TDNN-CTC acoustic model.

Description

A CTC-based acoustic model training method
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a CTC-based acoustic model training method.
Background technology
In recent years, the introduction of deep neural networks (Deep Neural Network, DNN) for acoustic modeling has achieved great success in speech recognition systems. Owing to the excellent classification ability of DNNs, they can replace the Gaussian mixture model (Gaussian Mixture Model, GMM) used to generate posterior probabilities in the traditional hidden Markov model (Hidden Markov Model, HMM) framework. However, this new HMM/DNN framework is extremely complex to train. Researchers have therefore begun to explore end-to-end learning methods, in which a sequence of acoustic features is input and its text sequence is obtained directly. In this setting, the Connectionist Temporal Classification (CTC) criterion combined with recurrent neural networks (Recurrent Neural Network, RNN) has received growing attention.
CTC differs from the traditional method of training neural networks with the cross-entropy (Cross-entropy, CE) criterion in two main respects. First, an additional output node is added to the network to represent a "blank" symbol. In speech recognition, each output node of the neural network represents an acoustic modeling unit, which, depending on the modeling granularity, may be a monophone or a triphone unit, and the network output at each moment represents the posterior probability of each unit at that moment. The "blank" symbol is added to represent the states in which the network output is uncertain: when the input is noise or an unrecognizable feature, or lies at the boundary between two different phonemes, the network can output the "blank" symbol instead of being forced to output a definite phoneme. Second, CTC optimizes the network output over the whole utterance: its objective is to maximize the output probability of the entire correct text sequence of the sentence, rather than maximizing the per-frame output probability as cross-entropy does. Using the forward-backward algorithm, CTC finds, in the output matrix of the network, all paths that can be mapped to the correct text sequence, computes the sum of their probabilities, and from this computes the network error; the error is then back-propagated and the network parameters are updated by gradient descent.
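The many-to-one mapping described above — every frame-level path that collapses to the correct text sequence counts toward the objective — can be sketched as follows. This is a minimal illustration of standard shared-blank CTC collapsing (merge consecutive repeats, then drop blanks); the symbol names are made up for the example.

```python
def collapse_path(path, blank="<b>"):
    """Collapse a frame-level CTC path to its label sequence:
    first merge consecutive repeated symbols, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev:
            if sym != blank:
                out.append(sym)
            prev = sym
    return out

# Two different frame-level paths map to the same phoneme sequence:
print(collapse_path(["k", "k", "<b>", "ae", "ae", "t"]))   # ['k', 'ae', 't']
print(collapse_path(["<b>", "k", "ae", "<b>", "t", "t"]))  # ['k', 'ae', 't']
```

The CTC objective sums the probabilities of all such equivalent paths, which is what the forward-backward search graph below enumerates.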
The CTC loss function is:

L(S) = \sum_{(x,z)\in S} L(x,z) = -\sum_{(x,z)\in S} \ln p(z|x)
where S denotes the training data set; x denotes the input features; z denotes the set of paths that map to the correct text sequence; L(S) denotes the error between the network output and the labels; \ln p(z|x) denotes the natural logarithm of the likelihood; and L(x, z) denotes the error of an individual training sample.
The likelihood p(z|x) can be computed with the forward-backward algorithm:

p(z|x) = \sum_{u=1}^{|U'|} \alpha(t,u)\,\beta(t,u)
where |U'| is the length of the label sequence after the "blank" symbols have been inserted, \alpha(t,u) is the forward variable, \beta(t,u) is the backward variable, and \alpha(t,u)\beta(t,u) represents the probability of passing through phoneme u of z at time t.
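The recursion behind p(z|x) = \sum_u \alpha(t,u)\beta(t,u) can be sketched as below. This is a naive probability-domain implementation for standard shared-blank CTC (the patent's per-phoneme blanks and time pruning change the search graph, not the recursion itself); \beta is defined without the emission at time t, so the sum over u yields the same p(z|x) at every frame, matching the formula above. All names are illustrative.

```python
import numpy as np

def ctc_forward_backward(probs, labels, blank=0):
    """probs: (T, K) per-frame posteriors; labels: target sequence (no blanks).
    Returns alpha, beta over the extended sequence U' with interleaved blanks."""
    T = probs.shape[0]
    ext = [blank]                       # U' = [b, l1, b, l2, b, ...]
    for l in labels:
        ext += [l, blank]
    U = len(ext)

    alpha = np.zeros((T, U))            # alpha includes the emission at time t
    alpha[0, 0] = probs[0, ext[0]]
    alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for u in range(U):
            a = alpha[t - 1, u]
            if u > 0:
                a += alpha[t - 1, u - 1]
            if u > 1 and ext[u] != blank and ext[u] != ext[u - 2]:
                a += alpha[t - 1, u - 2]          # skip over a blank
            alpha[t, u] = a * probs[t, ext[u]]

    beta = np.zeros((T, U))             # beta excludes the emission at time t
    beta[T - 1, U - 1] = 1.0
    beta[T - 1, U - 2] = 1.0
    for t in range(T - 2, -1, -1):
        for u in range(U):
            b = beta[t + 1, u] * probs[t + 1, ext[u]]
            if u + 1 < U:
                b += beta[t + 1, u + 1] * probs[t + 1, ext[u + 1]]
            if u + 2 < U and ext[u + 2] != blank and ext[u + 2] != ext[u]:
                b += beta[t + 1, u + 2] * probs[t + 1, ext[u + 2]]
            beta[t, u] = b
    return alpha, beta

rng = np.random.default_rng(0)
raw = rng.random((6, 4))
probs = raw / raw.sum(axis=1, keepdims=True)    # valid per-frame posteriors
alpha, beta = ctc_forward_backward(probs, [1, 2, 1])
p_per_t = (alpha * beta).sum(axis=1)
# p(z|x) is identical no matter at which frame t the sum over u is taken
```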
Finally, the network error can be expressed by the following formula:

\frac{\partial L(x,z)}{\partial a_k^t} = y_k^t - \frac{1}{p(z|x)} \sum_{u \in B(z,k)} \alpha(t,u)\,\beta(t,u)

where y_k^t denotes the value of the k-th output node of the network at time t, a_k^t is the corresponding network output value before the activation function, and B(z,k) denotes the set of positions in U' at which node k appears.
During the training of a CTC model, the forward-backward search includes every path that can be mapped to the correct text sequence, among them many severely skewed paths, i.e., paths in which the positions at which phonemes appear are significantly delayed or advanced compared with reality; these paths make model training unstable. In addition, the traditional CTC framework is trained with RNNs. The long-span temporal modeling ability of RNNs greatly improves the performance of CTC models, but certain characteristics of RNNs make parallel training difficult, so training is very slow and training efficiency is low.
Although the training procedure of CTC models is simpler, their recognition accuracy is not competitive with CE models: it is somewhat lower than that of the traditional cross-entropy (Cross-entropy, CE) method, and the degradation is especially severe on small and medium-sized data sets, where the performance of CTC acoustic models is usually inferior to that of CE models. Moreover, CTC model training is extremely unstable and diverges easily.
Summary of the invention
The object of the present invention is to solve the above problems in existing acoustic model training methods. The present invention provides a CTC-based acoustic model training method, which comprises:
Step 1: train an initial GMM model and use it to force-align the text transcription of the training data in time, obtaining the time region corresponding to each phoneme in the phoneme label sequence of the training data;
Step 2: in the phoneme label sequence of the training data, insert after each phoneme a "blank" symbol associated with that phoneme, so that each phoneme has its own dedicated "blank" symbol; i.e., the number of "blank" symbols equals the number of phonemes in the phoneme label sequence of the training data;
Step 3: using a finite-state transducer (Finite-State Transducer, FST), construct from the phoneme label sequence augmented with "blank" symbols the search graph for the CTC forward-backward computation;
Step 4: according to the time alignment result of step 1, restrict the time range in which each phoneme may appear: taking the time at which each phoneme appears in the alignment result as reference, set a "time tolerance" parameter, i.e., the allowed time of occurrence of each phoneme, and prune the search graph built in step 3 according to this restriction, cutting away paths in which a phoneme's position exceeds the time restriction, to obtain the final search graph needed when CTC computes the network error.
Step 5: perform acoustic model training using a time-delay neural network (Time-delay Neural Network, TDNN) structure combined with the CTC method, obtaining the final TDNN-CTC acoustic model; ReLU is used as the activation function in the TDNN. The ReLU activation function is:
g(y) = max(0, y)
where g(y) denotes the value of a neuron node after the activation function, and y denotes the value of the neuron node output by the network before the activation function;
In the search graph of the forward-backward computation, each "blank" is allowed to repeat consecutively an arbitrary number of times, while the modeled phonemes between two adjacent "blank" symbols may not repeat consecutively. The state-transition topology is: if the current state is a phoneme, the next state may jump to its "blank" state or to the next phoneme; if the current state is a "blank" state, the next state may jump to the "blank" itself or to the next phoneme.
" blank " symbol is shared using all phonemes in multiple independent " blank " symbol substitution original CTC models.
The allowed time of occurrence of each phoneme ranges from 50 to 300 milliseconds.
The advantages of the invention are as follows. Replacing the single "blank" symbol shared by all phonemes in the original CTC model with multiple independent "blank" symbols improves the independence and discriminability of the "blank" symbols and also assists the network in judging the output phoneme, improving model accuracy. Adding time-point restrictions to the CTC search paths reduces the number of paths in the forward-backward search, improving training speed, and by removing the interfering paths that deviate strongly from reality, makes model training more stable and yields higher final recognition accuracy. The TDNN-CTC structure has long-span temporal modeling ability and can fully exploit the contextual information of the input; it greatly shortens the training period while achieving the same recognition accuracy as RNN-CTC models, and, like an ordinary DNN, it is easy to parallelize, improving the training speed of CTC models by about a factor of 3. Compared with the original CTC baseline model, the CTC model trained with the method proposed in this patent achieves a relative reduction of about 10% in word error rate.
Description of the drawings
Fig. 1 is the state-transition topology diagram of the CTC-based acoustic model training method of the present invention;
Fig. 2 is a flowchart of the CTC-based acoustic model training method of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 2, the present invention provides a CTC-based acoustic model training method. The method first replaces the single "blank" symbol shared by all phonemes in the original CTC model with multiple independent "blank" symbols; it then time-aligns the phoneme label sequence of the training data with an initial GMM model to obtain the approximate position at which each phoneme appears, and constructs from the phoneme label sequence augmented with "blank" symbols the search graph for the CTC forward-backward computation. A configurable "time tolerance" parameter then controls, within the tolerance, how much earlier or later a phoneme may appear on a search path; the "time tolerance" is the allowed range of each phoneme's time of occurrence and is usually set to 50-300 milliseconds. In the present embodiment, the "time tolerance" is set to 50 milliseconds, restricting the time points at which phonemes may appear on the paths of the CTC forward-backward search. The method specifically comprises:
Step 1: train an initial GMM model and use it to force-align the text transcription of the training data in time, obtaining the time region corresponding to each phoneme in the phoneme label sequence of the training data;
Step 2: in the phoneme label sequence of the training data, insert after each phoneme a "blank" symbol associated with that phoneme, so that each phoneme has its own dedicated "blank" symbol; i.e., the number of "blank" symbols equals the number of phonemes in the phoneme label sequence of the training data. The transitions between phonemes on each path of the path graph are built as shown in Fig. 1; removing all "blank" symbols from any path yields the corresponding phoneme sequence. The path graph therefore represents the set of all equal-length paths that carry "blank" symbols and map to the correct phoneme label sequence, the paths differing in the positions and numbers of consecutive repetitions of phonemes and "blank" symbols.
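The blank insertion of step 2 can be sketched as follows. The "blk_" prefix is a hypothetical naming convention for the phoneme-specific blanks; it is not prescribed by the patent.

```python
def insert_phoneme_blanks(phones):
    """After each phoneme, insert a blank tied to that phoneme's identity,
    so the number of blanks equals the number of phonemes in the sequence."""
    out = []
    for p in phones:
        out.append(p)
        out.append("blk_" + p)       # phoneme-specific blank
    return out

seq = insert_phoneme_blanks(["k", "ae", "t"])
# Each phoneme type contributes one blank symbol to the output inventory;
# with 39 English phonemes this gives the 78 network output nodes of the
# embodiment (39 phonemes plus 39 blanks).
inventory = {"k", "ae", "t"} | {"blk_k", "blk_ae", "blk_t"}
```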
Step 3: using a finite-state transducer (Finite-State Transducer, FST), construct from the phoneme label sequence augmented with "blank" symbols the search graph for the CTC forward-backward computation;
Step 4: according to the time alignment result of step 1, restrict the time range in which each phoneme may appear: taking the time at which each phoneme appears in the alignment result as reference, set the "time tolerance" parameter, setting the allowed time of occurrence of each phoneme to 50 milliseconds, and prune the search graph built in step 3 according to this restriction, cutting away paths in which a phoneme's position exceeds the time restriction, to obtain the final search graph needed when CTC computes the network error.
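The time restriction of step 4 can be sketched as below. This is a hypothetical helper, not code from the patent: it assumes frame-level GMM alignments and 10 ms frames (so the embodiment's 50 ms tolerance is 5 frames), and records at which frames each phoneme of the sequence is still permitted; lattice states outside these windows would be pruned.

```python
def allowed_frames(segments, tol_frames, num_frames):
    """segments[i] = (phone, start, end) from the GMM forced alignment,
    in frames, end exclusive. A state for phoneme i survives pruning only
    at frames within [start - tol, end + tol)."""
    allowed = [set() for _ in range(num_frames)]
    for i, (_, start, end) in enumerate(segments):
        lo = max(0, start - tol_frames)
        hi = min(num_frames, end + tol_frames)
        for t in range(lo, hi):
            allowed[t].add(i)
    return allowed

# Illustrative alignment for "k ae t" over 30 frames (300 ms):
segs = [("k", 0, 8), ("ae", 8, 20), ("t", 20, 30)]
ok = allowed_frames(segs, tol_frames=5, num_frames=30)
# Paths placing "k" later than frame 12 (80 ms + 50 ms tolerance) are cut.
```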
Step 5: perform acoustic model training using a time-delay neural network (Time-delay Neural Network, TDNN) structure combined with the CTC method, obtaining the final TDNN-CTC acoustic model; ReLU is used as the activation function in the TDNN. The ReLU activation function is:
g(y) = max(0, y)
where g(y) denotes the value of a neuron node after the activation function, and y denotes the value of the neuron node output by the network before the activation function.
In the present embodiment, a time-delay neural network (Time-delay Neural Network, TDNN) structure is used. The TDNN has seven layers, each hidden layer has 576 output nodes, and ReLU activation functions are used. The per-layer splicing configurations are: {-1,0,1} {-1,0,1,2} {-3,0,3} {-3,0,3} {-6,3,0} {0}, where {-1,0,1} means that the first layer splices together the current frame of the input layer and the input features of the preceding and following frames as its input; by analogy, each layer splices together the outputs of its preceding layer at several time offsets as its input. The network has 78 output nodes (39 English phonemes and the corresponding 39 "blank" symbols).
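The splicing offsets above determine how wide an input context each TDNN output sees, which can be sketched as follows. The fifth layer is assumed here to be {-6, -3, 0} (the printed "{-6, 3, 0}" looks like a transcription slip, since every other layer lists its offsets in ascending order); this is an assumption, not a statement of the patent.

```python
def receptive_field(contexts):
    """Total input context of a TDNN whose layer l splices the previous
    layer's outputs at the given offsets: offsets add up across layers."""
    lo = sum(min(c) for c in contexts)
    hi = sum(max(c) for c in contexts)
    return lo, hi

# Per-layer splicing from the embodiment (fifth layer assumed {-6, -3, 0}):
contexts = [[-1, 0, 1], [-1, 0, 1, 2], [-3, 0, 3], [-3, 0, 3], [-6, -3, 0], [0]]
lo, hi = receptive_field(contexts)
# Under this assumption, the top-layer output at frame t depends on input
# frames t-14 .. t+9, i.e. roughly 140 ms of left and 90 ms of right context.
```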
In the search graph of the forward-backward computation, each "blank" is allowed to repeat consecutively an arbitrary number of times, while the modeled phonemes between two adjacent "blank" symbols may not repeat consecutively. The state-transition topology is: if the current state is a phoneme, the next state may jump to its "blank" state or to the next phoneme; if the current state is a "blank" state, the next state may jump to the "blank" itself or to the next phoneme.
" blank " symbol is shared using all phonemes in multiple independent " blank " symbol substitution original CTC models.
Finally, it should be noted that the above embodiments are merely intended to illustrate, and not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the embodiments, those of ordinary skill in the art will understand that modifications or equivalent replacements of the technical solution of the present invention that do not depart from its spirit and scope shall all be covered by the scope of the claims of the present invention.

Claims (3)

1. A CTC-based acoustic model training method, characterized in that the method comprises:
    Step 1: train an initial GMM model and use it to force-align the text transcription of the training data in time, obtaining the time region corresponding to each phoneme in the phoneme label sequence of the training data;
    Step 2: in the phoneme label sequence of the training data, insert after each phoneme a "blank" symbol associated with that phoneme, so that each phoneme has its own dedicated "blank" symbol; i.e., the number of "blank" symbols equals the number of phonemes in the phoneme label sequence of the training data;
    Step 3: using a finite-state machine, construct from the phoneme label sequence augmented with "blank" symbols the search graph for the CTC forward-backward computation;
    Step 4: according to the time alignment result of step 1, restrict the time range in which each phoneme may appear: taking the time at which each phoneme appears in the alignment result as reference, set a "time tolerance" parameter specifying the allowed time of occurrence of each phoneme, and prune the search graph built in step 3 according to this restriction, cutting away paths in which a phoneme's position exceeds the time restriction, to obtain the final search graph needed when CTC computes the network error;
    Step 5: perform acoustic model training using a time-delay neural network (TDNN) structure combined with the CTC method, obtaining the final TDNN-CTC acoustic model, with ReLU as the activation function in the TDNN; the ReLU activation function is:
    g(y) = max(0, y)
    where g(y) denotes the value of a neuron node after the activation function, and y denotes the value of the neuron node output by the network before the activation function.
  2. The CTC-based acoustic model training method according to claim 1, characterized in that, in step 3, in the search graph of the forward-backward computation, each "blank" is allowed to repeat consecutively an arbitrary number of times, and the modeled phonemes between two adjacent "blank" symbols do not repeat consecutively.
  3. The CTC-based acoustic model training method according to claim 1, characterized in that the allowed time of occurrence of each phoneme ranges from 50 to 300 milliseconds.
CN201710002096.9A 2017-01-03 2017-01-03 Acoustic model training method based on CTC Active CN108269568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710002096.9A CN108269568B (en) 2017-01-03 2017-01-03 Acoustic model training method based on CTC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710002096.9A CN108269568B (en) 2017-01-03 2017-01-03 Acoustic model training method based on CTC

Publications (2)

Publication Number Publication Date
CN108269568A true CN108269568A (en) 2018-07-10
CN108269568B CN108269568B (en) 2021-07-30

Family

ID=62770976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710002096.9A Active CN108269568B (en) 2017-01-03 2017-01-03 Acoustic model training method based on CTC

Country Status (1)

Country Link
CN (1) CN108269568B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147774A (en) * 2018-09-19 2019-01-04 华南理工大学 An improved time-delay neural network acoustic model
CN110349570A (en) * 2019-08-16 2019-10-18 问问智能信息科技有限公司 Speech recognition model training method, readable storage medium and electronic device
CN110349571A (en) * 2019-08-23 2019-10-18 北京声智科技有限公司 Training method based on connectionist temporal classification and related apparatus
CN110556093A (en) * 2019-09-17 2019-12-10 浙江核新同花顺网络信息股份有限公司 Voice marking method and system
CN110556100A (en) * 2019-09-10 2019-12-10 苏州思必驰信息科技有限公司 Training method and system of end-to-end speech recognition model
CN110706695A (en) * 2019-10-17 2020-01-17 北京声智科技有限公司 Data labeling method and device
CN111312227A (en) * 2018-12-11 2020-06-19 上海元趣信息技术有限公司 Structure model of speech recognition technology
CN112233655A (en) * 2020-09-28 2021-01-15 上海声瀚信息科技有限公司 Neural network training method for improving voice command word recognition performance
CN113362803A (en) * 2021-05-31 2021-09-07 杭州芯声智能科技有限公司 ARM side off-line voice synthesis method, device and storage medium
WO2021198838A1 (en) * 2020-04-03 2021-10-07 International Business Machines Corporation Training of model for processing sequence data
CN113707137A (en) * 2021-08-30 2021-11-26 普强时代(珠海横琴)信息技术有限公司 Decoding implementation method and device
WO2022134894A1 (en) * 2020-12-23 2022-06-30 腾讯科技(深圳)有限公司 Speech recognition method and apparatus, computer device, and storage medium
CN115910044A (en) * 2023-01-10 2023-04-04 广州小鹏汽车科技有限公司 Voice recognition method and device and vehicle
CN118101632A (en) * 2024-04-22 2024-05-28 安徽声讯信息技术有限公司 Voice low-delay signal transmission method and system based on artificial intelligence
CN118101632B (en) * 2024-04-22 2024-06-21 安徽声讯信息技术有限公司 Voice low-delay signal transmission method and system based on artificial intelligence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105529027A (en) * 2015-12-14 2016-04-27 百度在线网络技术(北京)有限公司 Voice identification method and apparatus
US20160351188A1 (en) * 2015-05-26 2016-12-01 Google Inc. Learning pronunciations from acoustic sequences
CN106251859A (en) * 2016-07-22 2016-12-21 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus
US20160372119A1 (en) * 2015-06-19 2016-12-22 Google Inc. Speech recognition with acoustic models

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160351188A1 (en) * 2015-05-26 2016-12-01 Google Inc. Learning pronunciations from acoustic sequences
US20160372119A1 (en) * 2015-06-19 2016-12-22 Google Inc. Speech recognition with acoustic models
CN105529027A (en) * 2015-12-14 2016-04-27 百度在线网络技术(北京)有限公司 Voice identification method and apparatus
CN106251859A (en) * 2016-07-22 2016-12-21 百度在线网络技术(北京)有限公司 Voice recognition processing method and apparatus

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ANDREW SENIOR: "Acoustic modelling with CD-CTC-SMBR LSTM RNNS", 《2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU) 》 *
HAŞIM SAK: "Fast and accurate recurrent neural network acoustic models for speech recognition", 《THE INTERSPEECH 2015 PROCEEDINGS》 *
姚海涛: "Research on acoustic modeling methods for multilingual speech recognition", 《Proceedings of the 11th Youth Academic Conference of the Acoustical Society of China》 *
李杰: "基于深度学***台》 *
雷鸣: "Research on acoustic modeling methods in statistical parametric speech synthesis", 《China Doctoral Dissertations Full-text Database》 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147774A (en) * 2018-09-19 2019-01-04 华南理工大学 An improved time-delay neural network acoustic model
CN109147774B (en) * 2018-09-19 2021-07-20 华南理工大学 Improved time-delay neural network acoustic model
CN111312227A (en) * 2018-12-11 2020-06-19 上海元趣信息技术有限公司 Structure model of speech recognition technology
CN110349570A (en) * 2019-08-16 2019-10-18 问问智能信息科技有限公司 Speech recognition model training method, readable storage medium and electronic device
CN110349571A (en) * 2019-08-23 2019-10-18 北京声智科技有限公司 Training method based on connectionist temporal classification and related apparatus
CN110349571B (en) * 2019-08-23 2021-09-07 北京声智科技有限公司 Training method based on connection time sequence classification and related device
CN110556100B (en) * 2019-09-10 2021-09-17 思必驰科技股份有限公司 Training method and system of end-to-end speech recognition model
CN110556100A (en) * 2019-09-10 2019-12-10 苏州思必驰信息科技有限公司 Training method and system of end-to-end speech recognition model
CN110556093A (en) * 2019-09-17 2019-12-10 浙江核新同花顺网络信息股份有限公司 Voice marking method and system
CN110556093B (en) * 2019-09-17 2021-12-10 浙江同花顺智富软件有限公司 Voice marking method and system
CN110706695B (en) * 2019-10-17 2022-02-18 北京声智科技有限公司 Data labeling method and device
CN110706695A (en) * 2019-10-17 2020-01-17 北京声智科技有限公司 Data labeling method and device
WO2021198838A1 (en) * 2020-04-03 2021-10-07 International Business Machines Corporation Training of model for processing sequence data
GB2609157A (en) * 2020-04-03 2023-01-25 Ibm Training of model for processing sequence data
CN112233655A (en) * 2020-09-28 2021-01-15 上海声瀚信息科技有限公司 Neural network training method for improving voice command word recognition performance
WO2022134894A1 (en) * 2020-12-23 2022-06-30 腾讯科技(深圳)有限公司 Speech recognition method and apparatus, computer device, and storage medium
CN113362803A (en) * 2021-05-31 2021-09-07 杭州芯声智能科技有限公司 ARM side off-line voice synthesis method, device and storage medium
CN113362803B (en) * 2021-05-31 2023-04-25 杭州芯声智能科技有限公司 ARM side offline speech synthesis method, ARM side offline speech synthesis device and storage medium
CN113707137A (en) * 2021-08-30 2021-11-26 普强时代(珠海横琴)信息技术有限公司 Decoding implementation method and device
CN113707137B (en) * 2021-08-30 2024-02-20 普强时代(珠海横琴)信息技术有限公司 Decoding realization method and device
CN115910044A (en) * 2023-01-10 2023-04-04 广州小鹏汽车科技有限公司 Voice recognition method and device and vehicle
CN118101632A (en) * 2024-04-22 2024-05-28 安徽声讯信息技术有限公司 Voice low-delay signal transmission method and system based on artificial intelligence
CN118101632B (en) * 2024-04-22 2024-06-21 安徽声讯信息技术有限公司 Voice low-delay signal transmission method and system based on artificial intelligence

Also Published As

Publication number Publication date
CN108269568B (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN108269568A (en) A CTC-based acoustic model training method
KR102033411B1 (en) Apparatus and Method for Recognizing speech By Using Attention-based Context-Dependent Acoustic Model
Chen et al. Pipelined Back-Propagation for Context-Dependent Deep Neural Networks.
US6128606A (en) Module for constructing trainable modular network in which each module inputs and outputs data structured as a graph
WO2016101688A1 (en) Continuous voice recognition method based on deep long-and-short-term memory recurrent neural network
JP7070894B2 (en) Time series information learning system, method and neural network model
CN109859743A (en) Audio identification methods, system and machinery equipment
US10825445B2 (en) Method and apparatus for training acoustic model
CN104137178B (en) Acoustic treatment unit interface
CN109346064A (en) Training method and system for end-to-end speech identification model
CN111916058A (en) Voice recognition method and system based on incremental word graph re-scoring
CN108389575A (en) Audio data recognition methods and system
JP2019159058A (en) Speech recognition system, speech recognition method, learned model
Ström Sparse connection and pruning in large dynamic artificial neural networks
JP2018060047A (en) Learning device for acoustic model and computer program therefor
CN110047462A (en) A kind of phoneme synthesizing method, device and electronic equipment
CN108461080A (en) A kind of Acoustic Modeling method and apparatus based on HLSTM models
Gopalakrishnan et al. Sentiment analysis using simplified long short-term memory recurrent neural networks
Khursheed et al. Tiny-crnn: Streaming wakeword detection in a low footprint setting
Pitler et al. A linear-time transition system for crossing interval trees
KR20180068475A (en) Method and device to recognize based on recurrent model and to train recurrent model
Lee High-order hidden Markov model and application to continuous mandarin digit recognition
CN110047463A (en) A kind of phoneme synthesizing method, device and electronic equipment
CN110188355A (en) A kind of segmenting method based on WFST technology, system, equipment and medium
Ruan et al. An improved *** lhasa speech recognition method based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant