CN111951785B - Voice recognition method and device and terminal equipment - Google Patents


Info

Publication number
CN111951785B
Authority
CN
China
Legal status
Active
Application number
CN201910407618.2A
Other languages
Chinese (zh)
Other versions
CN111951785A
Inventor
陈明
Current Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention is applicable to the technical field of voice recognition, and provides a voice recognition method, a voice recognition device and terminal equipment, wherein the method comprises the following steps: calculating a first conditional probability of the sentence according to the pre-trained language model; adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function; training the speech recognition model with the second loss function, and performing speech recognition using the trained speech recognition model. The invention can improve the accuracy of voice recognition.

Description

Voice recognition method and device and terminal equipment
Technical Field
The invention belongs to the technical field of voice recognition, and particularly relates to a voice recognition method, a voice recognition device and terminal equipment.
Background
Voice recognition technology aims to recognize an input voice signal and output machine-readable text; it can be applied to smart homes, smart vehicles, intelligent customer service robots, and the like. With the development of deep learning, speech recognition has shifted from the traditional Gaussian mixture model and hidden Markov model (Gaussian Mixture Model-Hidden Markov Model, GMM-HMM) approach to techniques based on deep neural networks (Deep Neural Networks, DNN). DNN-based speech recognition falls into two types: one replaces the original GMM part with a DNN, i.e., the deep neural network and hidden Markov model (Deep Neural Networks-Hidden Markov Model, DNN-HMM); the other is end-to-end speech recognition based on deep neural networks.
Because end-to-end speech recognition (End-To-End Automatic Speech Recognition) based on deep neural networks goes directly from voice input to decoded output, it requires neither complex alignment work nor the construction of a pronunciation dictionary, saving a large amount of preparation time, and is therefore widely applied. At present, existing end-to-end techniques (such as connectionist temporal classification (CTC), the deep feedforward sequential memory network (DFSMN), and attention-based sequence-to-sequence networks (Seq2Seq-Attention)) cannot learn a complex language model; they often recognize input speech from the voice waveform alone, so the recognized words have poor logical coherence. Consequently, when a trained voice recognition model encounters more complex speech, its recognition accuracy is low.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a method, an apparatus, and a terminal device for voice recognition, so as to solve the problem in the prior art that the recognition accuracy of a trained voice recognition model is low when encountering complex voice.
A first aspect of an embodiment of the present invention provides a speech recognition method, including:
calculating a first conditional probability of the sentence according to the pre-trained language model;
adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
training the speech recognition model with the second loss function, and performing speech recognition using the trained speech recognition model.
A second aspect of an embodiment of the present invention provides a voice recognition apparatus, including:
a first conditional probability calculation module for calculating a first conditional probability of a sentence according to the pre-trained language model;
the adjusting module is used for adjusting the first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and the voice recognition module is used for training the voice recognition model by utilizing the second loss function and performing voice recognition by using the trained voice recognition model.
A third aspect of an embodiment of the present invention provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect described above when the computer program is executed.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method as described in the first aspect above.
In the embodiment of the invention, the first conditional probability of a sentence is calculated using the pre-trained language model, and the original first loss function of the speech recognition model is corrected to obtain a second loss function; the speech recognition model is then trained with the second loss function, which optimizes the loss function of the speech recognition model and introduces the characteristics of the pre-trained language model. Because the first conditional probability from the pre-trained language model is used to optimize the first loss function, the pre-trained language model is effectively embedded into the voice recognition model, and the trained voice recognition model achieves higher recognition accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a specific implementation process for adjusting the first loss function according to the second conditional probability and the influence coefficient according to the embodiment of the present invention;
fig. 3 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Fig. 1 is a schematic flow chart of a voice recognition method according to an embodiment of the present invention, which is described in detail below:
s101: a first conditional probability of the sentence is calculated based on the pre-trained language model.
It should be noted that a language model can capture the internal relations between words from a large amount of text, reduce the word recognition error rate, and make the recognition result more logical. Common language models include n-gram language models and neural-network-based language models.
The pre-trained language model in the embodiment of the invention can be trained with the language model training tool SRILM as an n-gram language model, where the parameter n indicates that the probability of the current word is conditioned on the previous n-1 words. In the embodiment of the invention, a trigram language model (i.e., a language model with n = 3) is trained, so the probability of the current word depends on the previous 2 words. A sentence refers to a sentence that the speech recognition model predicts to generate from the input samples (speech data).
Further, the calculating the first conditional probability of the sentence according to the pre-trained language model includes:
for each sentence, calculating its first conditional probability according to the following equation:

P(S) = \prod_i \frac{C(w_{i-(n-1)}, \ldots, w_{i-1}, w_i)}{C(w_{i-(n-1)}, \ldots, w_{i-1})}    (1)

In the above formula (1), P(S) represents the first conditional probability of the sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) represents the number of times the word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) represents the number of times the word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples; n represents a positive integer greater than 1; and i represents the i-th word.
Since the n-gram language model assumes that the probability of the current word depends only on the previous n-1 words, the first conditional probability P(S) of a sentence S can be expressed as:

P(S) = \prod_i P(w_i \mid w_{i-(n-1)}, \ldots, w_{i-1})    (2)

In the above formula (2), P(w_i \mid w_{i-(n-1)}, \ldots, w_{i-1}) represents the probability that the word w_i occurs given that the words w_{i-(n-1)}, …, w_{i-1} have occurred; estimating this probability by the maximum likelihood method yields formula (1).
S102: and adjusting the first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function.
The loss function is the difference between the predicted value and the true value, reflects the deviation degree of the predicted value and the true value, and the lower the deviation degree of the predicted value and the true value is, the more accurate the predicted result is, so that the smaller the loss function is, the better the quality of the finally trained model is, namely the higher the accuracy of voice recognition is.
The first loss function refers to an original loss function of the voice recognition model. The original loss function of the voice recognition model is adjusted by utilizing the first conditional probability, so that the characteristics of the pre-trained language model are introduced, and the accuracy of the trained voice recognition model can be improved.
Specifically, the adjusting the first loss function of the speech recognition model according to the first conditional probability includes:
calculating a second conditional probability using the first conditional probability;
and adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model.
After the first conditional probability P (S) is obtained through calculation, the first conditional probability P (S) is transformed to obtain a second conditional probability T, and then the first loss function is adjusted by using the T and the influence coefficient r of the pre-trained language model.
Further, the calculating a second conditional probability using the first conditional probability includes:
using the first conditional probability, and calculating according to the following formula:
in the above formula (3), T represents the calculated second conditional probability, P (S) represents the first conditional probability, and length represents the length of the sentence S, i.e., the number of words contained in S.
As shown in fig. 2, fig. 2 is a schematic flow chart of a specific implementation process of adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model, which includes the following steps S201 to S203:
s201: acquiring a plurality of predicted sentences, and calculating a second conditional probability of each sentence;
obtaining multiple predicted sentences from speech recognition models, provided that k predicted sentences are obtained, i.e. y_pred 1 ,y_pred 2 ,…,y_pred k . Calculating the T value of each sentence by using the formula (1) and the formula (3) to obtain T 1 ,T 2 ,…,T k
S202: according to the second conditional probabilities of all sentences and the influence coefficients, calculating average conditional probabilities;
based on the T value obtained in the above step S201 and the influence coefficient r of the pre-trained language model, an average conditional probability T is calculated according to the following formula i
In the above formula (4), T i Represents the calculated average conditional probability, r represents the influence coefficient, k represents the number of sentences, j represents the jth sentence, T j A second conditional probability representing a jth sentence.
S203: and adjusting the first loss function by using the average conditional probability.
The method for adjusting the first loss function by using the average conditional probability comprises the following steps: and adding the average conditional probability on the basis of the original loss function to obtain a second loss function, namely the adjusted loss function.
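The adjustment just described, adding the influence-weighted average conditional probability to the original loss, can be sketched as follows. The function name and the additive form are illustrative assumptions, since the patent text does not spell out the exact arithmetic.

```python
def second_loss(first_loss, t_values, r):
    """Second (adjusted) loss: the original first loss plus the average of
    the second conditional probabilities T_j over the k predicted
    sentences, weighted by the influence coefficient r."""
    k = len(t_values)
    avg_t = r * sum(t_values) / k   # average conditional probability
    return first_loss + avg_t
```

For example, second_loss(2.0, [0.5, 1.5], 0.1) gives 2.1, since the weighted average term contributes 0.1 * 1.0.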
It should be noted that, because different influence coefficients r yield different recognition accuracies for the finally trained speech recognition model, different r values are adopted for different sample data.
In a preferred implementation manner of the embodiment of the present invention, the influence coefficient is an optimal influence coefficient, and the method for obtaining the optimal influence coefficient is:
training the voice recognition model by adopting a plurality of influence coefficients, and determining the influence coefficient with the highest recognition accuracy of the voice recognition model according to a training result, namely the optimal influence coefficient;
said adjusting said first loss function according to said second conditional probability and an influence coefficient of said pre-trained language model, comprising:
and adjusting the first loss function according to the second conditional probability and the optimal influence coefficient.
Typically, the influence coefficient r takes a value in the range 0-1. In the embodiment of the invention, practical training leads to the following conclusion: when the influence coefficient r lies in the interval [0.1, 0.5], the converged voice recognition model has better recognition accuracy. However, for voice data of different sizes and different domains, different influence coefficients r should be selected; that is, the choice of r is related to the size and domain of the input voice data, and in practice the optimal influence coefficient can be selected as needed.
Optionally, the training the speech recognition model using a plurality of influence coefficients includes:
presetting a value interval for the influence coefficient, adjusting the value of the influence coefficient according to a preset step length, and respectively training the voice recognition model by utilizing each influence coefficient.
In the training process of the voice recognition model, a value interval can be preset for r, the value of r is automatically adjusted according to the step length of 0.1 on the assumption that the value interval is [0.1,0.5], the voice recognition model is trained by the value, and r which enables the converged voice recognition model to have the highest recognition precision is determined according to the training result, namely the optimal influence coefficient.
After the optimal influence coefficient is determined, the loss function is adjusted according to the optimal influence coefficient and the first conditional probability, and the speech recognition model is trained according to the adjusted first loss function.
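The coefficient sweep described above can be sketched as a simple grid search. Here train_fn is a hypothetical callback assumed to train the speech recognition model to convergence for a given r and return its recognition accuracy.

```python
def best_influence_coefficient(train_fn, lo=0.1, hi=0.5, step=0.1):
    """Train once per candidate r in [lo, hi] and keep the value whose
    converged model reaches the highest recognition accuracy."""
    best_r, best_acc = None, float("-inf")
    n_steps = int(round((hi - lo) / step))
    for k in range(n_steps + 1):
        r = round(lo + k * step, 10)   # round away floating-point drift
        acc = train_fn(r)
        if acc > best_acc:
            best_r, best_acc = r, acc
    return best_r, best_acc
```

The fixed-step candidate list mirrors the patent's example of stepping r by 0.1 over [0.1, 0.5]; the rounding keeps candidates like 0.3 exact despite float accumulation.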
S103: training the speech recognition model with the second loss function, and performing speech recognition using the trained speech recognition model.
It should be noted that, the training process of the speech recognition model is as follows: inputting sample data with labels to a voice recognition model, wherein the sample data are voice data and texts corresponding to the voice data; and extracting the characteristics of the sample data to obtain a characteristic sequence, encoding the characteristic sequence, decoding to obtain a predicted value, making a difference value between the predicted value and a true value to obtain a loss function, and training the model according to the loss function until the model converges to obtain the trained voice recognition model.
The second loss function refers to the difference between the true value and the predicted value, the value of the second loss function is obtained, the value of the second loss function is utilized to carry out parameter adjustment on the voice recognition model, and finally the voice recognition model with optimal parameters is obtained, namely the trained voice recognition model.
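The training procedure above can be sketched schematically. The scalar toy model, the squared-error stand-in for the first loss, and the update rule are all illustrative assumptions; a real system would train a deep network with its usual optimizer.

```python
def train_with_second_loss(samples, labels, lm_term, lr=0.1, epochs=50):
    """Minimal training loop: predict, evaluate the second loss (base
    discrepancy plus a constant language-model term), update the
    parameter, and record the per-epoch average loss."""
    w = 0.0                                      # toy stand-in for model weights
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(samples, labels):
            pred = w * x
            total += (pred - y) ** 2 + lm_term   # second (adjusted) loss
            w += lr * (y - pred) * x             # gradient-style parameter step
        history.append(total / len(samples))
    return history
```

The recorded losses should decrease toward the constant language-model term as the toy model converges.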
When the trained voice recognition model is used for voice recognition, the audio data to be recognized is input into the trained voice recognition model, and the trained voice recognition model outputs the text corresponding to the audio to be recognized, so that voice recognition can be realized.
In the embodiment of the invention, the first conditional probability of a sentence is calculated using the pre-trained language model, and the original first loss function of the speech recognition model is corrected to obtain a second loss function; the speech recognition model is then trained with the second loss function, which optimizes the loss function of the speech recognition model and introduces the characteristics of the pre-trained language model. Because the first conditional probability from the pre-trained language model is used to optimize the first loss function, the pre-trained language model is effectively embedded into the voice recognition model, and the trained voice recognition model achieves higher recognition accuracy.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
Fig. 3 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention, where the device includes: a first conditional probability calculation module 31, an adjustment module 32 and a speech recognition module 33. Wherein:
a first conditional probability calculation module 31 for calculating a first conditional probability of a sentence according to a pre-trained language model.
Further, the first conditional probability calculation module 31 is specifically configured to: for each sentence, calculate its first conditional probability according to the following equation:

P(S) = \prod_i \frac{C(w_{i-(n-1)}, \ldots, w_{i-1}, w_i)}{C(w_{i-(n-1)}, \ldots, w_{i-1})}    (1)

In the above formula, P(S) represents the first conditional probability of the sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) represents the number of times the word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) represents the number of times the word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples; n represents a positive integer greater than 1; and i represents the i-th word.
And the adjusting module 32 is configured to adjust the first loss function of the speech recognition model according to the first conditional probability, so as to obtain a second loss function.
Further, the adjustment module 32 includes: a second conditional probability calculation unit 321, an adjustment unit 322, wherein:
the second conditional probability calculating unit 321 is configured to calculate a second conditional probability using the first conditional probability.
Further, the second conditional probability calculating unit 321 is specifically configured to:
using the first conditional probability, and calculating according to the following formula:
in the above expression (3), T represents the calculated second conditional probability, P (S) represents the first conditional probability, and length represents the length of the sentence S.
The adjusting unit 322 is configured to adjust the first loss function according to the second conditional probability and an influence coefficient of the pre-trained language model.
Still further, the adjusting unit 322 includes:
a first calculation subunit 3221 configured to obtain a plurality of predicted sentences, and calculate a second conditional probability of each sentence;
a second calculation subunit 3222, configured to calculate an average conditional probability according to the second conditional probabilities of all sentences and the influence coefficients;
an adjustment subunit 3223 is configured to adjust the first loss function by using the average conditional probability.
A speech recognition module 33 for training the speech recognition model with the second loss function and performing speech recognition using the trained speech recognition model.
Preferably, the influence coefficient is an optimal influence coefficient, and the device further includes an optimal influence coefficient obtaining module 34, configured to train the speech recognition model with a plurality of influence coefficients, and determine, according to a training result, an influence coefficient that makes the recognition accuracy of the speech recognition model highest, that is, the optimal influence coefficient;
preferably, the adjusting unit 322 is configured to adjust the first loss function according to the second conditional probability and the optimal influence coefficient.
Further, the optimal influence coefficient obtaining module 34 is specifically configured to: presetting a value interval for the influence coefficient, adjusting the value of the influence coefficient according to a preset step length, and respectively training the voice recognition model by utilizing each influence coefficient.
Fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 4, the terminal device 4 of this embodiment includes: a processor 40, a memory 41 and a computer program 42, such as a speech recognition program, stored in the memory 41 and executable on the processor 40. The processor 40, when executing the computer program 42, implements the steps of the various speech recognition method embodiments described above, such as steps S101 to S103 shown in fig. 1. Alternatively, the processor 40, when executing the computer program 42, performs the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 31 to 33 shown in fig. 3.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into a first conditional probability calculation module, an adjustment module, and a speech recognition module, each of which specifically functions as follows:
a first conditional probability calculation module for calculating a first conditional probability of a sentence according to the pre-trained language model;
the adjusting module is used for adjusting the first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
and the voice recognition module is used for training the voice recognition model by utilizing the second loss function and performing voice recognition by using the trained voice recognition model.
The terminal device 4 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The terminal device may include, but is not limited to, a processor 40, a memory 41. It will be appreciated by those skilled in the art that fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation of the terminal device 4, and may include more or less components than illustrated, or may combine certain components, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.
The processor 40 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing the computer program as well as other programs and data required by the terminal device. The memory 41 may also be used for temporarily storing data that has been output or is to be output.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (8)

1. A method of speech recognition, comprising:
calculating a first conditional probability of the sentence according to the pre-trained language model;
adjusting a first loss function of the voice recognition model according to the first conditional probability to obtain a second loss function;
training the voice recognition model by utilizing the second loss function, and performing voice recognition by using the trained voice recognition model;
the adjusting the first loss function of the speech recognition model according to the first conditional probability includes:
calculating a second conditional probability using the first conditional probability;
adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model;
the calculating a second conditional probability using the first conditional probability includes:
using the first conditional probability, and calculating according to the following formula:
in the above formula, T represents the calculated second conditional probability, P(S) represents the first conditional probability, and length represents the length of the sentence S.
2. The method of claim 1, wherein calculating a first conditional probability of a sentence from a pre-trained language model comprises:
for each sentence, a first conditional probability thereof is calculated according to the following equation:

P(S) = \prod_i \frac{C(w_{i-(n-1)}, \ldots, w_{i-1}, w_i)}{C(w_{i-(n-1)}, \ldots, w_{i-1})}

in the above formula, P(S) represents the first conditional probability of the sentence S; C(w_{i-(n-1)}, …, w_{i-1}, w_i) represents the number of times the word w_i occurs after the words w_{i-(n-1)}, …, w_{i-1}; C(w_{i-(n-1)}, …, w_{i-1}) represents the number of times the word w_{i-1} occurs after the words w_{i-(n-1)}, …, w_{i-2}; m represents the number of samples; n represents a positive integer greater than 1; and i represents the i-th word.
3. The method of claim 1, wherein said adjusting the first loss function based on the second conditional probability and the coefficient of influence of the pre-trained language model comprises:
acquiring a plurality of predicted sentences, and calculating the second conditional probability of each sentence;
calculating an average conditional probability according to the second conditional probabilities of all the sentences and the influence coefficient; and
adjusting the first loss function using the average conditional probability.
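One plausible reading of claim 3 — not the patent's verbatim formula — is to subtract the influence-coefficient-weighted average of the second conditional probabilities from the base loss, so that predicted sentences the language model deems likely reduce the loss:

```python
def adjusted_loss(base_loss, second_probs, alpha):
    """Adjust the base (first) loss with a language-model term.

    `alpha` is the influence coefficient of the pre-trained language model.
    Subtracting alpha times the average second conditional probability is an
    assumed combination for illustration: a higher LM probability for the
    predicted sentences yields a lower adjusted loss.
    """
    avg = sum(second_probs) / len(second_probs)
    return base_loss - alpha * avg
```

The coefficient alpha controls how strongly the language model's fluency judgment pulls against the acoustic objective.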
4. The method of claim 3, wherein the influence coefficient is an optimal influence coefficient, obtained as follows:
training the speech recognition model with a plurality of influence coefficients, and determining, from the training results, the influence coefficient that yields the highest recognition accuracy of the speech recognition model as the optimal influence coefficient;
the adjusting the first loss function according to the second conditional probability and the influence coefficient of the pre-trained language model includes:
adjusting the first loss function according to the second conditional probability and the optimal influence coefficient.
5. The method of claim 4, wherein training the speech recognition model with a plurality of influence coefficients respectively includes:
presetting a value interval for the influence coefficient, stepping the value of the influence coefficient through the interval at a preset step size, and training the speech recognition model with each resulting influence coefficient.
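The coefficient sweep of claim 5 amounts to a grid search over a preset interval. In the sketch below, `train_and_eval` is a hypothetical callback (not named in the patent) that trains the model with a given coefficient and returns its recognition accuracy:

```python
def best_influence_coefficient(train_and_eval, lo=0.0, hi=1.0, step=0.1):
    """Sweep the influence coefficient over [lo, hi] at a fixed step size.

    `train_and_eval(alpha)` is a hypothetical callback: it trains the speech
    recognition model with coefficient `alpha` and returns recognition
    accuracy. The coefficient with the highest accuracy is returned.
    """
    best_alpha, best_acc = lo, float("-inf")
    alpha = lo
    while alpha <= hi + 1e-9:  # tolerate floating-point drift at the endpoint
        acc = train_and_eval(alpha)
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
        alpha += step
    return best_alpha, best_acc
```

Because each grid point requires a full training run, coarse steps (e.g. 0.1) are typically swept first, optionally refined around the best value.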
6. A speech recognition apparatus, comprising:
a first conditional probability calculation module, configured to calculate a first conditional probability of a sentence according to a pre-trained language model;
an adjusting module, configured to adjust a first loss function of the speech recognition model according to the first conditional probability to obtain a second loss function; and
a speech recognition module, configured to train the speech recognition model using the second loss function and perform speech recognition with the trained speech recognition model;
the adjustment module comprises:
a second conditional probability calculation unit configured to calculate a second conditional probability using the first conditional probability;
an adjusting unit, configured to adjust the first loss function according to the second conditional probability and an influence coefficient of the pre-trained language model;
the second conditional probability calculation unit is specifically configured to:
calculating, from the first conditional probability, according to the following formula:
in the above formula, T represents the calculated second conditional probability, P (S) represents the first conditional probability, and length represents the length of the sentence S.
7. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 5.
CN201910407618.2A 2019-05-16 2019-05-16 Voice recognition method and device and terminal equipment Active CN111951785B (en)


Publications (2)

Publication Number Publication Date
CN111951785A CN111951785A (en) 2020-11-17
CN111951785B true CN111951785B (en) 2024-03-15


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113223504B (en) * 2021-04-30 2023-12-26 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of acoustic model
CN113327581B (en) * 2021-05-04 2022-05-24 西安博达软件股份有限公司 Recognition model optimization method and system for improving speech recognition accuracy

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5839106A (en) * 1996-12-17 1998-11-17 Apple Computer, Inc. Large-vocabulary speech recognition using an integrated syntactic and semantic statistical language model
KR20050011441A (en) * 2003-07-23 2005-01-29 주식회사 팬택 Method for modificating hmm
JP2010078877A (en) * 2008-09-25 2010-04-08 Pioneer Electronic Corp Speech recognition device, speech recognition method, and speech recognition program
CN102999533A (en) * 2011-09-19 2013-03-27 腾讯科技(深圳)有限公司 Textspeak identification method and system
CN107154260A (en) * 2017-04-11 2017-09-12 北京智能管家科技有限公司 A kind of domain-adaptive audio recognition method and device
CN107480144A (en) * 2017-08-03 2017-12-15 中国人民大学 Possess the image natural language description generation method and device across language learning ability
CN108962223A (en) * 2018-06-25 2018-12-07 厦门快商通信息技术有限公司 A kind of voice gender identification method, equipment and medium based on deep learning
CN109272990A (en) * 2018-09-25 2019-01-25 江南大学 Audio recognition method based on convolutional neural networks
CN109410914A (en) * 2018-08-28 2019-03-01 江西师范大学 A kind of Jiangxi dialect phonetic and dialect point recognition methods

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10186255B2 (en) * 2016-01-16 2019-01-22 Genesys Telecommunications Laboratories, Inc. Language model customization in speech recognition for speech analytics
US10176799B2 (en) * 2016-02-02 2019-01-08 Mitsubishi Electric Research Laboratories, Inc. Method and system for training language models to reduce recognition errors




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant