CN106782560A - Method and device for determining target identification text - Google Patents
- Publication number
- CN106782560A (application number CN201710127503.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- identification
- determined
- identification text
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a method and device for determining target identification text. The method includes: determining the determined identification text and the to-be-determined identification text in at least two candidate identification texts corresponding to the speech data to be identified, where the determined identification text is the part that is identical across the at least two candidate identification texts and the to-be-determined identification text is the part that differs among them; calculating the similarity between the to-be-determined identification text and the text at the corresponding position in a target contrast text, where the target contrast text is a text in a preset text library whose sentence pattern structure is consistent with that of the candidate identification texts and which includes the determined identification text; and configuring, as the target identification text, the candidate identification text composed of the determined identification text and the to-be-determined identification text with the greatest similarity. The target identification text is thus further screened from the candidate identification texts, which improves its accuracy.
Description
Technical field
The application relates to speech recognition technology, and in particular to a method and device for determining target identification text.
Background
With the development of voice control technology, more and more smart devices now have a speech recognition function, for example smart televisions, smart refrigerators, and smart air conditioners with voice control, as well as smartphones and smart computers with voice input.
Current speech recognition mainly comprises speech preprocessing, acoustic model decoding, pronunciation dictionary parsing, and language model decoding. Speech preprocessing performs simple processing on the received speech signal to obtain, among other things, the feature file of the speech. Acoustic model decoding takes the feature file as input and produces the phoneme file with the highest probability. The pronunciation dictionary is then queried to convert the phoneme information into possible word combinations, and the contextual information of the language model is used to select the word combinations with higher probability as candidate recognition results. Because the corpus of the language model comes from a wide range of sources, the candidate recognition results cannot guarantee accuracy, so some method is needed to screen an accurate recognition result out of them.
However, the prior art lacks a suitable screening method.
Summary of the application
The application provides a method and device for determining target identification text, used to select an accurate recognition result from the candidate recognition results of the speech data to be identified.
A first aspect of the application provides a method for determining target identification text from at least two candidate identification texts, including:
determining the determined identification text and the to-be-determined identification text in at least two candidate identification texts corresponding to the speech data to be identified, where the determined identification text is the part that is identical across the at least two candidate identification texts, and the to-be-determined identification text is the part that differs among the at least two candidate identification texts;
calculating the similarity between the to-be-determined identification text and the text at the corresponding position in a target contrast text, where the target contrast text is a text in a preset text library whose sentence pattern structure is consistent with that of the candidate identification texts, and the target contrast text includes the determined identification text;
configuring, as the target identification text, the candidate identification text composed of the determined identification text and the to-be-determined identification text corresponding to the maximum similarity.
A second aspect of the application provides a device for determining target identification text from candidate identification texts, including:
a first determining module, configured to determine the determined identification text and the to-be-determined identification text in at least two candidate identification texts corresponding to the speech data to be identified, where the determined identification text is the part that is identical across the at least two candidate identification texts, and the to-be-determined identification text is the part that differs among the at least two candidate identification texts;
a computing module, configured to calculate the similarity between the to-be-determined identification text and the text at the corresponding position in a target contrast text, where the target contrast text is a text in a preset text library whose sentence pattern structure is consistent with that of the candidate identification texts, and the target contrast text includes the determined identification text;
a second determining module, configured to configure, as the target identification text, the candidate identification text composed of the determined identification text and the to-be-determined identification text corresponding to the maximum similarity.
The beneficial effects of the application are as follows:
In the method for determining target identification text provided by the application, the determined identification text and the to-be-determined identification text in at least two candidate identification texts corresponding to the speech data to be identified are determined first. Then, for the to-be-determined identification text, the similarity between it and the text at the corresponding position in the target contrast text is calculated, and the to-be-determined identification text with the greatest similarity is taken as the correct result for the speech data to be identified. The candidate identification text composed of this to-be-determined identification text and the determined identification text is then configured as the target identification text. In this way, when multiple candidate identification texts with close probabilities are obtained, a target contrast text with a consistent sentence pattern structure is used, and the similarity between each to-be-determined identification text and the text at the corresponding position in the target contrast text determines which to-be-determined identification text is closest to the speech data input by the user. That text and the determined identification text together form the target identification text, which is fed back to the user. In other words, by referring to the target contrast text, the differing parts of multiple candidate identification texts with close probabilities are further selected, which improves the accuracy of recognizing the speech data to be identified and improves the user experience of speech recognition.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for determining target identification text from at least two candidate identification texts provided by an embodiment of the application;
Fig. 2 is a schematic flowchart of a method for determining target identification text from at least two candidate identification texts provided by another embodiment of the application;
Fig. 3 is a schematic structural diagram of a device for determining target identification text from at least two candidate identification texts provided by an embodiment of the application;
Fig. 4 is a schematic structural diagram of a device for determining target identification text from at least two candidate identification texts provided by another embodiment of the application.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Before the embodiments of the present invention are explained in detail, their application environment is introduced. The method provided by the embodiments of the present invention, for displaying voice input control instructions, is applied to a terminal. For example, the terminal may be a smart television, smartphone, or tablet computer running the Android or iOS operating system; the terminal may also be a computer running the Windows or iOS operating system, a PDA (Personal Digital Assistant), and so on. The embodiments of the present invention do not specifically limit this.
The application provides a method for determining target identification text from at least two candidate identification texts: after speech recognition has produced multiple recognition results, these results are further analyzed and one is selected as the final speech recognition text, so as to improve the accuracy of speech recognition.
Fig. 1 is a schematic flowchart of a method for determining target identification text from at least two candidate identification texts provided by an embodiment of the application. As shown in Fig. 1, the method includes:
S101. Determine the determined identification text and the to-be-determined identification text in at least two candidate identification texts corresponding to the speech data to be identified.
In a specific implementation, after the user inputs the speech data to be identified, multiple speech recognition texts may be recognized because of similar pronunciations, limited recognition accuracy, and other reasons.
For example, if the user says "I want to listen to Gao Shengmei's songs", multiple speech recognition texts may be obtained, such as "I want to listen to Gaosheng Mei's songs", "I want to listen to Gao Xingmei's songs", and "I want to listen to Gao Shengmei's songs", in which the names are near-homophones.
The candidate identification texts are first determined from these speech recognition texts, and an accurate recognition result is then selected from them.
A candidate identification text consists of determined identification text and to-be-determined identification text. The determined identification text is the part that is identical across the at least two candidate identification texts, and the to-be-determined identification text is the part that differs among them. For example, in "I want to listen to Gao Xingmei's songs" and "I want to listen to Gao Shengmei's songs", "I want to listen to" and "songs" are determined identification text, while "Gao Shengmei" and "Gao Xingmei" are to-be-determined identification text.
That is, the part that is identical across the multiple candidate identification texts can be regarded as an accurate result, while the part that differs is the to-be-determined identification text, which needs further identification to obtain a more accurate result.
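The split described in S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the candidates are already tokenized, that they differ in a single contiguous span, and that the shared text is exactly the longest common prefix and suffix. The function name `split_candidates` is hypothetical.

```python
from typing import List, Tuple


def split_candidates(
    candidates: List[List[str]],
) -> Tuple[List[str], List[str], List[List[str]]]:
    """Split tokenized candidates into shared prefix, shared suffix, and differing middles."""
    if len(candidates) < 2:
        raise ValueError("at least two candidates are required")
    first = candidates[0]
    shortest = min(len(c) for c in candidates)

    # Longest common prefix: part of the determined identification text.
    p = 0
    while p < shortest and all(c[p] == first[p] for c in candidates):
        p += 1

    # Longest common suffix that does not overlap the prefix.
    s = 0
    while s < shortest - p and all(c[-1 - s] == first[-1 - s] for c in candidates):
        s += 1

    prefix = first[:p]
    suffix = first[len(first) - s:]
    # What remains in each candidate is its to-be-determined identification text.
    middles = [c[p:len(c) - s] for c in candidates]
    return prefix, suffix, middles


prefix, suffix, middles = split_candidates([
    ["I", "want", "to", "listen", "to", "Gao Xingmei", "'s", "songs"],
    ["I", "want", "to", "listen", "to", "Gao Shengmei", "'s", "songs"],
])
print(prefix, middles, suffix)
```

Candidates that differ in several separate spans would need a proper sequence alignment rather than a prefix and suffix comparison.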
S102. Calculate the similarity between the to-be-determined identification text and the text at the corresponding position in the target contrast text.
The target contrast text is a text in the preset text library whose sentence pattern structure is consistent with that of the candidate identification texts, and the target contrast text includes the determined identification text described above.
The preset text library may include a large number of pre-stored sentences and other word combinations. The target contrast text whose sentence pattern is consistent with the candidate identification text can be matched in the preset text library by word meaning, part of speech (noun, verb), and so on. For example, "I want to listen to Gao Xingmei's songs" may match the target contrast text "I want to listen to Zhou Jielun's songs", and "Please give me another cup of coffee" may match the target contrast text "Please give me a glass of milk".
The target contrast text includes the determined identification text; for example, "I want to listen to Zhou Jielun's songs" includes the determined identification text "I want to listen to" and "songs".
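One simplified way to find target contrast texts is to keep only library sentences that contain the determined identification text in the same positions, and to read off the text sitting where the to-be-determined text would be. The sketch below assumes tokenized sentences and literal prefix and suffix matching. The matching by word meaning and part of speech mentioned above is not modeled, and the names `contrast_slot` and `find_contrast_slots` are hypothetical.

```python
from typing import List, Optional


def contrast_slot(
    prefix: List[str], suffix: List[str], sentence: List[str]
) -> Optional[List[str]]:
    """Return the tokens between prefix and suffix, or None if the sentence does not fit."""
    n, p, s = len(sentence), len(prefix), len(suffix)
    if n <= p + s:
        return None  # no room left for a slot
    if sentence[:p] != prefix:
        return None
    if sentence[n - s:] != suffix:
        return None
    return sentence[p:n - s]


def find_contrast_slots(
    prefix: List[str], suffix: List[str], library: List[List[str]]
) -> List[List[str]]:
    """Collect the corresponding-position text of every matching library sentence."""
    slots = []
    for sentence in library:
        slot = contrast_slot(prefix, suffix, sentence)
        if slot is not None:
            slots.append(slot)
    return slots


library = [
    ["I", "want", "to", "listen", "to", "Zhou Jielun", "'s", "songs"],
    ["Please", "give", "me", "a", "glass", "of", "milk"],
]
slots = find_contrast_slots(
    ["I", "want", "to", "listen", "to"], ["'s", "songs"], library
)
print(slots)
```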
S103. Configure, as the target identification text, the candidate identification text composed of the determined identification text and the to-be-determined identification text corresponding to the maximum similarity.
Optionally, the similarity between each to-be-determined identification text and the text at the corresponding position in the target contrast text is calculated separately. For example, the similarity between "Gaosheng Mei" and "Zhou Jielun", the similarity between "Gao Shengmei" and "Zhou Jielun", and so on are determined respectively.
If the similarity between "Gao Shengmei" and "Zhou Jielun" is the greatest, "I want to listen to Gao Shengmei's songs" is configured as the target identification text.
The similarity may be semantic similarity, similarity of the type the text belongs to, part-of-speech similarity, and so on; this is not limited here.
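The selection in S103 amounts to taking the maximum over the similarity scores. The sketch below assumes each text has a vector representation and uses cosine similarity. The vectors are made-up toy values for illustration only, and `pick_target` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_target(
    undetermined: List[str], contrast: str, vectors: Dict[str, Sequence[float]]
) -> str:
    """Return the to-be-determined text most similar to the contrast text."""
    return max(undetermined, key=lambda w: cosine(vectors[w], vectors[contrast]))


# Toy 2-dimensional vectors, invented purely for this example.
toy_vectors = {
    "Gao Shengmei": [0.9, 0.1],
    "Gao Xingmei": [0.1, 0.9],
    "Zhou Jielun": [0.8, 0.2],
}
best = pick_target(["Gao Xingmei", "Gao Shengmei"], "Zhou Jielun", toy_vectors)
print(best)
```

Any other similarity measure with the same signature could be substituted for `cosine` without changing the selection step.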
In this embodiment, the determined identification text and the to-be-determined identification text in at least two candidate identification texts corresponding to the speech data to be identified are determined first. Then, for the to-be-determined identification text, the similarity between it and the text at the corresponding position in the target contrast text is calculated, and the to-be-determined identification text with the greatest similarity is taken as the correct result for the speech data to be identified. The candidate identification text composed of this to-be-determined identification text and the determined identification text is then configured as the target identification text. In this way, when multiple candidate identification texts with close probabilities are obtained, a target contrast text with a consistent sentence pattern structure is used, and the similarity between each to-be-determined identification text and the text at the corresponding position in the target contrast text determines which to-be-determined identification text is closest to the speech data input by the user. That text and the determined identification text together form the target identification text, which is fed back to the user. In other words, by referring to the target contrast text, the differing parts of multiple candidate identification texts with close probabilities are further selected, which improves the accuracy of recognizing the speech data to be identified and improves the user experience of speech recognition.
Fig. 2 is a schematic flowchart of a method for determining target identification text from candidate identification texts provided by another embodiment of the application. As shown in Fig. 2, on the basis of Fig. 1, the following steps are also included before S101:
S201. Obtain multiple speech recognition texts corresponding to the speech data to be identified.
After the user inputs a piece of speech, the terminal can obtain multiple results from a preset speech recognition decoder. Generally, the preset speech recognition decoder may include one or more models for speech recognition, which recognize the speech data to be identified. Because some pronunciations in the speech are ambiguous, or because there are many homophones and near-homophones, multiple speech recognition texts may be recognized.
Specifically, after the speech data to be identified is obtained, it can first undergo some preprocessing such as front-end signal processing and endpoint detection; speech features are then extracted frame by frame and sent to the preset speech recognition decoder. The preset speech recognition decoder may include decoding models such as an acoustic model, a language model, and a pronunciation dictionary, and the decoder combines them to obtain multiple speech recognition texts.
The acoustic model mainly describes the likelihood of the features under the pronunciation model, and can be a hidden Markov model (HMM). The language model mainly describes the probability that words occur consecutively, and can be an n-gram model; for Chinese, it is called a Chinese language model (CLM). It may contain a large corpus of sentences, words, and so on, and can constrain the text search according to the statistical probability of co-occurrence between adjacent words. The pronunciation dictionary mainly performs the conversion between words and sounds. In the conversion, acoustic model decoding searches the acoustic model for the feature file of the speech signal and produces the optimal phoneme recognition result, where phonemes can be identified by letters. The pronunciation dictionary is queried to convert the phoneme recognition result into word combinations. Finally, language model decoding selects the most likely word combination from those obtained from the pronunciation dictionary as the speech recognition text.
Note that for the operation of recognizing the speech data to be identified to obtain its corresponding speech recognition text, reference can be made to the related art; the embodiments of the present invention do not describe it in detail.
For example, recognition of the speech data to be identified to obtain its corresponding speech recognition text can be realized by the following formulas in turn.
$$W_1 = \arg\max_W P(W \mid X) \tag{1}$$
$$W_2 = \arg\max_W \frac{P(X \mid W)\,P(W)}{P(X)} \tag{2}$$
In formula (1), W represents any word sequence stored in the database, where a word sequence consists of characters or words and the database may be a corpus used for speech recognition; X represents the speech data input by the user; W1 represents the word sequence, among the stored word sequences, that can be matched with the speech data to be identified; and P(W|X) represents the probability that the speech data to be identified can be converted into that text. In formula (2), W2 represents the matching degree between the speech data to be identified and the word sequence, P(X|W) represents the probability that the word sequence is pronounced as the speech data, P(W) represents the probability that the word sequence is a character or word, and P(X) represents the probability that the speech data to be identified is audio information.
Note that in the above recognition process, P(W) can be determined by the language model and P(X|W) by the acoustic model, thereby completing speech recognition of the speech data to be identified and obtaining its corresponding speech recognition text. The language model and the acoustic model are briefly introduced below.
Language model
A language model generally uses the chain rule to decompose the probability that a word sequence is a character or word into the product of the probabilities of its individual characters or words. That is, W is decomposed into w_1, w_2, w_3, ..., w_{n-1}, w_n, and P(W) is determined by formula (3) below.
$$P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, w_2, \ldots, w_{n-1}) \tag{3}$$
In formula (3), each term of P(W) represents the probability that the current character or word occurs given that all preceding characters or words are known.
When P(W) is determined by formula (3), if the condition is too long, determining P(W) is inefficient, which affects subsequent speech recognition. Therefore, to improve efficiency, P(W) is usually determined by an n-gram language model. In the case shown here (a bigram model), the probability of the n-th word depends only on the (n-1)-th word immediately before it, and P(W) can be determined by formula (4) below.
$$P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_2) \cdots P(w_n \mid w_{n-1}) \tag{4}$$
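Formula (4) can be illustrated with a small count-based bigram model. The sketch below is an assumption-laden example, not the model used in the application: probabilities are unsmoothed relative frequencies from a tiny invented corpus, and the names `train_bigram` and `sentence_prob` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def train_bigram(corpus: List[List[str]]) -> Tuple[Counter, Counter, int]:
    """Count unigrams and adjacent word pairs in a tokenized corpus."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    total = 0
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
        total += len(sentence)
    return unigrams, bigrams, total


def sentence_prob(
    words: List[str], unigrams: Counter, bigrams: Counter, total: int
) -> float:
    """P(W) = P(w1) * P(w2|w1) * ... * P(wn|wn-1), as in formula (4)."""
    if not words:
        return 1.0
    if total == 0:
        return 0.0
    prob = unigrams[words[0]] / total  # P(w1)
    for prev, cur in zip(words, words[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen history: no estimate without smoothing
        prob *= bigrams[(prev, cur)] / unigrams[prev]  # P(cur | prev)
    return prob


corpus = [["I", "want", "milk"], ["I", "want", "coffee"]]
uni, bi, total = train_bigram(corpus)
print(sentence_prob(["I", "want", "milk"], uni, bi, total))
```

A practical model would add smoothing so that unseen word pairs do not force the whole product to zero.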
Acoustic model
Determining each word also requires determining its pronunciation, which is done through a dictionary. The dictionary is a model parallel to the acoustic model and the language model, and it converts a single word into a phoneme string. Using the dictionary, the acoustic model can determine which sounds the words in the user's speech data produce in sequence, and a dynamic programming (DP) algorithm such as the Viterbi algorithm finds the boundary of each phoneme, thereby determining the start and end time of each phoneme and, in turn, the matching degree between the user's speech data and the phoneme string, that is, P(X|W).
Typically, the distribution of the feature vectors of each phoneme can be estimated by a classifier such as a Gaussian mixture model. During recognition, the probability P(x_t | s_i) that the feature vector x_t of each frame of the user's speech data is produced by the corresponding phoneme s_i is determined, and multiplying the probabilities of all frames gives P(X|W).
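The frame-by-frame product can be sketched as below. To keep the example short it makes several simplifying assumptions that are not in the original text: each frame is a single number rather than a feature vector, each phoneme is modeled by one Gaussian rather than a mixture, the frame-to-phoneme alignment is given, and the product is computed as a sum of logarithms to avoid numerical underflow. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def gaussian_log_pdf(x: float, mean: float, var: float) -> float:
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood(
    frames: List[float],
    alignment: List[str],
    models: Dict[str, Tuple[float, float]],
) -> float:
    """log P(X|W) = sum over frames of log P(x_t | s_i) for the aligned phoneme."""
    if len(frames) != len(alignment):
        raise ValueError("each frame needs exactly one aligned phoneme")
    return sum(
        gaussian_log_pdf(x, *models[phoneme])
        for x, phoneme in zip(frames, alignment)
    )


# Toy phoneme models as (mean, variance); values are invented for illustration.
models = {"a": (0.0, 1.0), "o": (2.0, 0.5)}
score = log_likelihood([0.1, -0.2, 1.9], ["a", "a", "o"], models)
print(score)
```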
The classifier can be trained in advance as follows: a large number of feature vectors, together with the phoneme corresponding to each feature vector, are extracted from training data using Mel-frequency cepstral coefficients (MFCC), and the classifier from features to phonemes is trained on them.
Note that in practice P(X|W) can be determined not only in the above way but also in other ways. For example, a neural network can directly give P(s_i | x_t), which can be converted into P(x_t | s_i) with Bayes' formula and then multiplied to obtain P(X|W). This is merely an example, and the embodiments of the present invention are not limited to it.
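Written out, the conversion mentioned above is Bayes' rule applied per frame, followed by the same product over frames. The notation s_{i(t)} for the phoneme aligned to frame t is an assumption introduced here for readability.

$$P(x_t \mid s_i) = \frac{P(s_i \mid x_t)\,P(x_t)}{P(s_i)}, \qquad P(X \mid W) = \prod_{t} P(x_t \mid s_{i(t)})$$

Since P(x_t) does not depend on the word sequence, it is the same for every candidate and does not change which candidate scores highest.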
S202. Determine the maximum probability value and the second-largest probability value among the probability values corresponding to the multiple speech recognition texts.
The recognition probability of each speech recognition text can be calculated with a preset algorithm from the word combination of that speech recognition text.
Optionally, a formula can be used to calculate the probability value P_rec of each speech recognition text from the decoding rate of the acoustic model, the decoding rate of the pronunciation dictionary, and the decoding rate of the language model, which depend on the feature file of the speech data to be identified, the recognized word combination, and the phoneme sequence.
It can be seen that, by substituting the word combination and phoneme sequence of each speech recognition text and the feature file of the speech data to be identified, the corresponding decoding rates of each speech recognition text can be obtained, and from them the probability value of each speech recognition text.
Suppose there are N speech recognition texts in total, and the probability value of each is denoted P_n, where n = 1, 2, ..., N. The maximum probability value P_max and the second-largest probability value P_2max can then be selected.
S203. Determine whether the difference between the maximum probability value and the second-largest probability value is greater than a preset probability threshold.
Further, the difference between the maximum probability value and the second-largest probability value can be obtained. If the difference is greater than or equal to the preset probability threshold, the speech recognition text corresponding to the maximum probability value is already relatively accurate, and it can be directly determined as the target identification text.
In a specific implementation, the differences between the maximum probability value P_max and the other probability values P_n can be calculated in turn. Optionally, a formula is used to calculate the mean of the absolute values of these differences as the acoustic probability difference E_P. E_P reflects the distribution of the speech recognition texts and measures the gap between the best speech recognition text and the remaining ones. When E_P is greater than a preset threshold, the speech recognition text corresponding to P_max can be directly determined as the target identification text without further semantic analysis.
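The acoustic probability difference E_P can be sketched as the average absolute gap between the best score and every other score. The exact formula is not reproduced in the text above, so the normalization used here (dividing by the number of other texts, N - 1) is an assumption, as is the function name `mean_abs_gap`.

```python
from typing import List


def mean_abs_gap(probs: List[float]) -> float:
    """Average of |P_max - P_n| over all texts other than the best one."""
    if len(probs) < 2:
        raise ValueError("need at least two probability values")
    p_max = max(probs)
    others = list(probs)
    others.remove(p_max)  # drop one occurrence of the maximum
    return sum(abs(p_max - p) for p in others) / len(others)


e_p = mean_abs_gap([0.5, 0.3, 0.1])
print(e_p)
```

A large value suggests the best text stands clearly apart from the rest, which is the situation in which the text above skips semantic analysis.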
Further, when the difference between the maximum probability value and the second-largest probability value is less than the preset probability threshold, at least two candidate identification texts are determined from the multiple speech recognition texts.
Optionally, determining at least two candidate identification texts from the multiple speech recognition texts may be done as follows: obtain the first speech recognition texts, namely those whose probability value differs from the maximum probability value by less than the preset probability threshold, and determine the first speech recognition texts and the speech recognition text corresponding to the maximum probability value as the at least two candidate identification texts.
That is, the maximum probability value is compared with each of the other probability values, and only when the difference is less than the preset probability threshold is the speech recognition text corresponding to the compared probability value taken as a candidate identification text. If the difference is greater than or equal to the preset probability threshold, the speech recognition text corresponding to the compared probability value is very unlikely to be the target identification text and is not analyzed further.
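The decision in S203 and the candidate selection that follows can be sketched together. This is an illustrative reading of the steps above; the function name `select_candidates` and the example scores are hypothetical.

```python
from typing import List, Tuple


def select_candidates(
    scored: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Return one text if the best is clearly ahead, else all texts close to the best."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    best_text, best_p = ranked[0]
    # Gap between the maximum and the second-largest probability value.
    if len(ranked) == 1 or best_p - ranked[1][1] >= threshold:
        return [best_text]  # directly the target identification text
    # Otherwise keep every text whose gap to the maximum is below the threshold.
    return [text for text, p in ranked if best_p - p < threshold]


scored = [("text A", 0.40), ("text B", 0.38), ("text C", 0.10)]
print(select_candidates(scored, threshold=0.05))
```

When a single text is returned, the similarity comparison against the target contrast text is not needed.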
Optionally, the probability values of the multiple speech recognition texts may be sorted, and a preset number of speech recognition texts with the highest probability values selected as candidate recognition texts. Candidate recognition texts may also be selected one by one from high to low according to the difference between the probability values of two adjacent speech recognition texts. For example, if the difference between the maximum probability value and the second-highest probability value is greater than the preset threshold, no further comparison is made and the speech recognition text with the highest probability value is taken directly as the target recognition text. Otherwise, the speech recognition texts with the highest and second-highest probability values are both taken as candidate recognition texts, the difference between the second-highest probability value and the next probability value is then determined to decide on the next candidate recognition text, and so on, until some difference is greater than the preset threshold, at which point the comparison stops. Of course, the selection is not limited to these approaches; candidate recognition texts may be determined flexibly as needed, and may also be obtained by a formula or an algorithm.
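The adjacent-difference selection described above can be sketched as follows. This is only an illustrative reading of the paragraph: the function name `select_by_adjacent_gap` and the `(text, probability)` pair layout are assumptions, and the sketch stops at the first gap that is strictly greater than the threshold.

```python
from typing import List, Tuple


def select_by_adjacent_gap(
    hypotheses: List[Tuple[str, float]], threshold: float
) -> List[str]:
    """Walk the hypotheses from highest to lowest probability and keep
    adding candidates until the gap between two neighbours exceeds
    `threshold` (hypothetical helper for illustration)."""
    if not hypotheses:
        return []
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    candidates = [ranked[0][0]]
    # Compare each hypothesis with the one ranked just below it.
    for (_, p_prev), (text, p_next) in zip(ranked, ranked[1:]):
        if p_prev - p_next > threshold:
            # The gap is too large: stop comparing.
            break
        candidates.append(text)
    return candidates
```

When the very first gap already exceeds the threshold, the returned list holds a single text, which corresponds to taking the highest-probability text directly as the target recognition text.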
If only one candidate recognition text is determined, that candidate recognition text can be determined directly as the target recognition text. If there are multiple candidate recognition texts, the result that best matches the actual situation is further determined and taken as the target recognition text.
Optionally, calculating the similarity between the to-be-determined recognition text and the text at the corresponding position of the target comparison text may include: using a preset word vector model to determine the semantic similarity between the to-be-determined recognition text and the text at the corresponding position of the target comparison text.
Here, the preset word vector model represents the semantic similarity between words by the distance between their word vectors.
The preset word vector model can be obtained through word vector training; specifically, word content is converted into real-valued vectors of limited, low dimension, with 50 and 100 dimensions being common. The distance between vectors can be measured by the conventional Euclidean distance or by the cosine of the angle between the vectors; this is not limited here. The distance between vectors reflects the distance between word meanings, that is, the semantic similarity between words can be expressed by vector distance. Word vector training can be carried out with existing word vector training tools: first, a training corpus that comprehensively covers the basic words of Chinese is obtained and preprocessed accordingly; then the word vector training tool is invoked to generate the vector representation, for example one corresponding 50-dimensional vector for each word in the corpus; this is not limited here. The larger the vector distance, the farther apart the words are semantically; conversely, the smaller the vector distance, the closer they are semantically.
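The two distance measures mentioned above can be written out as follows. This is a minimal sketch using only the Python standard library; the function names are chosen for illustration, and real word vectors would come from a trained word vector model rather than being typed in by hand.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance: smaller values mean the words are closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: values closer to 1
    mean the vectors point in more similar directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treating the similarity as
        # 0.0 is a convention chosen for this sketch.
        return 0.0
    return dot / (norm_u * norm_v)
```

Note the opposite orientation of the two measures: a smaller Euclidean distance indicates closer meaning, whereas a larger cosine value does.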
Specifically, if the to-be-determined recognition text of a candidate recognition text and the text at the corresponding position of the target comparison text appear in the same sentence pattern and at the same position, they are very likely to be the same kind of thing, and the similarity is then further determined according to word vector distance.
Table 1 is taken as an example:
Table 1
As can be seen, the word vector of "Gao Shengmei" is closest to that of "Zhou Jielun", so "I want to listen to Gao Shengmei's songs" is determined as the target recognition text, and the target recognition text is output and displayed to the user. If the speech information is a control instruction, the related instruction can be executed according to the target recognition text; details are not repeated here.
Optionally, using the preset word vector model to determine the semantic similarity between the to-be-determined recognition text and the text at the corresponding position of the target comparison text may be: when the to-be-determined recognition text includes at least two words, using the preset word vector model to determine, for each word in the to-be-determined recognition text, the semantic similarity between that word and the word at the corresponding position in the target comparison text.
That is, the words at different positions are compared separately. For example, to compare the to-be-determined recognition text in "eating fruit at breakfast is good for health" with the text at the corresponding positions of the target comparison text "eating coarse grains at dinner is good for health", the semantic similarity between "breakfast" and "dinner" and the semantic similarity between "coarse grains" and "fruit" can be determined separately.
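The position-by-position comparison can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the helper name `slotwise_similarity`, the use of cosine similarity, and in particular the three-dimensional vectors, which are invented toy values and not the output of any trained word vector model.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def slotwise_similarity(
    pending_words: List[str],
    contrast_words: List[str],
    embeddings: Dict[str, Sequence[float]],
) -> List[float]:
    """Compare each to-be-determined word with the word at the same
    position in the target comparison text (hypothetical helper)."""
    if len(pending_words) != len(contrast_words):
        raise ValueError("both texts must supply one word per position")
    return [
        cosine_similarity(embeddings[p], embeddings[c])
        for p, c in zip(pending_words, contrast_words)
    ]


# Toy vectors, invented purely for this example.
TOY_EMBEDDINGS = {
    "breakfast": [0.9, 0.1, 0.0],
    "dinner": [0.8, 0.2, 0.0],
    "fruit": [0.1, 0.9, 0.1],
    "coarse grains": [0.2, 0.8, 0.2],
}

scores = slotwise_similarity(
    ["breakfast", "fruit"], ["dinner", "coarse grains"], TOY_EMBEDDINGS
)
```

Here `scores` holds one similarity per differing position. How the per-position values are combined into a single score for a candidate is not specified in the text, so that choice is left open.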
Fig. 3 is a schematic structural diagram of a device for determining a target recognition text from at least two candidate recognition texts provided by an embodiment of the present application. As shown in Fig. 3, the device includes a first determining module 301, a computing module 302, and a second determining module 303, wherein:
The first determining module 301 is configured to determine a determined recognition text and a to-be-determined recognition text in at least two candidate recognition texts corresponding to speech data to be recognized.
Here, the determined recognition text is the part that is identical across the at least two candidate recognition texts, and the to-be-determined recognition text is the part that differs across the at least two candidate recognition texts.
The computing module 302 is configured to calculate the similarity between the to-be-determined recognition text and the text at the corresponding position of a target comparison text.
Here, the target comparison text is a text in a preset text library whose sentence pattern structure is consistent with that of the candidate recognition texts, and the target comparison text includes the determined recognition text.
The second determining module 303 is configured to determine, as the target recognition text, the candidate recognition text composed of the determined recognition text and the to-be-determined recognition text corresponding to the maximum similarity.
In this embodiment, the first determining module 301 first determines the determined recognition text and the to-be-determined recognition text in the at least two candidate recognition texts corresponding to the speech data to be recognized. The computing module 302 then calculates the similarity between the to-be-determined recognition text and the text at the corresponding position of the target comparison text, and the to-be-determined recognition text corresponding to the maximum similarity is taken as the correct result for the speech data to be recognized. The second determining module 303 then determines the candidate recognition text composed of that to-be-determined recognition text and the determined recognition text as the target recognition text. In this way, when multiple candidate recognition texts with close probabilities are obtained, the to-be-determined recognition text closest to the speech data input by the user is determined with reference to a target comparison text of consistent sentence pattern structure, according to the similarity between the to-be-determined recognition text and the text at the corresponding position of the target comparison text; that text and the determined recognition text together form the target recognition text, which is fed back to the user. That is, by referring to the target comparison text, a further selection is made among the differing parts of multiple candidate recognition texts with close probabilities, which improves the accuracy of recognizing the speech data to be recognized and improves the user experience of speech recognition.
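The cooperation of the three modules can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the text: the candidates and the target comparison text are already split into words and have the same number of words, cosine similarity is used, the per-position similarities are averaged, and the words and two-dimensional vectors are invented toy values. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def differing_positions(candidates: List[List[str]]) -> List[int]:
    """First determining module: positions where the candidates
    disagree (the to-be-determined part); all other positions form
    the determined part."""
    length = len(candidates[0])
    if any(len(c) != length for c in candidates):
        raise ValueError("candidates must have the same number of words")
    return [i for i in range(length) if len({c[i] for c in candidates}) > 1]


def pick_target(
    candidates: List[List[str]],
    contrast: List[str],
    embeddings: Dict[str, Sequence[float]],
) -> List[str]:
    """Computing module plus second determining module: score each
    candidate on its differing positions and return the best one."""
    positions = differing_positions(candidates)

    def score(candidate: List[str]) -> float:
        sims = [
            cosine_similarity(embeddings[candidate[i]], embeddings[contrast[i]])
            for i in positions
        ]
        # Averaging is an assumption of this sketch.
        return sum(sims) / len(sims) if sims else 0.0

    return max(candidates, key=score)


# Invented toy data: only the last word differs between candidates.
TOY_EMBEDDINGS = {
    "singer_a": [0.9, 0.1],
    "singer_ref": [0.8, 0.2],
    "unrelated": [0.1, 0.9],
}
candidates = [["listen", "to", "unrelated"], ["listen", "to", "singer_a"]]
contrast = ["listen", "to", "singer_ref"]
target = pick_target(candidates, contrast, TOY_EMBEDDINGS)
```

Only the words at differing positions need vectors here, because the shared words are identical in every candidate and so cannot change the ranking.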
Fig. 4 is a schematic structural diagram of a device for determining a target recognition text from at least two candidate recognition texts provided by another embodiment of the present application. As shown in Fig. 4, on the basis of Fig. 3, the device further includes a third determining module 401, wherein:
The third determining module 401 is configured to determine, before the first determining module 301 determines the determined recognition text and the to-be-determined recognition text in the at least two candidate recognition texts corresponding to the speech data to be recognized, the maximum probability value and the second-largest probability value among the multiple speech recognition texts corresponding to the speech data to be recognized.
In this embodiment, the first determining module 301 determines at least two candidate recognition texts from the multiple speech recognition texts when the difference between the maximum probability value and the second-largest probability value is less than the preset probability threshold.
Optionally, the first determining module 301 is specifically configured to obtain, from the multiple speech recognition texts, each first speech recognition text whose probability value differs from the maximum probability value by less than the preset probability threshold, and to determine the first speech recognition text and the speech recognition text corresponding to the maximum probability value as the at least two candidate recognition texts.
Further, the computing module 302 is specifically configured to use a preset word vector model to determine the semantic similarity between the to-be-determined recognition text and the text at the corresponding position in the target comparison text. Here, the preset word vector model represents the semantic similarity between words by the distance between their word vectors.
Optionally, the computing module 302 is specifically configured to, when the to-be-determined recognition text includes at least two words, use the preset word vector model to determine, for each word in the to-be-determined recognition text, the semantic similarity between that word and the word at the corresponding position in the target comparison text.
It should be noted that when the device for determining a target recognition text provided by the above embodiments determines the target recognition text from at least two candidate recognition texts, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for determining a target recognition text provided by the above embodiments belongs to the same concept as the method embodiments for determining a target recognition text; for its specific implementation process, refer to the method embodiments, which is not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present application. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. A method for determining a target recognition text from at least two candidate recognition texts, characterized by comprising:
determining a determined recognition text and a to-be-determined recognition text in at least two candidate recognition texts corresponding to speech data to be recognized, wherein the determined recognition text is the part that is identical across the at least two candidate recognition texts, and the to-be-determined recognition text is the part that differs across the at least two candidate recognition texts;
calculating the similarity between the to-be-determined recognition text and the text at the corresponding position of a target comparison text, wherein the target comparison text is a text in a preset text library whose sentence pattern structure is consistent with that of the candidate recognition texts, and the target comparison text includes the determined recognition text;
determining, as the target recognition text, the candidate recognition text composed of the determined recognition text and the to-be-determined recognition text corresponding to the maximum similarity.
2. The method according to claim 1, characterized in that, before the determining of the determined recognition text and the to-be-determined recognition text in the at least two candidate recognition texts corresponding to the speech data to be recognized, the method further comprises:
determining the maximum probability value and the second-largest probability value among multiple speech recognition texts corresponding to the speech data to be recognized;
when the difference between the maximum probability value and the second-largest probability value is less than a preset probability threshold, determining at least two candidate recognition texts from the multiple speech recognition texts.
3. method according to claim 1 and 2, it is characterised in that described to determine from the multiple speech recognition text
At least two candidates recognize text, including:
Probable value is less than default probability threshold value with the difference of the most probable value in obtaining the multiple speech recognition text
The first speech recognition text;
The first speech recognition text and the corresponding speech recognition text of the most probable value are defined as described at least two
Individual candidate recognizes text.
4. The method according to claim 1, characterized in that the calculating of the similarity between the to-be-determined recognition text and the text at the corresponding position of the target comparison text is specifically:
using a preset word vector model to determine the semantic similarity between the to-be-determined recognition text and the text at the corresponding position of the target comparison text, wherein the preset word vector model represents the semantic similarity between words by the distance between their word vectors.
5. The method according to claim 4, characterized in that the using of the preset word vector model to determine the semantic similarity between the to-be-determined recognition text and the text at the corresponding position in the target comparison text is specifically:
when the to-be-determined recognition text includes at least two words, using the preset word vector model to determine, for each word in the to-be-determined recognition text, the semantic similarity between that word and the word at the corresponding position in the target comparison text.
6. A device for determining a target recognition text from at least two candidate recognition texts, characterized by comprising:
a first determining module, configured to determine a determined recognition text and a to-be-determined recognition text in at least two candidate recognition texts corresponding to speech data to be recognized, wherein the determined recognition text is the part that is identical across the at least two candidate recognition texts, and the to-be-determined recognition text is the part that differs across the at least two candidate recognition texts;
a computing module, configured to calculate the similarity between the to-be-determined recognition text and the text at the corresponding position of a target comparison text, wherein the target comparison text is a text in a preset text library whose sentence pattern structure is consistent with that of the candidate recognition texts, and the target comparison text includes the determined recognition text;
a second determining module, configured to determine, as the target recognition text, the candidate recognition text composed of the determined recognition text and the to-be-determined recognition text corresponding to the maximum similarity.
7. The device according to claim 6, characterized in that the device further comprises a third determining module;
the third determining module is configured to determine, before the first determining module determines the determined recognition text and the to-be-determined recognition text in the at least two candidate recognition texts corresponding to the speech data to be recognized, the maximum probability value and the second-largest probability value among multiple speech recognition texts corresponding to the speech data to be recognized;
the first determining module is specifically configured to determine at least two candidate recognition texts from the multiple speech recognition texts when the difference between the maximum probability value and the second-largest probability value is less than a preset probability threshold.
8. The device according to claim 6 or 7, characterized in that the first determining module is specifically configured to obtain, from the multiple speech recognition texts, a first speech recognition text whose probability value differs from the maximum probability value by less than the preset probability threshold, and to determine the first speech recognition text and the speech recognition text corresponding to the maximum probability value as the at least two candidate recognition texts.
9. The device according to claim 6, characterized in that the computing module is specifically configured to use a preset word vector model to determine the semantic similarity between the to-be-determined recognition text and the text at the corresponding position in the target comparison text, wherein the preset word vector model represents the semantic similarity between words by the distance between their word vectors.
10. The device according to claim 9, characterized in that the computing module is specifically configured to, when the to-be-determined recognition text includes at least two words, use the preset word vector model to determine, for each word in the to-be-determined recognition text, the semantic similarity between that word and the word at the corresponding position in the target comparison text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710127503.9A CN106782560B (en) | 2017-03-06 | 2017-03-06 | Method and device for determining target recognition text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106782560A (en) | 2017-05-31 |
CN106782560B (en) | 2020-06-16 |
Family
ID=58962349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710127503.9A Active CN106782560B (en) | 2017-03-06 | 2017-03-06 | Method and device for determining target recognition text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106782560B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN106782560B (en) | 2020-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106782560A (en) | Determine the method and device of target identification text | |
JP7280382B2 (en) | End-to-end automatic speech recognition of digit strings | |
Audhkhasi et al. | Direct acoustics-to-word models for english conversational speech recognition | |
US10134388B1 (en) | Word generation for speech recognition | |
CN104681036B (en) | A kind of detecting system and method for language audio | |
Kim et al. | Two-stage multi-intent detection for spoken language understanding | |
US20240153505A1 (en) | Proactive command framework | |
CN109637537B (en) | Method for automatically acquiring annotated data to optimize user-defined awakening model | |
EP4018437B1 (en) | Optimizing a keyword spotting system | |
US8812315B2 (en) | System and method for adapting automatic speech recognition pronunciation by acoustic model restructuring | |
CN111191016A (en) | Multi-turn conversation processing method and device and computing equipment | |
CN108510985A (en) | Systems and methods for principled bias reduction in production speech models | |
US20020120447A1 (en) | Speech processing system | |
CN111210807B (en) | Speech recognition model training method, system, mobile terminal and storage medium | |
Yu et al. | Sequential labeling using deep-structured conditional random fields | |
US9704483B2 (en) | Collaborative language model biasing | |
US11093110B1 (en) | Messaging feedback mechanism | |
CN104166462A (en) | Input method and system for characters | |
Deena et al. | Recurrent neural network language model adaptation for multi-genre broadcast speech recognition and alignment | |
CN109976702A (en) | Speech recognition method, device and terminal | |
Deng et al. | Improving accent identification and accented speech recognition under a framework of self-supervised learning | |
Ahmed et al. | End-to-end lexicon free Arabic speech recognition using recurrent neural networks | |
Karunanayake et al. | Sinhala and Tamil speech intent identification from English phoneme based ASR | |
CN116303966A (en) | Dialogue behavior recognition system based on prompt learning | |
CN111508497B (en) | Speech recognition method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||