CN110415705A

CN110415705A - Hotword recognition method, system, device and storage medium

Info

Publication number: CN110415705A
Application number: CN201910706314.6A
Authority: CN
Inventors: 王欢良; 唐浩元; 王佳珺; 鄢戈; 张李
Original assignee: Suzhou Qdreamer Network Science And Technology Co Ltd
Current assignee: Suzhou Qdreamer Network Science And Technology Co Ltd
Priority date: 2019-08-01
Filing date: 2019-08-01
Publication date: 2019-11-05
Anticipated expiration: 2039-08-01
Also published as: CN110415705B

Abstract

The present invention provides a hotword recognition method, system, device and storage medium that solve the prior-art problem of correct speech recognition results being erroneously modified. The hotword recognition method comprises the following steps. Step 1: feed the user audio into a general recognition engine to obtain a speech recognition result, together with the position and confidence of each recognized word W_i in the audio. Step 2: feed the user audio into a hotword detection engine for hotword retrieval, and obtain the highest-scoring hotword W, its corresponding audio position P and its score S, expressed as (W, P, S). Step 3: evaluate the score S of the highest-scoring hotword (W, P, S); if S is greater than a given threshold, replace the words at the corresponding audio position in the speech recognition result W_i~W_j with the hotword W and execute Step 4; otherwise, end. Step 4: if the hotword position overlaps a word in the current recognition result, correct the words before and after the hotword.

Description

Hotword recognition method, system, device and storage medium

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a hotword recognition method, system, device and storage medium.

Background technique

Speech recognition has become a key technology in current artificial intelligence applications. Typical speech recognition systems depend on a specific vocabulary, that is, they can only recognize words within a given vocabulary; if out-of-vocabulary words occur in the speech, recognition performance is usually very poor, and such words may not be recognized at all. Several solutions to this problem have been proposed. The main approach, called recognition result post-processing, analyzes the text of the recognition result and then uses a language model or the pronunciations of given hotwords to correct it. Such methods have a fatal defect: correct recognition results are often erroneously modified.

Summary of the invention

In view of the above problems, the present invention provides a hotword recognition method, system, device and storage medium to solve the prior-art problem of correct speech recognition results being erroneously modified.

The technical solution is as follows: a hotword recognition method, characterized in that it comprises the following steps:

Step 1: feed the user audio into a general recognition engine to obtain a speech recognition result, expressed as W_1, W_2, ..., W_n, where n is a natural number, and at the same time obtain the position and confidence of each recognized word W_i in the audio, where 1 ≤ i ≤ n and i is a natural number;

Step 2: feed the user audio into a hotword detection engine for hotword retrieval, and obtain the highest-scoring hotword W, its corresponding audio position P and its score S, expressed as (W, P, S);

Step 3: evaluate the score S of the highest-scoring hotword (W, P, S); if S is greater than a given threshold, replace the words at the corresponding audio position in the speech recognition result W_i~W_j with the hotword W, and execute Step 4; otherwise, end;

Step 4: if the position where the hotword appears overlaps a word in the current recognition result, correct the words before and after the hotword.
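Step 3 can be illustrated with a minimal Python sketch. It assumes that each recognized word carries start and end times in seconds plus a confidence, that P is a (start, end) time pair, and that the threshold is 0.5 (the value used in Embodiment 1). The names `Word`, `HotwordHit` and `apply_hotword` are hypothetical, and so is the rule that only words lying entirely inside P are replaced while words straddling a boundary of P are handed on to Step 4.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds
    conf: float   # recognizer confidence in [0, 1]


@dataclass
class HotwordHit:
    word: str                 # hotword W
    pos: Tuple[float, float]  # audio position P = (start, end)
    score: float              # score S


def apply_hotword(words: List[Word], hit: Optional[HotwordHit],
                  threshold: float = 0.5) -> Tuple[List[Word], List[int]]:
    """Step 3 (sketch): replace words covered by P with W if S > threshold.

    Returns the updated word list and the indices (in that list) of words that
    only partially overlap P; a non-empty list means Step 4 should run.
    """
    if hit is None or hit.score <= threshold:
        return list(words), []  # "otherwise, end": keep the result unchanged
    h_start, h_end = hit.pos
    hot = Word(hit.word, h_start, h_end, hit.score)
    # Drop words lying entirely inside the hotword span, then insert the hotword.
    kept = [w for w in words if not (w.start >= h_start and w.end <= h_end)]
    kept.append(hot)
    kept.sort(key=lambda w: w.start)
    # Words that still intersect the hotword span straddle one of its boundaries.
    partial = [i for i, w in enumerate(kept)
               if w is not hot and w.start < h_end and w.end > h_start]
    return kept, partial
```

For example, with the recognized words `call jon smiff now` and the hit `("JohnSmith", (0.4, 1.0), 0.8)`, the two words inside P collapse into the single hotword `JohnSmith`.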

Further, a Step 1.5 is included between Step 1 and Step 2: if the speech recognition result contains a span W_i~W_j, where i < j and j is a natural number, whose average confidence is below a given threshold, extract the audio fragment corresponding to W_i~W_j and execute Step 2.
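Step 1.5 leaves open how the span W_i~W_j is chosen. A minimal Python sketch, assuming a sliding window of two words (the hypothetical parameter `window`, which must be at least 2 so that i < j) and the 0.5 threshold from Embodiment 1, flags every window whose mean confidence is below the threshold and merges overlapping windows into spans. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def low_confidence_spans(confs: Sequence[float], threshold: float = 0.5,
                         window: int = 2) -> List[Tuple[int, int]]:
    """Return inclusive (i, j) spans, i < j, built from windows whose mean
    confidence is below the threshold; overlapping windows are merged."""
    spans: List[Tuple[int, int]] = []
    for s in range(len(confs) - window + 1):
        if sum(confs[s:s + window]) / window < threshold:
            e = s + window - 1
            if spans and s <= spans[-1][1]:
                spans[-1] = (spans[-1][0], e)  # extend the previous span
            else:
                spans.append((s, e))
    return spans


def span_times(starts: Sequence[float], ends: Sequence[float],
               spans: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Map word-index spans to the (start, end) times of the audio to extract."""
    return [(starts[i], ends[j]) for i, j in spans]
```

With confidences `[0.9, 0.3, 0.2, 0.4, 0.95]`, the windows (1, 2) and (2, 3) fall below 0.5 and merge into the single span (1, 3), whose audio is then passed to Step 2.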

Further, Step 1 and Step 2 are executed simultaneously.

Further, Step 2 specifically comprises the following steps:

Step 2-1: according to the hotword vocabulary, add a filler word configured to connect all acoustic modeling units, and construct a parallel grammar recognition network;

Step 2-2: perform a decoding search over the extracted input speech segment using the Viterbi algorithm with beam search;

Step 2-3: backtrack to obtain the highest-scoring hotword and its corresponding audio position;

Step 2-4: compute the average posterior probability over the speech frames corresponding to the hotword, and output it as the hotword score.
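Steps 2-1 to 2-4 can be sketched in Python as a small keyword-spotting decoder. The sketch assumes that an acoustic model supplies a T x U matrix of per-frame posteriors over U acoustic units, and it represents each hotword as a list of unit indices. Instead of building one joint network, it decodes each branch `filler* hotword filler*` separately and keeps the branch with the best path score, which is equivalent to Viterbi over a network `filler* (hw1 | hw2 | ...) filler*`. The filler emits the best posterior of any unit scaled by a penalty; `filler_penalty`, `beam` and both function names are hypothetical choices, not values from the patent.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def viterbi_hotword(post: Sequence[Sequence[float]], phones: List[int],
                    filler_penalty: float = math.log(0.5), beam: float = 10.0
                    ) -> Optional[Tuple[float, int, int, float]]:
    """Align 'filler* phones filler*' to the posteriors with beam-pruned Viterbi.

    Returns (path_log_score, first_frame, last_frame, avg_posterior) for the
    hotword segment, or None if the audio is too short for the hotword.
    """
    T, K = len(post), len(phones)
    S = K + 2  # state 0: leading filler, 1..K: hotword units, K+1: trailing filler

    def emit(t: int, s: int) -> float:
        if s == 0 or s == K + 1:  # filler connects all units: take the best one
            return math.log(max(max(post[t]), 1e-10)) + filler_penalty
        return math.log(max(post[t][phones[s - 1]], 1e-10))

    score = [NEG_INF] * S
    score[0], score[1] = emit(0, 0), emit(0, 1)  # start in filler or in the hotword
    back = [[-1] * S]
    for t in range(1, T):
        best_prev = max(score)
        new, ptr = [NEG_INF] * S, [-1] * S
        for s in range(S):
            for p in (s, s - 1):  # self-loop or advance by one state
                if p < 0 or score[p] < best_prev - beam:  # beam pruning
                    continue
                if score[p] > new[s]:
                    new[s], ptr[s] = score[p], p
            if ptr[s] != -1:
                new[s] += emit(t, s)
        score = new
        back.append(ptr)
    end = max((K, K + 1), key=lambda s: score[s])  # the hotword must be completed
    if score[end] == NEG_INF:
        return None
    states = [0] * T  # Step 2-3: backtrack the best state sequence
    s = end
    for t in range(T - 1, -1, -1):
        states[t] = s
        s = back[t][s]
    frames = [t for t in range(T) if 1 <= states[t] <= K]
    # Step 2-4: average posterior over the frames aligned to the hotword
    avg = sum(post[t][phones[states[t] - 1]] for t in frames) / len(frames)
    return score[end], frames[0], frames[-1], avg


def detect_hotword(post: Sequence[Sequence[float]], lexicon: Dict[str, List[int]]
                   ) -> Optional[Tuple[str, Tuple[int, int], float]]:
    """Return (W, P, S): best hotword by path score, its frame span and score."""
    best = None
    for word, phones in lexicon.items():
        res = viterbi_hotword(post, phones)
        if res is not None and (best is None or res[0] > best[1][0]):
            best = (word, res)
    if best is None:
        return None
    word, (_, first, last, avg) = best
    return word, (first, last), avg
```

The frame indices in P would be converted to times using the acoustic model's frame shift.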

Further, in Step 2, the grammar recognition network uses the posterior probability scores output by the acoustic model of the general recognition engine.

Further, in Step 4, overlap between the hotword position and a word in the current recognition result includes start-position overlap and end-position overlap.
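Given the word timestamps from Step 1 and the hotword position P, the two overlap cases can be told apart with a few comparisons. In this minimal Python sketch, which assumes times in seconds and uses the hypothetical helper name `find_overlaps`, a start-position overlap is a word that contains the hotword's start time and an end-position overlap is a word that contains its end time. The returned differences correspond to the position differences computed in Steps 4-1 and 4.1.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, float, float]  # (word, start, end) in seconds


def find_overlaps(words: List[Span], hot_start: float, hot_end: float
                  ) -> Tuple[Optional[Tuple[int, float]], Optional[Tuple[int, float]]]:
    """Return ((index, start_diff) or None, (index, end_diff) or None).

    start_diff: hotword start minus the start of the word it begins inside (Step 4-1).
    end_diff:   end of the word the hotword ends inside minus the hotword end (Step 4.1).
    """
    start_hit = end_hit = None
    for i, (_, ws, we) in enumerate(words):
        if ws < hot_start < we:
            start_hit = (i, hot_start - ws)
        if ws < hot_end < we:
            end_hit = (i, we - hot_end)
    return start_hit, end_hit
```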

Further, when the hotword position and a word in the current recognition result overlap at the start position, Step 4 specifically comprises the following steps:

Step 4-1: determine the word in the recognition result in which the hotword start position falls, and compute the position difference between the start position of that word and the start position of the hotword;

Step 4-2: if the position difference is greater than the duration of one word, select from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word not overlapped by the hotword;

Step 4-3: using a pre-trained language model, predict the probability of each candidate word for the current word, conditioned on the words from the beginning of the sentence up to the current word and on the words after the current word, and use it as that candidate's score;

Step 4-4: if the score of the highest-scoring candidate word is greater than a given threshold, replace the current word with that candidate; otherwise, keep the current word unchanged;
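Step 4-2, and its mirror Step 4.2, need the pronunciation of the part of the word that the hotword does not cover. The following minimal Python sketch assumes that the word's phonemes are spread evenly over its duration (no phone-level alignment is available), uses a hypothetical one-word duration `char_dur` of 0.2 s, and measures pronunciation similarity as a phoneme edit distance no greater than `max_dist`. The patent does not specify any of these choices, and `overlap_candidates` is an illustrative name.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def overlap_candidates(word_phones: List[str], word_span: Tuple[float, float],
                       hot_span: Tuple[float, float], lexicon: Dict[str, List[str]],
                       side: str, char_dur: float = 0.2, max_dist: int = 1) -> List[str]:
    """Candidates for the uncovered part of a word; side is "start" or "end"."""
    ws, we = word_span
    hs, he = hot_span
    diff = hs - ws if side == "start" else we - he  # Step 4-1 / Step 4.1
    if diff <= char_dur:
        return []  # too little of the word is left to re-recognize
    # Assume uniform phone durations: keep the uncovered fraction of phonemes.
    n_keep = max(1, round(len(word_phones) * diff / (we - ws)))
    part = word_phones[:n_keep] if side == "start" else word_phones[-n_keep:]
    scored = sorted((edit_distance(ph, part), w) for w, ph in lexicon.items())
    return [w for d, w in scored if d <= max_dist]
```

Candidates are returned in order of increasing phoneme distance, so the closest-sounding words come first.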

When the hotword position and a word in the current recognition result overlap at the end position, Step 4 specifically comprises the following steps:

Step 4.1: determine the word in the recognition result in which the hotword end position falls, and compute the position difference between the end position of that word and the end position of the hotword;

Step 4.2: if the position difference is greater than the duration of one word, select from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word not overlapped by the hotword;

Step 4.3: using a pre-trained language model, predict the probability of each candidate word for the current word, conditioned on the words from the end of the sentence back to the current word and on the words before the current word, and use it as that candidate's score;

Step 4.4: if the score of the highest-scoring candidate word is greater than a given threshold, replace the current word with that candidate; otherwise, keep the current word unchanged.
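Steps 4-3/4-4 and 4.3/4.4 can be sketched as follows. Embodiment 1 uses an LSTM/GRU language model; to keep the example self-contained, a toy add-k smoothed bigram model stands in for it, and each candidate's probability is taken as its whole-sentence probability renormalized over the competing words. Scoring the full sentence conditions on the left and right context at once, so the same function serves both overlap directions. Including the current word in the competition, the class `BigramLM`, the function `lm_correct` and the smoothing constant are all assumptions of this sketch.

```python
import math
from collections import Counter
from typing import List, Sequence


class BigramLM:
    """Toy add-k bigram model standing in for the LSTM/GRU language model."""

    def __init__(self, sentences: Sequence[List[str]], k: float = 0.1):
        self.k = k
        self.uni: Counter = Counter()
        self.bi: Counter = Counter()
        vocab = set()
        for s in sentences:
            toks = ["<s>"] + s + ["</s>"]
            vocab.update(s)
            self.uni.update(toks[:-1])
            self.bi.update(zip(toks, toks[1:]))
        self.V = len(vocab) + 1  # words plus the end-of-sentence token

    def logprob(self, sent: List[str]) -> float:
        toks = ["<s>"] + sent + ["</s>"]
        return sum(math.log((self.bi[(a, b)] + self.k) /
                            (self.uni[a] + self.k * self.V))
                   for a, b in zip(toks, toks[1:]))


def lm_correct(left: List[str], current: str, right: List[str],
               candidates: List[str], lm: BigramLM, threshold: float = 0.5) -> str:
    """Replace `current` by the best candidate if its normalized score > threshold."""
    pool = [current] + [c for c in candidates if c != current]
    logps = [lm.logprob(left + [c] + right) for c in pool]
    m = max(logps)
    weights = [math.exp(lp - m) for lp in logps]  # numerically stable softmax
    z = sum(weights)
    probs = {c: w / z for c, w in zip(pool, weights)}
    rivals = pool[1:]
    if not rivals:
        return current
    best = max(rivals, key=lambda c: probs[c])
    return best if probs[best] > threshold else current
```

For a misrecognized `cold` before the hotword `JohnSmith`, a model trained on sentences such as `call JohnSmith now` assigns most of the probability mass to `call`, which then replaces `cold`.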

Further, between Step 4-3 and Step 4-4, and between Step 4.3 and Step 4.4, the following step is respectively included: add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word and use it as that candidate's score.
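The text does not say how the acoustic confidence is combined with the language model probability. One common option, shown here as an assumption rather than the patent's formula, is a log-linear interpolation with a hypothetical weight `lam`, renormalized over the candidates; `fuse_scores` is an illustrative name.

```python
import math
from typing import Dict


def fuse_scores(lm_logprobs: Dict[str, float], ac_conf: Dict[str, float],
                lam: float = 0.7) -> Dict[str, float]:
    """Combine LM log-probabilities with acoustic confidences and renormalize.

    score(c) is proportional to P_LM(c) ** lam * conf(c) ** (1 - lam).
    """
    raw = {c: lam * lp + (1 - lam) * math.log(max(ac_conf[c], 1e-10))
           for c, lp in lm_logprobs.items()}
    m = max(raw.values())
    exp = {c: math.exp(v - m) for c, v in raw.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}
```

With `lam = 1.0` the result reduces to the language model probabilities alone; lowering `lam` lets a candidate with strong acoustic evidence overtake one the language model slightly prefers.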

Further, the hotword detection engine is configured to correspond to a user ID. When a hotword is added, the hotword and the user ID are uploaded together; the pronunciation dictionary is queried to obtain the pronunciation of the hotword and its corresponding phoneme sequence; the hotword is then added to the grammar network; hotword detection resources are generated, and the hotword is added to the hotword detection engine corresponding to the user ID.
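The registration flow can be sketched in Python. The sketch assumes a pronunciation dictionary that maps words or single characters to phoneme lists, falls back to character-by-character lookup for hotwords that are not in the dictionary (typical for Chinese hotwords, but an assumption here), and treats the per-user hotword detection resource simply as a mapping from hotword to phoneme sequence. `HotwordRegistry` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class HotwordRegistry:
    """Per-user hotword grammars keyed by user ID (hypothetical structure)."""

    def __init__(self, pron_dict: Dict[str, List[str]]):
        self.pron_dict = pron_dict
        self.grammars: Dict[str, Dict[str, List[str]]] = defaultdict(dict)

    def pronounce(self, hotword: str) -> List[str]:
        """Look the hotword up in the pronunciation dictionary, else per character."""
        if hotword in self.pron_dict:
            return list(self.pron_dict[hotword])
        phones: List[str] = []
        for ch in hotword:
            if ch not in self.pron_dict:
                raise KeyError(f"no pronunciation for {ch!r}")
            phones.extend(self.pron_dict[ch])
        return phones

    def add(self, user_id: str, hotword: str) -> List[str]:
        """Add the hotword to this user's grammar and return its phoneme sequence."""
        phones = self.pronounce(hotword)
        self.grammars[user_id][hotword] = phones
        return phones

    def resource(self, user_id: str) -> Dict[str, List[str]]:
        """Snapshot of the user's grammar, i.e. the input to the hotword detector."""
        return dict(self.grammars.get(user_id, {}))
```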

A kind of hot word identifying system characterized by comprising

Universal phonetic identifies engine, is configured to export the time location of speech recognition result and each word in audio And confidence level；

Hot word detecting and alarm is configured to detect whether to export ID, audio position and its score of hot word there are hot word；

Hot word modified result module is configured to know using the voice of hot word replacement universal phonetic identification engine output The word of corresponding position in other result；

Language model modified result module is configured to have friendship as the word in hot word appearance position and current recognition result Word before and after hot word is modified when folded.

Further, further include hot word adding module, be configured to add hot word to the hot word detecting and alarm.

A hotword recognition device, characterized in that it comprises a processor, a memory and a program;

the program is stored in the memory, and the processor invokes the program stored in the memory to execute the above hotword recognition method.

A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a program, and the program is configured to execute the above hotword recognition method.

The hotword recognition method of the present invention detects hotwords using a wake-up word detection scheme, and hotwords can be user-defined. Once a hotword is detected, the recognition result is corrected with it. In addition, after the hotword has been detected and applied, the method can further correct other recognition errors caused by hotword misrecognition, namely low-confidence words that overlap or are adjacent to the hotword. The method significantly reduces erroneous corrections, lets users quickly customize their own hotwords, and achieves high hotword detection accuracy, which ensures high correction accuracy.

Brief description of the drawings

Fig. 1 is a flowchart of the hotword recognition method of Embodiment 1;

Fig. 2 is a block diagram of the hotword recognition system of Embodiment 1;

Fig. 3 is a flowchart of the hotword recognition method of Embodiment 2;

Fig. 4 is a block diagram of the hotword recognition system of Embodiment 2.

Detailed description of the embodiments

Embodiment 1: referring to Fig. 1, a hotword recognition method comprises the following steps:

Step 1.5: if the speech recognition result contains a span W_i~W_j, where i < j and j is a natural number, whose average confidence is below a given threshold (0.5 in this embodiment), extract the audio fragment corresponding to W_i~W_j and execute Step 2; otherwise, end;

Step 3: evaluate the score S of the highest-scoring hotword (W, P, S); if S is greater than a given threshold (0.5 in this embodiment), replace the words at the corresponding audio position in the speech recognition result W_i~W_j with the hotword W and execute Step 4; otherwise, end;

Specifically, Step 2 comprises the following steps:

In this embodiment, in Step 2, the grammar recognition network uses the posterior probability scores output by the acoustic model of the general recognition engine.

Specifically, in Step 4, overlap between the hotword position and a word in the current recognition result includes start-position overlap and end-position overlap.

When the hotword position and a word in the current recognition result overlap at the start position, Step 4 specifically comprises the following steps:

Add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word and use it as that candidate's score;

Step 4-4: if the score of the highest-scoring candidate word is greater than a given threshold (0.5 in this embodiment), replace the current word with that candidate; otherwise, keep the current word unchanged;

Add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word and use it as that candidate's score;

Step 4.4: if the score of the highest-scoring candidate word is greater than a given threshold (0.5 in this embodiment), replace the current word with that candidate; otherwise, keep the current word unchanged.

Specifically, in this embodiment, the language model used in Step 4 is a recurrent neural network, namely an LSTM/GRU language model.

In this embodiment, the hotword detection engine is configured to correspond to a user ID so that hotwords are user-dependent. When a hotword is added, the hotword and the user ID are uploaded together; the pronunciation dictionary is queried to obtain the pronunciation of the hotword and its corresponding phoneme sequence; the hotword is then added to the grammar network; hotword detection resources are generated, and the hotword is added to the hotword detection engine corresponding to the user ID. In this way hotwords can be conveniently added to the hotword detection engine.

Specifically, the user can send a triple to the system instructing it to add or delete a given hotword. The triple is defined as (ID, HotWord, OPT), where ID identifies the user, HotWord identifies the hotword, and OPT identifies the action, which is either add or delete.
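Handling the (ID, HotWord, OPT) triple can be sketched as below, assuming that OPT is transmitted as the string "add" or "delete" and that each user's hotwords are kept in a set. The names `Opt` and `apply_triple` are illustrative only.

```python
from enum import Enum
from typing import Dict, Set, Tuple


class Opt(Enum):
    ADD = "add"
    DELETE = "delete"


def apply_triple(store: Dict[str, Set[str]], triple: Tuple[str, str, str]) -> None:
    """Apply one (ID, HotWord, OPT) request to the per-user hotword store."""
    user_id, hotword, opt = triple
    action = Opt(opt)  # raises ValueError for any action other than add/delete
    if action is Opt.ADD:
        store.setdefault(user_id, set()).add(hotword)
    else:
        store.get(user_id, set()).discard(hotword)  # deleting a missing word is a no-op
```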

The hotword recognition system corresponding to Embodiment 1 is shown in Fig. 2 and comprises:

a general speech recognition engine 1, configured to output the speech recognition result together with the time position and confidence of each word in the audio;

a hotword detection engine 2, configured to detect whether a hotword is present and, if so, to output the hotword ID, its audio position and its score;

a hotword result correction module 3, configured to use the hotword to replace the words at the corresponding position in the speech recognition result output by the general speech recognition engine;

a language model result correction module 4, configured to correct the words before and after the hotword when the hotword position overlaps a word in the current recognition result.

The system further comprises a hotword adding module 5, configured to add hotwords to the hotword detection engine.

Embodiment 1, as shown in Fig. 2, provides a sequential recognition mode: general recognition is performed first, and hotword detection is then performed according to the confidence of the recognition result. This mode requires no additional computing resources, but it increases system latency.

Embodiment 2: referring to Fig. 3, a hotword recognition method comprises the following steps:

In this embodiment, Step 1 and Step 2 are executed simultaneously.

Specifically, Step 2 comprises the following steps:

In this embodiment, in Step 2, the acoustic model in the grammar recognition network is a CLDNN model.

The hotword recognition system corresponding to Embodiment 2 is shown in Fig. 4 and comprises:

Embodiment 2, as shown in Fig. 4, provides a parallel recognition mode: general recognition and hotword detection are performed simultaneously. This mode requires more computing resources, but system latency remains essentially unchanged and the response is faster.
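The difference between the two embodiments is mainly one of scheduling. In this minimal Python sketch, `recognize`, `detect` and `pick_segment` are hypothetical callables standing in for the general recognition engine, the hotword detection engine and Step 1.5; it contrasts the sequential mode of Embodiment 1 with the parallel mode of Embodiment 2 using a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Optional, Tuple


def run_sequential(audio: Any, recognize: Callable[[Any], Any],
                   detect: Callable[[Any], Any],
                   pick_segment: Callable[[Any, Any], Optional[Any]]) -> Tuple[Any, Any]:
    """Embodiment 1: recognize first, then detect only on a low-confidence segment."""
    result = recognize(audio)
    segment = pick_segment(audio, result)  # Step 1.5; None means nothing to check
    hit = detect(segment) if segment is not None else None
    return result, hit


def run_parallel(audio: Any, recognize: Callable[[Any], Any],
                 detect: Callable[[Any], Any]) -> Tuple[Any, Any]:
    """Embodiment 2: run recognition and hotword detection on the whole audio at once."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        asr = pool.submit(recognize, audio)
        kws = pool.submit(detect, audio)
        return asr.result(), kws.result()
```

With CPU-bound engines written in pure Python, threads would not give real parallelism because of the global interpreter lock; a real deployment would run the two engines as separate processes or services, which is consistent with Embodiment 2 calling for more computing resources.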

The hotword recognition method of the present invention detects hotwords using a wake-up word detection scheme, and hotwords can be user-defined. Once a hotword is detected, the recognition result is corrected with it. In addition, after the hotword has been detected and applied, the method can further correct other recognition errors caused by hotword misrecognition. The method significantly reduces erroneous corrections, lets users quickly customize their own hotwords, and achieves high hotword detection accuracy, which ensures high correction accuracy.

An embodiment of the present invention further provides a hotword recognition device, comprising a processor, a memory and a program; the program is stored in the memory, and the processor invokes the program stored in the memory to execute the above hotword recognition method.

In an implementation of the above hotword recognition device, the memory and the processor are electrically connected, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected to one another through one or more communication buses or signal lines, such as a bus. The memory stores computer-executable instructions implementing the hotword recognition method, including at least one software function module that can be stored in the memory in the form of software or firmware; the processor runs the software programs and modules stored in the memory to perform various functional applications and data processing.

The memory may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc. The memory stores the program, and the processor executes the program after receiving an execution instruction.

The processor may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc., and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.

An embodiment of the present invention further provides a computer-readable storage medium configured to store a program, the program being configured to execute the above hotword recognition method.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed by a processor, performs the steps of the above method embodiments. The computer-readable storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks or optical discs, and contains instructions that cause a data transmission device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in certain parts of the embodiments.

Claims

1. A hotword recognition method, characterized in that it comprises the following steps:

Step 2: feeding the user audio into a hotword detection engine for hotword retrieval, and obtaining the highest-scoring hotword W, its corresponding audio position P and its score S, expressed as (W, P, S);

Step 3: evaluating the score S of the highest-scoring hotword (W, P, S); if S is greater than a given threshold, replacing the words at the corresponding audio position in the speech recognition result with the hotword W, and executing Step 4; otherwise, ending;

Step 4: if the position where the hotword appears overlaps a word in the current recognition result, correcting the words before and after the hotword.

2. The hotword recognition method according to claim 1, characterized in that a Step 1.5 is further included between Step 1 and Step 2: if the speech recognition result contains a span W_i~W_j, where i < j and i, j are natural numbers, whose average confidence is below a given threshold, extracting the audio fragment corresponding to W_i~W_j and executing Step 2.

3. The hotword recognition method according to claim 1, characterized in that Step 1 and Step 2 are executed simultaneously.

4. The hotword recognition method according to claim 1, characterized in that Step 2 specifically comprises the following steps:

Step 2-1: according to the hotword vocabulary, adding a filler word configured to connect all acoustic modeling units, and constructing a parallel grammar recognition network;

Step 2-2: performing a decoding search over the extracted input speech segment using the Viterbi algorithm with beam search;

5. The hotword recognition method according to claim 4, characterized in that, in Step 2, the grammar recognition network uses the posterior probability scores output by the acoustic model of the general recognition engine.

6. The hotword recognition method according to claim 1, characterized in that, in Step 4, overlap between the hotword position and a word in the current recognition result includes start-position overlap and end-position overlap;

when the hotword position and a word in the current recognition result overlap at the start position, Step 4 specifically comprises the following steps:

Step 4-1: determining the word in the recognition result in which the hotword start position falls, and computing the position difference between the start position of that word and the start position of the hotword;

Step 4-2: if the position difference is greater than the duration of one word, selecting from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word not overlapped by the hotword;

Step 4-3: using a pre-trained language model, conditioned on the words from the beginning of the sentence up to the current word and on the words after the current word, predicting the probability of each candidate word for the current word, and using it as that candidate's score;

Step 4-4: if the score of the highest-scoring candidate word is greater than a given threshold, replacing the current word with that candidate; otherwise, keeping the current word unchanged;

when the hotword position and a word in the current recognition result overlap at the end position, Step 4 specifically comprises the following steps:

Step 4.1: determining the word in the recognition result in which the hotword end position falls, and computing the position difference between the end position of that word and the end position of the hotword;

Step 4.2: if the position difference is greater than the duration of one word, selecting from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word not overlapped by the hotword;

Step 4.3: using a pre-trained language model, conditioned on the words from the end of the sentence back to the current word and on the words before the current word, predicting the probability of each candidate word for the current word, and using it as that candidate's score;

Step 4.4: if the score of the highest-scoring candidate word is greater than a given threshold, replacing the current word with that candidate; otherwise, keeping the current word unchanged.

7. The hotword recognition method according to claim 6, characterized in that, between Step 4-3 and Step 4-4, and between Step 4.3 and Step 4.4, the following step is respectively included: adding the acoustic confidence information of each word, then predicting the probability of each candidate word for the current word and using it as that candidate's score.

8. The hotword recognition method according to claim 1, characterized in that the hotword detection engine is configured to correspond to a user ID; when a hotword is added, the hotword and the user ID are uploaded together, and the pronunciation dictionary is queried to obtain the pronunciation of the hotword and its corresponding phoneme sequence; the hotword is then added to the grammar network; hotword detection resources are generated, and the hotword is added to the hotword detection engine corresponding to the user ID.

9. A hotword recognition system, characterized in that it comprises:

a general speech recognition engine, configured to output the speech recognition result together with the time position and confidence of each word in the audio;

a hotword result correction module, configured to use the hotword to replace the words at the corresponding position in the speech recognition result output by the general speech recognition engine;

a language model result correction module, configured to correct the words before and after the hotword when the hotword position overlaps a word in the current recognition result.

10. The hotword recognition system according to claim 9, characterized in that it further comprises a hotword adding module configured to add or update hotwords in the hotword detection engine.

11. A hotword recognition device, characterized in that it comprises a processor, a memory and a program;

the program is stored in the memory, and the processor invokes the program stored in the memory to execute the hotword recognition method according to claim 1.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a program, and the program is configured to execute the hotword recognition method according to claim 1.