CN110415705A - Hot word recognition method, system, device and storage medium - Google Patents

Hot word recognition method, system, device and storage medium

Info

Publication number
CN110415705A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910706314.6A
Other languages
Chinese (zh)
Other versions
CN110415705B (en)
Inventor
王欢良
唐浩元
王佳珺
鄢戈
张李
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Qdreamer Network Science And Technology Co Ltd
Original Assignee
Suzhou Qdreamer Network Science And Technology Co Ltd
Application filed by Suzhou Qdreamer Network Science And Technology Co Ltd
Priority to CN201910706314.6A
Publication of CN110415705A
Application granted
Publication of CN110415705B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a hot word recognition method, system, device and storage medium, to solve the prior-art problem that correct speech recognition results may be modified by mistake. The hot word recognition method comprises the following steps. Step 1: feed the user audio into a general-purpose recognition engine to obtain a speech recognition result, and at the same time obtain the position in the audio and the confidence of each recognized word W_i. Step 2: feed the user audio into a hot word detection engine for hot word retrieval, and obtain the highest-scoring hot word W, the audio position P corresponding to the hot word, and the score S, expressed as (W, P, S). Step 3: examine the score S of the highest-scoring hot word (W, P, S); if S is greater than a given threshold, replace the words at the corresponding audio position in the speech recognition result W_i~W_j with the hot word W and execute step 4; otherwise, end. Step 4: if the position where the hot word occurs overlaps a word in the current recognition result, correct the words before and after the hot word.

Description

Hot word recognition method, system, device and storage medium
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a hot word recognition method, system, device and storage medium.
Background
Speech recognition has become a major technology in current artificial intelligence applications. Typical speech recognition technology depends on a specific vocabulary; that is, it can only recognize words within a given vocabulary. If out-of-vocabulary words occur in the speech, recognition performance is usually poor, and such words may not be recognized at all. Several solutions to this problem have been proposed. The main approach, known as recognition result post-processing, analyzes the text of the recognition result and then uses a language model or the pronunciation of given hot words to correct the recognition result. Such methods have a fatal defect: they often modify correct recognition results by mistake.
Summary of the invention
In view of the above problem, the present invention provides a hot word recognition method, system, device and storage medium, to solve the prior-art problem that correct speech recognition results may be modified by mistake.
The technical solution is as follows: a hot word recognition method, comprising the following steps:
Step 1: feed the user audio into the general-purpose recognition engine to obtain a speech recognition result, expressed as W_1, W_2, ..., W_n, where n is a natural number; at the same time, obtain the position in the audio and the confidence of each recognized word W_i, where 1 ≤ i ≤ n and i is a natural number;
Step 2: feed the user audio into the hot word detection engine for hot word retrieval, and obtain the highest-scoring hot word W, the audio position P corresponding to the hot word, and the score S, expressed as (W, P, S);
Step 3: examine the score S of the highest-scoring hot word (W, P, S); if S is greater than a given threshold, replace the words at the corresponding audio position in the speech recognition result W_i~W_j with the hot word W, and execute step 4; otherwise, end;
Step 4: if the position where the hot word occurs overlaps a word in the current recognition result, correct the words before and after the hot word.
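The replacement in step 3 can be illustrated with a minimal Python sketch. The `Word` and `HotWordHit` types, the time unit (seconds) and the rule that every word overlapping the hot word span is replaced are assumptions made for illustration; the boundary repair of step 4 is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:  # hypothetical container for one recognized word W_i
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    confidence: float


@dataclass
class HotWordHit:  # hypothetical container for the triple (W, P, S)
    text: str
    start: float
    end: float
    score: float


def apply_hot_word(words: List[Word], hit: Optional[HotWordHit],
                   threshold: float = 0.5) -> List[Word]:
    """Replace the words overlapping the hot word span if S > threshold."""
    if hit is None or hit.score <= threshold:
        return list(words)  # step 3 "otherwise, end": result is unchanged
    replacement = Word(hit.text, hit.start, hit.end, hit.score)
    result: List[Word] = []
    inserted = False
    for w in words:
        overlaps = w.start < hit.end and w.end > hit.start
        if not overlaps:
            result.append(w)
        elif not inserted:
            result.append(replacement)  # first overlapping word becomes the hot word
            inserted = True
    if not inserted:
        # The hot word fell into a gap between words: insert it in time order.
        result.append(replacement)
        result.sort(key=lambda w: w.start)
    return result
```

For example, with the words "call", "cue", "dreamer", "now" and a hit for "Qdreamer" covering the two middle words, the sketch returns "call", "Qdreamer", "now" when the score exceeds the threshold and leaves the result untouched otherwise.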
Further, a step 1.5 is included between step 1 and step 2: if there exist speech recognition results W_i~W_j, where i < j and j is a natural number, whose average confidence is lower than a given threshold, extract the audio segment corresponding to W_i~W_j and execute step 2.
Further, step 1 and step 2 are executed simultaneously.
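One possible way to locate such a span is sketched below in Python. The text only requires that some span W_i~W_j with i < j has an average confidence below the threshold; the sketch makes the stricter assumption that every word in the span is individually below the threshold (which implies a low average) and returns the first such run. The tuple layout and function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (text, start_sec, end_sec, confidence)
WordInfo = Tuple[str, float, float, float]


def find_low_confidence_span(words: List[WordInfo],
                             threshold: float = 0.5) -> Optional[Tuple[int, int]]:
    """Return (i, j) with i < j for the first run of low-confidence words."""
    run_start: Optional[int] = None
    # A sentinel with confidence 1.0 closes a run that reaches the last word.
    for idx, (_, _, _, conf) in enumerate(words + [("", 0.0, 0.0, 1.0)]):
        if conf < threshold:
            if run_start is None:
                run_start = idx
        else:
            if run_start is not None and idx - run_start >= 2:
                return run_start, idx - 1
            run_start = None
    return None


def span_audio_bounds(words: List[WordInfo],
                      span: Tuple[int, int]) -> Tuple[float, float]:
    """Start and end time of the audio segment to pass to hot word detection."""
    i, j = span
    return words[i][1], words[j][2]
```

If no span is found, the sequential variant simply ends without running hot word detection.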
Further, step 2 specifically comprises the following steps:
Step 2-1: based on the hot word vocabulary, add a filler word, the filler word being configured to connect all acoustic modeling units, and construct a parallel grammar recognition network;
Step 2-2: decode the extracted input speech segment using the Viterbi algorithm with beam search;
Step 2-3: backtrace to obtain the highest-scoring hot word and the audio position corresponding to the hot word;
Step 2-4: compute the average posterior probability over the speech frames corresponding to the hot word, and output it as the score of the hot word.
Further, in step 2, the grammar recognition network is scored using the posterior probabilities output by the acoustic model of the general-purpose recognizer.
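The scoring in step 2-4 can be sketched as follows. The sketch assumes that decoding and backtrace (steps 2-2 and 2-3) have already produced a frame-level alignment, that is, the acoustic unit assigned to each frame and the frame range of the hot word; the matrix layout and the names `posteriors`, `alignment`, `start_frame` and `end_frame` are hypothetical.

```python
from typing import Sequence


def hot_word_score(posteriors: Sequence[Sequence[float]],
                   alignment: Sequence[int],
                   start_frame: int,
                   end_frame: int) -> float:
    """Average posterior of the aligned unit over frames [start_frame, end_frame).

    posteriors[t][u] is the acoustic model posterior of unit u at frame t;
    alignment[t] is the unit chosen for frame t by the backtrace.
    """
    if not 0 <= start_frame < end_frame <= len(posteriors):
        raise ValueError("invalid hot word frame range")
    total = sum(posteriors[t][alignment[t]] for t in range(start_frame, end_frame))
    return total / (end_frame - start_frame)
```

Because the score is an average of probabilities, it lies between 0 and 1, which makes a fixed threshold comparable across hot words of different lengths.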
Further, in step 4, the overlap between the position where the hot word occurs and a word in the current recognition result includes start-position overlap and end-position overlap.
Further, when the position where the hot word occurs overlaps a word in the current recognition result at the start position, step 4 specifically comprises the following steps:
Step 4-1: determine the word in the recognition result in which the start position of the hot word falls, and compute the position difference between the start position of that word and the start position of the hot word;
Step 4-2: if the position difference is greater than the duration of one word, select from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word that does not overlap the hot word;
Step 4-3: using a pre-trained language model, predict the probability of each candidate word for the current word, given the words from the sentence-initial word up to the word before the current word and the word after the current word, and use that probability as the score of the candidate word;
Step 4-4: if the score of the highest-scoring candidate word is greater than a given threshold, replace the current word with that candidate word; otherwise keep the current word unchanged;
When the position where the hot word occurs overlaps a word in the current recognition result at the end position, step 4 specifically comprises the following steps:
Step 4.1: determine the word in the recognition result in which the end position of the hot word falls, and compute the position difference between the end position of that word and the end position of the hot word;
Step 4.2: if the position difference is greater than the duration of one word, select from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word that does not overlap the hot word;
Step 4.3: using a pre-trained language model, predict the probability of each candidate word for the current word, given the words from the sentence-final word back to the word after the current word and the word before the current word, and use that probability as the score of the candidate word;
Step 4.4: if the score of the highest-scoring candidate word is greater than a given threshold, replace the current word with that candidate word; otherwise keep the current word unchanged.
Further, between step 4-3 and step 4-4, and between step 4.3 and step 4.4, the method further comprises the following step: add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word, and use it as the score of the candidate word.
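The boundary correction for the start-position case can be sketched in Python as below. The language model is represented by a hypothetical callable `lm_score` that returns the probability of a candidate in its sentence context; candidate selection by pronunciation similarity is assumed to have happened already, and `word_duration` is an assumed parameter standing for "the duration of one word". This is a sketch of the decision logic only, not of the language model itself.

```python
from typing import Callable, List


def repair_boundary_word(current_word: str,
                         word_start: float,
                         hot_word_start: float,
                         word_duration: float,
                         candidates: List[str],
                         lm_score: Callable[[str], float],
                         threshold: float = 0.5) -> str:
    """Decide whether the word overlapped at the hot word start is replaced."""
    # Step 4-1: position difference between the word start and the hot word start.
    position_diff = abs(hot_word_start - word_start)
    # Step 4-2: only a remainder longer than one word is worth repairing.
    if position_diff <= word_duration or not candidates:
        return current_word
    # Step 4-3: score every candidate with the (hypothetical) language model.
    best = max(candidates, key=lm_score)
    # Step 4-4: replace only when the best candidate is confident enough.
    return best if lm_score(best) > threshold else current_word
```

The end-position case is symmetric: the position difference is taken between the end of the overlapped word and the end of the hot word, and the language model is conditioned on the context running back from the end of the sentence.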
Further, the hot word detection engine is configured to correspond to a user ID. When a hot word is added, the hot word and the user ID are uploaded together; the pronunciation dictionary is queried to obtain the pronunciation of the hot word and its corresponding phoneme sequence; the hot word is then added to the grammar network; a hot word detection resource is generated, and the hot word is added to the hot word detection engine corresponding to the user ID.
A hot word recognition system, comprising:
a general-purpose speech recognition engine, configured to output the speech recognition result and, for each word, its time position in the audio and its confidence;
a hot word detection engine, configured to detect whether a hot word is present and to output the ID, audio position and score of the hot word;
a hot word result correction module, configured to replace, with the hot word, the words at the corresponding position in the speech recognition result output by the general-purpose speech recognition engine;
a language model result correction module, configured to correct the words before and after the hot word when the position where the hot word occurs overlaps a word in the current recognition result.
Further, the system also comprises a hot word adding module, configured to add hot words to the hot word detection engine.
A hot word recognition device, comprising a processor, a memory and a program;
the program is stored in the memory, and the processor calls the program stored in the memory to execute the hot word recognition method described above.
A computer-readable storage medium, configured to store a program, the program being configured to execute the hot word recognition method described above.
The hot word recognition method of the present invention recognizes hot words using a wake-word detection scheme. Hot words can be defined by the user, and the recognition result is corrected with the hot word once it has been detected. In addition, on the basis of detecting and correcting the hot word, the method can further correct other recognition errors caused by the hot word recognition error, namely words that overlap the hot word or neighbouring low-confidence words. The method significantly reduces erroneous corrections, lets users quickly customize their own hot words, and achieves high hot word detection accuracy, which ensures high correction accuracy.
Brief description of the drawings
Fig. 1 is a flowchart of the hot word recognition method of Embodiment 1;
Fig. 2 is a system block diagram of the hot word recognition system of Embodiment 1;
Fig. 3 is a flowchart of the hot word recognition method of Embodiment 2;
Fig. 4 is a system block diagram of the hot word recognition system of Embodiment 2.
Detailed description of the embodiments
Embodiment 1: referring to Fig. 1, a hot word recognition method comprises the following steps:
Step 1: feed the user audio into the general-purpose recognition engine to obtain a speech recognition result, expressed as W_1, W_2, ..., W_n, where n is a natural number; at the same time, obtain the position in the audio and the confidence of each recognized word W_i, where 1 ≤ i ≤ n and i is a natural number;
Step 1.5: if there exist speech recognition results W_i~W_j, where i < j and j is a natural number, whose average confidence is lower than a given threshold (here 0.5), extract the audio segment corresponding to W_i~W_j and execute step 2; otherwise, end;
Step 2: feed the user audio into the hot word detection engine for hot word retrieval, and obtain the highest-scoring hot word W, the audio position P corresponding to the hot word, and the score S, expressed as (W, P, S);
Step 3: examine the score S of the highest-scoring hot word (W, P, S); if S is greater than a given threshold (here 0.5), replace the words at the corresponding audio position in the speech recognition result W_i~W_j with the hot word W, and execute step 4; otherwise, end;
Step 4: if the position where the hot word occurs overlaps a word in the current recognition result, correct the words before and after the hot word.
Specifically, step 2 comprises the following steps:
Step 2-1: based on the hot word vocabulary, add a filler word, the filler word being configured to connect all acoustic modeling units, and construct a parallel grammar recognition network;
Step 2-2: decode the extracted input speech segment using the Viterbi algorithm with beam search;
Step 2-3: backtrace to obtain the highest-scoring hot word and the audio position corresponding to the hot word;
Step 2-4: compute the average posterior probability over the speech frames corresponding to the hot word, and output it as the score of the hot word.
In this embodiment, in step 2, the grammar recognition network is scored using the posterior probabilities output by the acoustic model of the general-purpose recognizer.
Specifically, in step 4, the overlap between the position where the hot word occurs and a word in the current recognition result includes start-position overlap and end-position overlap.
When the position where the hot word occurs overlaps a word in the current recognition result at the start position, step 4 specifically comprises the following steps:
Step 4-1: determine the word in the recognition result in which the start position of the hot word falls, and compute the position difference between the start position of that word and the start position of the hot word;
Step 4-2: if the position difference is greater than the duration of one word, select from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word that does not overlap the hot word;
Step 4-3: using a pre-trained language model, predict the probability of each candidate word for the current word, given the words from the sentence-initial word up to the word before the current word and the word after the current word, and use that probability as the score of the candidate word;
Add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word, and use it as the score of the candidate word;
Step 4-4: if the score of the highest-scoring candidate word is greater than a given threshold (here 0.5), replace the current word with that candidate word; otherwise keep the current word unchanged;
When the position where the hot word occurs overlaps a word in the current recognition result at the end position, step 4 specifically comprises the following steps:
Step 4.1: determine the word in the recognition result in which the end position of the hot word falls, and compute the position difference between the end position of that word and the end position of the hot word;
Step 4.2: if the position difference is greater than the duration of one word, select from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word that does not overlap the hot word;
Step 4.3: using a pre-trained language model, predict the probability of each candidate word for the current word, given the words from the sentence-final word back to the word after the current word and the word before the current word, and use that probability as the score of the candidate word;
Add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word, and use it as the score of the candidate word;
Step 4.4: if the score of the highest-scoring candidate word is greater than a given threshold (here 0.5), replace the current word with that candidate word; otherwise keep the current word unchanged.
Specifically, in this embodiment, the language model used in step 4 is a recurrent neural network, specifically an LSTM/GRU-type language model.
In this embodiment, the hot word detection engine is configured to correspond to a user ID, so that detection is user-dependent. When a hot word is added, the hot word and the user ID are uploaded together; the pronunciation dictionary is queried to obtain the pronunciation of the hot word and its corresponding phoneme sequence; the hot word is then added to the grammar network; a hot word detection resource is generated, and the hot word is added to the hot word detection engine corresponding to the user ID, so that hot words can be added to the hot word detection engine conveniently.
Specifically, the user can send a triple to the system, telling the system to add or delete a given hot word. The triple is defined as (ID, HotWord, OPT), where ID identifies the user, HotWord identifies the hot word, and OPT identifies the action, which is defined as either add or delete.
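The triple (ID, HotWord, OPT) can be illustrated with a small per-user registry in Python. The class name, the string values "add" and "delete" for OPT, and the in-memory storage are assumptions for illustration; the pronunciation lookup and grammar network update described above are not modelled.

```python
from collections import defaultdict
from typing import Dict, Set


class HotWordRegistry:
    """Hypothetical per-user hot word store driven by (ID, HotWord, OPT) triples."""

    def __init__(self) -> None:
        self._words: Dict[str, Set[str]] = defaultdict(set)

    def apply(self, user_id: str, hot_word: str, opt: str) -> None:
        """Apply one triple; opt is assumed to be "add" or "delete"."""
        if opt == "add":
            self._words[user_id].add(hot_word)
        elif opt == "delete":
            self._words[user_id].discard(hot_word)  # deleting an unknown word is a no-op
        else:
            raise ValueError(f"unknown OPT: {opt!r}")

    def hot_words(self, user_id: str) -> Set[str]:
        """Return a copy of the hot words registered for this user."""
        return set(self._words.get(user_id, set()))
```

Keying the store by user ID keeps one user's hot words from affecting recognition for another user, which matches the user-dependent behaviour described above.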
The hot word recognition system corresponding to Embodiment 1 is shown in Fig. 2 and comprises:
a general-purpose speech recognition engine 1, configured to output the speech recognition result and, for each word, its time position in the audio and its confidence;
a hot word detection engine 2, configured to detect whether a hot word is present and to output the ID, audio position and score of the hot word;
a hot word result correction module 3, configured to replace, with the hot word, the words at the corresponding position in the speech recognition result output by the general-purpose speech recognition engine;
a language model result correction module 4, configured to correct the words before and after the hot word when the position where the hot word occurs overlaps a word in the current recognition result.
The system further comprises a hot word adding module 5, configured to add hot words to the hot word detection engine.
In Embodiment 1, as shown in Fig. 2, a sequential recognition mode is provided: general-purpose recognition is performed first, and hot word detection is then performed according to the confidence of the recognition result. No additional computing resources are required, but system latency increases.
Embodiment 2: referring to Fig. 3, a hot word recognition method comprises the following steps:
Step 1: feed the user audio into the general-purpose recognition engine to obtain a speech recognition result, expressed as W_1, W_2, ..., W_n, where n is a natural number; at the same time, obtain the position in the audio and the confidence of each recognized word W_i, where 1 ≤ i ≤ n and i is a natural number;
Step 2: feed the user audio into the hot word detection engine for hot word retrieval, and obtain the highest-scoring hot word W, the audio position P corresponding to the hot word, and the score S, expressed as (W, P, S);
Step 3: examine the score S of the highest-scoring hot word (W, P, S); if S is greater than a given threshold (here 0.5), replace the words at the corresponding audio position in the speech recognition result W_i~W_j with the hot word W, and execute step 4; otherwise, end;
Step 4: if the position where the hot word occurs overlaps a word in the current recognition result, correct the words before and after the hot word.
In this embodiment, step 1 and step 2 are executed simultaneously.
Specifically, step 2 comprises the following steps:
Step 2-1: based on the hot word vocabulary, add a filler word, the filler word being configured to connect all acoustic modeling units, and construct a parallel grammar recognition network;
Step 2-2: decode the extracted input speech segment using the Viterbi algorithm with beam search;
Step 2-3: backtrace to obtain the highest-scoring hot word and the audio position corresponding to the hot word;
Step 2-4: compute the average posterior probability over the speech frames corresponding to the hot word, and output it as the score of the hot word.
In this embodiment, in step 2, the acoustic model in the grammar recognition network is a CLDNN model.
Specifically, in step 4, the overlap between the position where the hot word occurs and a word in the current recognition result includes start-position overlap and end-position overlap.
When the position where the hot word occurs overlaps a word in the current recognition result at the start position, step 4 specifically comprises the following steps:
Step 4-1: determine the word in the recognition result in which the start position of the hot word falls, and compute the position difference between the start position of that word and the start position of the hot word;
Step 4-2: if the position difference is greater than the duration of one word, select from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word that does not overlap the hot word;
Step 4-3: using a pre-trained language model, predict the probability of each candidate word for the current word, given the words from the sentence-initial word up to the word before the current word and the word after the current word, and use that probability as the score of the candidate word;
Add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word, and use it as the score of the candidate word;
Step 4-4: if the score of the highest-scoring candidate word is greater than a given threshold (here 0.5), replace the current word with that candidate word; otherwise keep the current word unchanged;
When the position where the hot word occurs overlaps a word in the current recognition result at the end position, step 4 specifically comprises the following steps:
Step 4.1: determine the word in the recognition result in which the end position of the hot word falls, and compute the position difference between the end position of that word and the end position of the hot word;
Step 4.2: if the position difference is greater than the duration of one word, select from the vocabulary, as candidate words, words whose pronunciation is similar to the part of that word that does not overlap the hot word;
Step 4.3: using a pre-trained language model, predict the probability of each candidate word for the current word, given the words from the sentence-final word back to the word after the current word and the word before the current word, and use that probability as the score of the candidate word;
Add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word, and use it as the score of the candidate word;
Step 4.4: if the score of the highest-scoring candidate word is greater than a given threshold (here 0.5), replace the current word with that candidate word; otherwise keep the current word unchanged.
Specifically, in this embodiment, the language model used in step 4 is a recurrent neural network, specifically an LSTM/GRU-type language model.
In this embodiment, the hot word detection engine is configured to correspond to a user ID, so that detection is user-dependent. When a hot word is added, the hot word and the user ID are uploaded together; the pronunciation dictionary is queried to obtain the pronunciation of the hot word and its corresponding phoneme sequence; the hot word is then added to the grammar network; a hot word detection resource is generated, and the hot word is added to the hot word detection engine corresponding to the user ID, so that hot words can be added to the hot word detection engine conveniently.
Specifically, the user can send a triple to the system, telling the system to add or delete a given hot word. The triple is defined as (ID, HotWord, OPT), where ID identifies the user, HotWord identifies the hot word, and OPT identifies the action, which is defined as either add or delete.
The hot word recognition system corresponding to Embodiment 2 is shown in Fig. 4 and comprises:
a general-purpose speech recognition engine 1, configured to output the speech recognition result and, for each word, its time position in the audio and its confidence;
a hot word detection engine 2, configured to detect whether a hot word is present and to output the ID, audio position and score of the hot word;
a hot word result correction module 3, configured to replace, with the hot word, the words at the corresponding position in the speech recognition result output by the general-purpose speech recognition engine;
a language model result correction module 4, configured to correct the words before and after the hot word when the position where the hot word occurs overlaps a word in the current recognition result.
The system further comprises a hot word adding module 5, configured to add hot words to the hot word detection engine.
In Embodiment 2, as shown in Fig. 4, a parallel recognition mode is provided: general-purpose recognition and hot word detection are performed at the same time. This requires more computing resources, but system latency is essentially unchanged and the response is faster.
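The parallel mode can be sketched with Python's standard `concurrent.futures` module. Both engines are represented by hypothetical callables that take the audio and return their result; a thread pool is used here only as a stand-in for whatever concurrency mechanism a real deployment would choose.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Tuple


def recognize_parallel(audio: bytes,
                       general_engine: Callable[[bytes], Any],
                       hot_word_engine: Callable[[bytes], Any]) -> Tuple[Any, Any]:
    """Run general-purpose recognition and hot word detection concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        recognition = pool.submit(general_engine, audio)
        detection = pool.submit(hot_word_engine, audio)
        # Total latency is roughly that of the slower engine, not the sum of both.
        return recognition.result(), detection.result()
```

Compared with the sequential mode of Embodiment 1, hot word detection here always runs on the full audio, which costs extra computation but avoids waiting for the confidence check.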
The hot word recognition method of the present invention recognizes hot words using a wake-word detection scheme. Hot words can be defined by the user, and the recognition result is corrected with the hot word once it has been detected. In addition, on the basis of detecting and correcting the hot word, the method can further correct other recognition errors caused by the hot word recognition error. The method significantly reduces erroneous corrections, lets users quickly customize their own hot words, and achieves high hot word detection accuracy, which ensures high correction accuracy.
An embodiment of the present invention also provides a hot word recognition device, comprising a processor, a memory and a program; the program is stored in the memory, and the processor calls the program stored in the memory to execute the hot word recognition method described above.
In the implementation of the above hot word recognition device, the memory and the processor are electrically connected, directly or indirectly, to enable the transmission or exchange of data. For example, these elements may be electrically connected to one another through one or more communication buses or signal lines, such as a bus. The memory stores computer-executable instructions for implementing the data access control method, including at least one software function module that can be stored in the memory in the form of software or firmware. By running the software programs and modules stored in the memory, the processor executes various functional applications and data processing.
The memory may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. The memory is used to store the program, and the processor executes the program after receiving an execution instruction.
The processor may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like, and can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor.
An embodiment of the present invention also provides a computer-readable storage medium configured to store a program, the program being configured to execute the hot word recognition method described above.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be carried out by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium. When executed by a processor, the program performs the steps of the above method embodiments. The computer-readable storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical discs, and contains instructions that cause a large-data transmission device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in certain parts of the embodiments.

Claims (12)

1. A hot word recognition method, comprising the following steps:
Step 1: feed the user audio into the general-purpose recognition engine to obtain a speech recognition result, expressed as W_1, W_2, ..., W_n, where n is a natural number; at the same time, obtain the position in the audio and the confidence of each recognized word W_i, where 1 ≤ i ≤ n and i is a natural number;
Step 2: feed the user audio into the hot word detection engine for hot word retrieval, and obtain the highest-scoring hot word W, the audio position P corresponding to the hot word, and the score S, expressed as (W, P, S);
Step 3: examine the score S of the highest-scoring hot word (W, P, S); if S is greater than a given threshold, replace the words at the corresponding audio position in the speech recognition result with the hot word W, and execute step 4; otherwise, end;
Step 4: if the position where the hot word occurs overlaps a word in the current recognition result, correct the words before and after the hot word.
2. a kind of hot word recognition methods according to claim 1, it is characterised in that: between step 1 and step 2, also wrap Include step 1.5: speech recognition result W if it existsi~Wj, i < j, i, j is natural number, Wi~WjAverage confidence lower than given Threshold value then extracts Wi~WjCorresponding audio fragment executes step 2.
3. The hot word recognition method according to claim 1, characterized in that step 1 and step 2 are executed concurrently.
4. The hot word recognition method according to claim 1, characterized in that step 2 specifically comprises the following steps:
Step 2-1: based on the hot word vocabulary, add a filler word, the filler word being configured to connect all acoustic modeling units, and construct a parallel grammar recognition network;
Step 2-2: decode the extracted input speech segment using the Viterbi algorithm with beam search;
Step 2-3: backtrack to obtain the highest-scoring hot word and its corresponding audio position;
Step 2-4: compute the average posterior probability over the speech frames corresponding to the hot word and output it as the score of the hot word.
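The scoring in step 2-4 amounts to a mean over frame-level posteriors. A minimal Python sketch is given below, assuming the decoder has already aligned the hot word to a half-open frame range and that one posterior per frame is available; the function name `hot_word_score` and its parameters are hypothetical.

```python
from typing import Sequence


def hot_word_score(frame_posteriors: Sequence[float],
                   start_frame: int, end_frame: int) -> float:
    """Mean posterior over the frames [start_frame, end_frame) aligned to the hot word."""
    if not 0 <= start_frame < end_frame <= len(frame_posteriors):
        raise ValueError("invalid frame range")
    segment = frame_posteriors[start_frame:end_frame]
    return sum(segment) / len(segment)
```

Averaging over frames normalizes for the length of the hot word, so long and short hot words can be compared against the same threshold in step 3.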
5. The hot word recognition method according to claim 4, characterized in that, in step 2, the grammar recognition network uses the posterior probability scores output by the acoustic model of the general-purpose recognition engine.
6. The hot word recognition method according to claim 1, characterized in that, in step 4, the overlap between the position where the hot word occurs and a word in the current recognition result includes a start-position overlap and an end-position overlap;
When the position where the hot word occurs overlaps a word in the current recognition result at the start position, step 4 specifically comprises the following steps:
Step 4-1: determine the word in the recognition result that contains the start position of the hot word, and compute the position difference between the start position of that word and the start position of the hot word;
Step 4-2: if the position difference is greater than the duration of a single word, select from the vocabulary the words whose pronunciation is similar to the part of that word that does not overlap the hot word, as candidate words;
Step 4-3: using a pre-trained language model, and given the words from the sentence-initial word up to the word before the current word together with the word after the current word, predict the probability of each candidate word for the current word, and use that probability as the score of the candidate word;
Step 4-4: if the score of the highest-scoring candidate word is greater than a given threshold, replace the current word with that candidate word; otherwise keep the current word unchanged;
When the position where the hot word occurs overlaps a word in the current recognition result at the end position, step 4 specifically comprises the following steps:
Step 4.1: determine the word in the recognition result that contains the end position of the hot word, and compute the position difference between the end position of that word and the end position of the hot word;
Step 4.2: if the position difference is greater than the duration of a single word, select from the vocabulary the words whose pronunciation is similar to the part of that word that does not overlap the hot word, as candidate words;
Step 4.3: using a pre-trained language model, and given the words from the sentence-final word back to the word after the current word together with the word before the current word, predict the probability of each candidate word for the current word, and use that probability as the score of the candidate word;
Step 4.4: if the score of the highest-scoring candidate word is greater than a given threshold, replace the current word with that candidate word; otherwise keep the current word unchanged.
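The candidate selection logic of steps 4-3 and 4-4 (and likewise 4.3 and 4.4) can be sketched in Python as follows. The language model is passed in as a callable `lm_prob`, which is a placeholder for whatever pre-trained model is used; its signature, the `needs_correction` helper, and all other names are assumptions for illustration. Generating pronunciation-similar candidates (steps 4-2 and 4.2) is outside the scope of the sketch.

```python
from typing import Callable, Sequence


def needs_correction(word_pos: float, hot_word_pos: float,
                     min_duration: float) -> bool:
    """Steps 4-1/4-2 gate: the position difference must exceed a minimum duration."""
    return abs(hot_word_pos - word_pos) > min_duration


def correct_boundary_word(
    context: Sequence[str],
    current: str,
    neighbor: str,
    candidates: Sequence[str],
    lm_prob: Callable[[Sequence[str], str, str], float],
    threshold: float,
) -> str:
    """Score each candidate with the language model and keep `current`
    unless the best candidate's score exceeds `threshold`."""
    if not candidates:
        return current
    # lm_prob(context, candidate, neighbor) is assumed to return the
    # probability of `candidate` given the surrounding words.
    scored = [(lm_prob(context, cand, neighbor), cand) for cand in candidates]
    best_score, best = max(scored)
    return best if best_score > threshold else current
```

For a start-position overlap, `context` would hold the words from the sentence start up to the word before the current word and `neighbor` the word after it; for an end-position overlap the roles are mirrored.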
7. The hot word recognition method according to claim 6, characterized in that the following step is further included between step 4-3 and step 4-5, and between step 4.3 and step 4.5, respectively: add the acoustic confidence information of each word, then predict the probability of each candidate word for the current word, and use that probability as the score of the candidate word.
8. The hot word recognition method according to claim 1, characterized in that the hot word detection engine is configured to correspond to a user ID; when a hot word is added, the hot word and the user ID are uploaded together; a pronunciation dictionary is queried to obtain the pronunciation of the hot word and its corresponding phoneme sequence; the hot word is then added to the grammar network; a hot word detection resource is generated; and the hot word is added to the hot word detection engine corresponding to the user ID.
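One possible way to picture the per-user bookkeeping in this claim is the small Python registry below. The class `HotWordRegistry`, its methods, and the dictionary-based pronunciation lexicon are all hypothetical; building the actual grammar network and detection resource is reduced here to storing the phoneme sequence under the user ID.

```python
from typing import Dict, List


class HotWordRegistry:
    """Keeps a separate hot word list (word -> phoneme sequence) per user ID."""

    def __init__(self, pronunciation_dict: Dict[str, List[str]]) -> None:
        self._lexicon = pronunciation_dict
        self._by_user: Dict[str, Dict[str, List[str]]] = {}

    def add_hot_word(self, user_id: str, hot_word: str) -> List[str]:
        """Look up the hot word's phonemes and attach it to the user's engine."""
        phonemes = self._lexicon.get(hot_word)
        if phonemes is None:
            raise KeyError(f"no pronunciation found for {hot_word!r}")
        self._by_user.setdefault(user_id, {})[hot_word] = list(phonemes)
        return list(phonemes)

    def grammar_for(self, user_id: str) -> Dict[str, List[str]]:
        """Return a copy of the hot word entries registered for this user."""
        return dict(self._by_user.get(user_id, {}))
```

Keying the entries by user ID means that one user's hot words never influence detection for another user.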
9. A hot word recognition system, characterized by comprising:
a general-purpose speech recognition engine, configured to output a speech recognition result together with the time position in the audio and the confidence of each word;
a hot word detection engine, configured to detect whether a hot word is present and to output the ID, audio position, and score of the hot word;
a hot word result correction module, configured to replace the word at the corresponding position in the speech recognition result output by the general-purpose speech recognition engine with the hot word;
a language model result correction module, configured to correct the words before and after the hot word when the position where the hot word occurs overlaps a word in the current recognition result.
10. The hot word recognition system according to claim 9, characterized by further comprising a hot word adding module, configured to add hot words to, or update hot words in, the hot word detection engine.
11. A hot word recognition device, characterized by comprising a processor, a memory, and a program;
the program is stored in the memory, and the processor invokes the program stored in the memory to execute the hot word recognition method according to claim 1.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a program, and the program is configured to execute the hot word recognition method according to claim 1.
CN201910706314.6A 2019-08-01 2019-08-01 Hot word recognition method, system, device and storage medium Active CN110415705B (en)

Publications (2)

Publication Number Publication Date
CN110415705A (en) 2019-11-05
CN110415705B (en) 2022-03-01

Family

ID=68365126

Country Status (1)

Country Link
CN (1) CN110415705B (en)

Also Published As

Publication number Publication date
CN110415705B (en) 2022-03-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant