CN103700369B - Phonetic navigation method and system - Google Patents

Phonetic navigation method and system

Info

Publication number
CN103700369B
CN103700369B (grant) · CN201310611734.9A / CN201310611734A (application)
Authority
CN
China
Prior art keywords
decoding
decoding network
language model
word
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310611734.9A
Other languages
Chinese (zh)
Other versions
CN103700369A (en)
Inventor
高建清
刘聪
王智国
胡国平
胡郁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201310611734.9A priority Critical patent/CN103700369B/en
Publication of CN103700369A publication Critical patent/CN103700369A/en
Application granted granted Critical
Publication of CN103700369B publication Critical patent/CN103700369B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a voice navigation method and system, belonging to the field of speech processing technology. The method includes: receiving a voice signal input by a user; performing unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, where the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network; determining the operation corresponding to the text word string; and performing the operation. With this voice navigation method and system, recognition performance for users' personalized voice responses can be ensured.

Description

Phonetic navigation method and system
Technical field
The present invention relates to the field of speech processing technology, and in particular to a voice navigation method and system.
Background technology
Voice navigation technology is widely used in today's self-service customer-service systems. In particular, answer-mode voice call navigation systems have become the mainstream voice calling method because of their high classification accuracy.
In the prior art, an answer-mode voice call navigation system usually organizes the service menu into a multi-level structure and sets a system prompt on each menu level to guide the user's selection. For example, when the user says "I want to check my account details", the system enters the second-level menu "account inquiry" and prompts "Do you want to check today's details, historical details, or details for a specified date?". By presenting the menu, the system makes clear which service types are supported in the current application environment and helps the user choose.
Clearly, the system prompt under the answer mode constrains, to some extent, the range of the user's answers. Under the "account inquiry" scenario above, the prompt is "Do you want to check today's details, historical details, or details for a specified date?", so the grammar contains three units: "today's details", "historical details" and "details for a specified date". For this, legacy systems usually parse the user's input speech with command-word based recognition, which achieves good recognition performance and high recognition efficiency when the user cooperates. However, when the user does not follow the system prompt, i.e. does not input a command word — for example, answering "I want to look at my account from this morning" to the prompt "Do you want to check today's details, historical details, or details for a specified date?" — the effect of menu-selection based command control is hard to guarantee. That is, a voice navigation system using the traditional answer mode has difficulty ensuring recognition performance for users' personalized voice responses.
Summary of the invention
Embodiments of the present invention provide a voice navigation method and system, to solve the problem that existing command-word based speech recognition cannot guarantee the recognition effect for users' personalized voice responses.
Embodiments of the present invention provide the following technical solutions:
In one aspect, an embodiment of the present invention provides a voice navigation method, including:
receiving a voice signal input by a user;
performing unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, where the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;
determining the operation corresponding to the text word string;
performing the operation.
Preferably, the method further includes building the large-scale language model decoding network, the building process including:
building a navigation domain language model using a corpus;
collecting dialogue speech under a specific navigation scenario, and decoding the dialogue speech using the navigation domain language model to obtain decoded word strings;
training a specific navigation scenario language model using the decoded word strings;
interpolating the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
Preferably, the method further includes building the command word decoding network, the building process including:
collecting menu options under a specific navigation scenario, the menu options including menu names and their aliases;
connecting the menu options in parallel to form the command word decoding network;
setting the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
Preferably, the method further includes building the high-frequency decoding network, the building process including:
collecting high-frequency corpus under a specific navigation scenario;
connecting the high-frequency corpus in parallel to form the high-frequency decoding network;
setting the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
Preferably, the performing unified decoding and recognition of the voice signal based on the decoding networks of multiple different types to obtain a text word string includes:
decoding and recognizing the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
decoding and recognizing the voice signal based on the command word decoding network to obtain a second score of the decoding result;
decoding and recognizing the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
selecting, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
Preferably, the method further includes:
in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, applying a preset weight gain to that decoding path;
taking the score after the gain as the score of the decoding path.
Preferably, the determining the operation corresponding to the text word string includes:
if the text word string is a decoding result of the command word decoding network, determining the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
otherwise, performing keyword matching between the decoding result and a keyword list to obtain a matching result;
determining the operation corresponding to the text word string according to the semantics corresponding to the matching result.
Preferably, the method further includes:
organizing service functions into a multi-level menu structure, and establishing a keyword list for each menu level;
wherein the performing keyword matching between the decoding result and the keyword list to obtain a matching result includes:
determining the menu level corresponding to the current service;
obtaining the keyword lists of the menu level and the levels below it;
performing keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
In another aspect, an embodiment of the present invention provides a voice navigation system, including:
a receiving module, configured to receive a voice signal input by a user;
a decoding module, configured to perform unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, where the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;
a determining module, configured to determine the operation corresponding to the text word string;
an executing module, configured to perform the operation.
Preferably, the system further includes any two or all three of the following modules:
a first building module, configured to build the large-scale language model decoding network;
a second building module, configured to build the command word decoding network;
a third building module, configured to build the high-frequency decoding network.
Preferably, the first building module includes:
a first language model unit, configured to build a navigation domain language model using a corpus;
a decoding unit, configured to collect dialogue speech under a specific navigation scenario and decode the dialogue speech using the navigation domain language model to obtain decoded word strings;
a second language model unit, configured to train a specific navigation scenario language model using the decoded word strings;
an interpolation unit, configured to interpolate the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
Preferably, the second building module includes:
a menu option unit, configured to collect menu options under a specific navigation scenario, the menu options including menu names and their aliases;
a first paralleling unit, configured to connect the menu options in parallel to form the command word decoding network;
a first weighting unit, configured to set the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
Preferably, the third building module includes:
a high-frequency corpus unit, configured to collect high-frequency corpus under a specific navigation scenario;
a second paralleling unit, configured to connect the high-frequency corpus in parallel to form the high-frequency decoding network;
a second weighting unit, configured to set the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
Preferably, the decoding module includes:
a first decoding unit, configured to decode and recognize the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
a second decoding unit, configured to decode and recognize the voice signal based on the command word decoding network to obtain a second score of the decoding result;
a third decoding unit, configured to decode and recognize the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
a selecting unit, configured to select, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
Preferably, the first decoding unit is further configured to, in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, apply a preset weight gain to that decoding path and take the score after the gain as the score of the decoding path.
Preferably, the determining module includes:
a judging unit, configured to judge whether the text word string is a decoding result of the command word decoding network;
a first determining unit, configured to, after the judging unit judges that the text word string is a decoding result of the command word decoding network, determine the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
a keyword matching unit, configured to, after the judging unit judges that the text word string is not a decoding result of the command word decoding network, perform keyword matching between the decoding result and a keyword list to obtain a matching result;
a second determining unit, configured to determine the operation corresponding to the text word string according to the semantics corresponding to the matching result.
Preferably, the system further includes:
a keyword list building module, configured to organize service functions into a multi-level menu structure and establish a keyword list for each menu level;
wherein the keyword matching unit includes:
a menu level determining unit, configured to determine the menu level corresponding to the current service;
a keyword list obtaining unit, configured to obtain the keyword lists of the menu level and the levels below it;
a matching unit, configured to perform keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
The voice navigation method and system provided by the embodiments of the present invention combine the advantages of decoding networks of multiple different types: the voice signal input by the user is decoded and recognized in a unified manner based on decoding networks of multiple different types, yielding a text word string and the corresponding operation, so that the user's personalized speech response can be recognized. While the reliability of the recognition result is ensured, the flexibility of the user's response is improved. The speech recognition method and system provided by the embodiments of the present invention can therefore substantially improve the user experience.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings.
Fig. 1 is a schematic diagram of a voice navigation process in the prior art;
Fig. 2 is a flow chart of the voice navigation method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the building process of the large-scale language model decoding network provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the building process of the command word decoding network provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the building process of the high-frequency decoding network provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the voice navigation system provided by an embodiment of the present invention.
Detailed description of the invention
In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.
First, the voice navigation process in the prior art is briefly introduced.
Fig. 1 is a schematic diagram of a voice navigation process in the prior art.
In the prior art, the voice signal input by the user is usually decoded and recognized based on a command word decoding network, which specifically includes the following process:
Step 101: receiving a voice signal input by a user;
Step 102: decoding and recognizing the voice signal based on a command word decoding network to obtain a text word string;
Step 103: determining the operation corresponding to the text word string;
Step 104: performing the operation.
Generally, the command word decoding network is built by collecting keywords commonly used under a specific navigation scenario. The common keywords may be menu names and their aliases, which form the menu options; the obtained menu options are connected in parallel to obtain the command word decoding network. After the voice signal input by the user is decoded and recognized using the command word decoding network, a text word string is obtained. The text word string usually corresponds to the user's intention and can serve as the decoding result of the command word decoding network.
After the decoding result is obtained, the confidence of the decoding and recognition, i.e. the reliability of the decoding and recognition, needs to be judged, usually by means of a likelihood ratio test (LRT). Assume that H0 represents correct recognition and H1 represents incorrect recognition; the score of the optimal decoding path is usually denoted p(X|H0), and the sum of the scores of all other paths is used to approximate p(X|H1). Generally, the system selects the log likelihood ratio (LLR) as the confidence score of the recognition result: LLR = log p(X|H0) - log p(X|H1). Whether the recognition result is reliable is judged by whether its confidence score exceeds a preset threshold: if it is greater than the preset threshold, the recognition result is reliable; otherwise, the recognition result is unreliable.
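As an illustration only, this confidence test can be sketched in a few lines of Python; the likelihood values and the threshold below are invented for the example and are not taken from the patent.

```python
import math

def llr_confidence(best_path_likelihood, other_path_likelihoods):
    # LLR = log p(X|H0) - log p(X|H1), with p(X|H1) approximated by the
    # sum of the likelihood scores of all competing paths.
    return math.log(best_path_likelihood) - math.log(sum(other_path_likelihoods))

THRESHOLD = 1.0  # assumed preset threshold; in practice tuned on held-out data
score = llr_confidence(8e-3, [1e-3, 5e-4, 2e-4])            # invented likelihoods
print("reliable" if score > THRESHOLD else "unreliable")    # prints "reliable"
```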
If the recognition result is reliable, the semantic information corresponding to the recognition result is obtained, from which the user intention corresponding to the text word string can be determined; the operation corresponding to that user intention usually corresponds to a menu option. Since in command word decoding each command word corresponds to a determined menu option, the corresponding menu function is easily obtained from the command word. If the recognition result is unreliable, the user is prompted to input the voice signal again, and the voice signal is decoded and recognized again.
Therefore, voice navigation in the prior art is only applicable to command word navigation corresponding to the menu options. For a voice signal outside the menu options, the recognition result obtained is unreliable, so the user is repeatedly prompted to re-input the voice signal and no effective recognition result can be obtained.
To this end, an embodiment of the present invention proposes a voice navigation method that can effectively recognize a voice signal that the user does not input according to the menu prompt.
As shown in Fig. 2, the voice navigation method provided by an embodiment of the present invention comprises the following steps:
Step 201: receiving a voice signal input by a user;
Step 202: performing unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string;
Step 203: determining the operation corresponding to the text word string;
Step 204: performing the operation.
In the voice navigation method provided by the embodiment of the present invention, decoding networks of multiple different types are used to perform unified decoding and recognition of the voice signal input by the user, where the decoding networks of multiple different types may be any two or all three of a large-scale language model decoding network, a command word decoding network and a high-frequency decoding network.
Unlike the prior art, which decodes and recognizes the voice signal purely based on a command word decoding network, the embodiment of the present invention performs unified decoding and recognition of the voice signal based on decoding networks of multiple different types and searches for the optimal path among the different types of decoding networks. This ensures the reliability of the recognition result and makes it possible to recognize speech other than the command words corresponding to the menu options, which a pure command word decoding network cannot recognize. Specifically, after the voice signal input by the user is received, the acoustic feature sequence of the voice signal is extracted, the set of decoding paths of the unified decoding corresponding to the acoustic feature sequence is obtained, and the optimal path is searched from this set.
The building processes of the three different types of decoding networks involved in the embodiments of the present invention are briefly introduced below.
Fig. 3 is a schematic diagram of the building process of the large-scale language model decoding network provided by an embodiment of the present invention. The building process of the large-scale language model decoding network is as follows:
Step 301: building a navigation domain language model using a corpus;
Step 302: collecting dialogue speech under a specific navigation scenario, and decoding the dialogue speech using the navigation domain language model to obtain decoded word strings;
Step 303: training a specific navigation scenario language model using the decoded word strings;
Step 304: interpolating the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
In the embodiment of the present invention, the navigation domain language model is trained on relevant training corpus collected for the navigation domain, where the training method may be a conventional training method in the art and is not described in detail here. The language model of the navigation domain is usually an n-gram language model, for example a trigram (tri-gram) or bigram (bi-gram) language model, which the embodiment of the present invention does not limit.
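Purely as an illustration of what such an n-gram model is (the patent leaves the training method to conventional practice), a maximum-likelihood bigram model can be estimated from counts as in the following Python sketch; the toy corpus is invented.

```python
from collections import Counter

def train_bigram_lm(sentences):
    # Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1 w2) / count(w1).
    unigram, bigram = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigram.update(words)
        bigram.update(zip(words, words[1:]))
    return {pair: count / unigram[pair[0]] for pair, count in bigram.items()}

lm = train_bigram_lm(["activate mobile internet", "cancel mobile internet package"])
print(lm[("mobile", "internet")])  # 1.0 in this toy corpus
```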
Dialogue speech under a certain specific navigation scenario is collected and decoded using the navigation domain language model built in step 301 to obtain decoded word strings, and the decoded word strings are then used to train the language model for this specific navigation scenario. Interpolating the navigation domain language model and the specific navigation scenario language model yields the large-scale language model decoding network. Interpolation is a technical means commonly used in the art and is only briefly introduced here.
Assume that the navigation domain language model is a first language model whose total number of model units (n-grams) is N1, and that the specific navigation scenario language model is a second language model whose total number of model units is N2. Then the total number of n-grams in the interpolated language model is N1 + N2 minus the number of n-grams shared by the first and second language models, and the probability of each n-gram is the weighted sum of the probabilities of the corresponding units in the first and second language models. For example, suppose the first language model contains the n-gram "surf the Internet - activate", i.e. the word "surf the Internet" is followed by "activate", with probability P1, and the second language model also contains the n-gram "surf the Internet - activate" with probability P2; then the probability of this n-gram in the final model is k*P1 + (1-k)*P2, where k is the interpolation weight.
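A minimal sketch of this interpolation step is given below (Python). The probabilities, the weight k and the handling of n-grams that occur in only one model are invented for illustration; the patent only specifies the weighted sum for shared n-grams.

```python
def interpolate_lms(lm_domain, lm_scene, k=0.7):
    # Shared n-grams: k*P1 + (1-k)*P2.  N-grams present in only one model
    # simply keep that model's probability (an assumption made for this sketch).
    merged = {}
    for ngram in set(lm_domain) | set(lm_scene):
        p1, p2 = lm_domain.get(ngram), lm_scene.get(ngram)
        merged[ngram] = k * p1 + (1 - k) * p2 if p1 and p2 else (p1 or p2)
    return merged

domain = {("surf the Internet", "activate"): 0.02, ("check", "bill"): 0.05}
scene = {("surf the Internet", "activate"): 0.10}
print(interpolate_lms(domain, scene)[("surf the Internet", "activate")])  # 0.7*0.02 + 0.3*0.10 = 0.044
```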
Fig. 4 is a schematic diagram of the building process of the command word decoding network provided by an embodiment of the present invention.
The building process of the command word decoding network is as follows:
Step 401: collecting menu options under a specific navigation scenario, where the menu options include menu names and their aliases;
Step 402: connecting the menu options in parallel to form the command word decoding network;
Step 403: setting the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
In the embodiment of the present invention, the menu options under a certain specific navigation scenario, such as menu names and their aliases, are collected and connected in parallel to obtain the command word decoding network. Because the score of the decoding result of the command word decoding network and the score of the decoding result of the large-scale language model decoding network cannot be compared directly, the obtained command word decoding network needs to be optimized so that the scores of the decoding results become comparable. Specifically, the average probability of the unigram (uni-gram) language model in the large-scale language model may be used to set the weight of each word in the command word decoding network, and the weighted score is taken as the score of the decoding result of the command word decoding network. For example, suppose a word sequence in the command word decoding network is "activate mobile-phone Internet access", the acoustic model score of each word is denoted ScoreA, and the average unigram language model probability score in the large-scale language model is denoted ScoreB; then the score of the decoding result of this sequence is: ScoreA(activate) + ScoreB + ScoreA(mobile phone) + ScoreB + ScoreA(Internet access) + ScoreB.
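The additive scoring in this example can be sketched as follows (Python); the acoustic scores and the average unigram value are invented, and only the form ScoreA(word) + ScoreB per word follows the text.

```python
def command_word_score(acoustic_scores, unigram_avg):
    # Each word contributes its acoustic score plus the average unigram
    # language model score ScoreB taken from the large-scale language model.
    return sum(score_a + unigram_avg for score_a in acoustic_scores)

acoustic = {"activate": -12.4, "mobile phone": -9.8, "Internet access": -7.1}  # invented
SCORE_B = -4.2  # assumed average unigram language model score
print(command_word_score(acoustic.values(), SCORE_B))  # ≈ -41.9
```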
Fig. 5 is a schematic diagram of the building process of the high-frequency decoding network provided by an embodiment of the present invention.
The building process of the high-frequency decoding network is as follows:
Step 501: collecting high-frequency corpus under a specific navigation scenario;
Step 502: connecting the high-frequency corpus in parallel to form the high-frequency decoding network;
Step 503: setting the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
In the embodiment of the present invention, the building process of the high-frequency decoding network is similar to that of the command word decoding network described above; the difference is that the high-frequency corpus collected under the specific navigation scenario is different from the menu names and may, for example, consist of complete sentences. After the high-frequency corpus is connected in parallel to form the high-frequency decoding network, the optimization of the high-frequency decoding network follows the optimization process of the command word decoding network described above and is not repeated here.
In the embodiment of the present invention, the voice signal is first decoded and recognized in a unified manner based on the decoding networks of multiple different types, and the optimal path is then searched among the different types of decoding networks. The optimal path is selected by comparing the scores of the decoding results of the different types of decoding networks: the decoding network with the highest score is taken as the optimal path, and the decoding result corresponding to that decoding network is taken as the text word string. For example, the voice signal is decoded and recognized based on the large-scale language model decoding network to obtain a first score of the decoding result; the voice signal is decoded and recognized based on the command word decoding network to obtain a second score of the decoding result; the voice signal is decoded and recognized based on the high-frequency decoding network to obtain a third score of the decoding result; and the decoding result corresponding to the largest of the first, second and third scores is selected as the result of the unified decoding and recognition, i.e. the text word string described in step 202 above.
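A minimal sketch of this selection step (Python); the decoder outputs and scores below are invented stand-ins for the three decoding networks.

```python
def pick_best(results):
    # results: (decoder_name, decoded_text, score); keep the highest-scoring decode.
    return max(results, key=lambda item: item[2])

results = [
    ("large_scale_lm", "I want to check this morning's bill", -310.5),
    ("command_word", "check bill", -355.2),
    ("high_frequency", "check today's bill", -298.7),
]
best = pick_best(results)
print(best[0], "->", best[1])  # high_frequency -> check today's bill
```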
In the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, a preset weight gain may further be applied to this decoding path in order to raise the score of the effective decoding path and ensure that it wins the decoding competition, and the score after the gain is used as the score of this decoding path. Specifically, the gain method is as follows (see the sketch after this list):
(1) determining the preset semantic keyword or expansion word of the current path, and obtaining its weighted score p(x) in the language model;
(2) applying the preset weight gain to the weighted score p(x), for example p'(x) = p(x) * a, where a > 1 is the preset weight gain;
(3) taking the score after the gain as the score of the decoding path.
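One possible reading of steps (1) to (3) is sketched below (Python); the keyword set, the per-word scores and the gain factor a are all invented for the example.

```python
PRESET_KEYWORDS = {"bill", "package", "activate"}  # assumed semantic keywords / expansion words
GAIN_A = 1.2                                       # assumed preset weight gain, a > 1

def path_score_with_gain(path_words, lm_scores):
    # Boost the language-model weight p(x) of a preset keyword on the path:
    # p'(x) = p(x) * a; other words keep their original weight.
    return sum(lm_scores[w] * GAIN_A if w in PRESET_KEYWORDS else lm_scores[w]
               for w in path_words)

print(path_score_with_gain(["check", "bill"], {"check": 0.04, "bill": 0.02}))  # ≈ 0.064
```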
In the embodiment of the present invention, after the voice signal is decoded and recognized in a unified manner and the text word string is obtained, the operation corresponding to the text word string needs to be determined before the corresponding operation can be performed.
How to determine the operation corresponding to the text word string is briefly introduced below:
First, it is judged whether the text word string is a decoding result of the command word decoding network. If it is, the operation corresponding to the text word string is determined according to the semantics corresponding to the decoding result; otherwise, keyword matching is performed between the decoding result and a keyword list to obtain a matching result, and the operation corresponding to the text word string is determined according to the semantics corresponding to the matching result.
When the text word string is a decoding result of the command word decoding network, it means that the user input the voice signal according to the menu prompt; after the voice signal is decoded and recognized, a decoding result corresponding to a menu option can be obtained, and the operation corresponding to the text word string can then be determined according to the semantics corresponding to the decoding result.
When the text word string is not a decoding result of the command word decoding network, i.e. the text word string is a decoding result of the large-scale language model decoding network or of the high-frequency decoding network, it means that the user did not input the voice signal according to the menu prompt; after the voice signal is decoded and recognized, no decoding result corresponding to a menu option can be obtained, so keyword matching needs to be performed on the decoding result using the keyword list.
In the embodiment of the present invention, the keyword list may be built in advance, and the corresponding keyword list is obtained according to the specific dialogue scenario. Determining the menu option corresponding to the text word string generally requires determining its operation instruction and/or parameters. Since the decoding result corresponds to the text word string, keyword matching between the text word string and the keyword list yields a matching result; this matching result generally contains an operation instruction and/or parameters, and the operation corresponding to the text word string can be determined according to the semantics corresponding to the matching result.
In practical applications, the keywords corresponding to different service functions may be placed in one keyword list, which is invoked directly for keyword matching. Since service applications usually have a hierarchical relationship, the service functions may be organized into a multi-level menu structure in order to further improve matching efficiency, with a local keyword list established for each menu level. Correspondingly, when keyword matching is performed, the menu level corresponding to the current service is first determined, the keyword lists of this menu level and the levels below it are obtained, and keyword matching is then performed level by level between the text word string and the keyword lists to obtain the matching result. The matching result may contain an operation instruction and/or parameters, from which the operation corresponding to the text word string is obtained.
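A minimal sketch of the level-by-level matching described above (Python); the menu tree, the keyword lists and the helper name are all hypothetical.

```python
MENU_KEYWORDS = {  # hypothetical two-level menu with a local keyword list per level
    "mobile_internet": ["activate", "cancel", "effective time"],
    "package_type": ["five yuan", "ten yuan", "twenty yuan"],
}
MENU_CHILDREN = {"mobile_internet": ["package_type"], "package_type": []}

def match_keywords_by_level(text, current_menu):
    # Match the current menu level first, then its lower levels, collecting hits in order.
    hits, level = [], [current_menu]
    while level:
        next_level = []
        for menu in level:
            hits += [kw for kw in MENU_KEYWORDS[menu] if kw in text]
            next_level += MENU_CHILDREN[menu]
        level = next_level
    return hits

print(match_keywords_by_level("I want to cancel the ten yuan package", "mobile_internet"))
# -> ['cancel', 'ten yuan']
```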
The keyword matching technique in the embodiment of the present invention is described in detail below, taking the "mobile Internet access" service as an example:
For the "mobile Internet access" service, the corresponding operations are "activate", "cancel", "query effective time" and so on, and the corresponding parameters are "ten-yuan package", "twenty-yuan package" and so on.
The keyword matching process is as follows:
(1) Obtaining the corresponding keyword list according to the dialogue scenario.
For example, under the "mobile Internet access" service scenario, the prompt is "Would you like to activate or cancel a package?", and its keyword list is "activate", "cancel", "effective time" and so on; under the "package type" service scenario, the prompt is "OK, activating an Internet package. We have five-yuan, ten-yuan, twenty-yuan and other packages; which one would you like?", and its keyword list is "five yuan", "ten yuan", "twenty yuan" and so on.
(2) Matching the text word string against the keyword list to obtain the matching result.
For example, if the text word string is "I want to activate the five-yuan Internet package" and its keyword list is "activate", "cancel", "effective time" and so on, then "activate" is matched first, i.e. the corresponding dialogue scenario is "activate an Internet package".
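A toy first-match scan over this example (Python); the strings are English renderings of the Chinese example above and are used purely for illustration.

```python
utterance = "I want to activate the five-yuan Internet package"
scenario_keywords = ["activate", "cancel", "effective time"]  # keyword list of the current scenario
first_hit = next((kw for kw in scenario_keywords if kw in utterance), None)
print(first_hit)  # -> "activate", so the dialogue moves on to "activate an Internet package"
```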
The voice navigation method provided by the embodiment of the present invention performs unified decoding and recognition of the voice signal based on decoding networks of multiple different types, which better solves the problem of recognizing users' personalized voice responses. If the user responds according to the prompt voice, the command word decoding network can recognize the response quickly; if the user does not respond according to the prompt voice, the large-scale language model decoding network and/or the high-frequency decoding network, together with the keyword matching technique, can recognize the voice signal correctly. With this voice navigation method, recognition performance for users' personalized voice responses can be ensured.
Correspondingly, an embodiment of the present invention also provides a voice navigation system. Fig. 6 is a schematic structural diagram of the voice navigation system provided by an embodiment of the present invention.
In the embodiment of the present invention, the voice navigation system includes:
a receiving module 601, configured to receive a voice signal input by a user;
a decoding module 602, configured to perform unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string;
a determining module 603, configured to determine the operation corresponding to the text word string;
an executing module 604, configured to perform the operation.
The decoding networks of multiple different types may include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network.
Correspondingly, the voice navigation system of the embodiment of the present invention may also include modules for building the above decoding networks, namely a first building module, a second building module and a third building module, wherein:
the first building module is configured to build the large-scale language model decoding network and may include: a first language model unit, configured to build a navigation domain language model using a corpus; a decoding unit, configured to collect dialogue speech under a specific navigation scenario and decode the dialogue speech using the navigation domain language model to obtain decoded word strings; a second language model unit, configured to train a specific navigation scenario language model using the decoded word strings; and an interpolation unit, configured to interpolate the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network;
the second building module is configured to build the command word decoding network and may include: a menu option unit, configured to collect menu options under a specific navigation scenario, the menu options including menu names and their aliases; a first paralleling unit, configured to connect the menu options in parallel to form the command word decoding network; and a first weighting unit, configured to set the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model;
the third building module is configured to build the high-frequency decoding network and may include: a high-frequency corpus unit, configured to collect high-frequency corpus under a specific navigation scenario; a second paralleling unit, configured to connect the high-frequency corpus in parallel to form the high-frequency decoding network; and a second weighting unit, configured to set the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
In practical applications, the decoding module 602 may include:
a first decoding unit, configured to decode and recognize the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
a second decoding unit, configured to decode and recognize the voice signal based on the command word decoding network to obtain a second score of the decoding result;
a third decoding unit, configured to decode and recognize the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
a selecting unit, configured to select, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
In the embodiment of the present invention, the first decoding unit is further configured to, in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, apply a preset weight gain to the decoding path and take the score after the gain as the score of the decoding path.
It should be noted that in practical applications, depending on the needs of the application, the decoding module 602 may contain only any two of the first decoding unit, the second decoding unit and the third decoding unit described above.
In the embodiment of the present invention, the determining module 603 may include:
a judging unit, configured to judge whether the text word string is a decoding result of the command word decoding network;
a first determining unit, configured to, after the judging unit judges that the text word string is a decoding result of the command word decoding network, determine the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
a keyword matching unit, configured to, after the judging unit judges that the text word string is not a decoding result of the command word decoding network, perform keyword matching between the decoding result and a keyword list to obtain a matching result;
a second determining unit, configured to determine the operation corresponding to the text word string according to the semantics corresponding to the matching result.
It should be noted that in practical applications, the keywords corresponding to different service functions may be placed in one keyword list, which the keyword matching unit invokes directly for keyword matching. In addition, to further improve matching efficiency, multi-level keyword lists corresponding to the menu levels may also be established; in that case the keyword matching unit performs the keyword matching level by level to obtain the matching result.
For example, in another embodiment of the system of the present invention, the system may further include:
a keyword list building module, configured to organize service functions into a multi-level menu structure and establish a keyword list for each menu level.
Correspondingly, in this embodiment, the keyword matching unit may include:
a menu level determining unit, configured to determine the menu level corresponding to the current service;
a keyword list obtaining unit, configured to obtain the keyword lists of the menu level and the levels below it;
a matching unit, configured to perform keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
Providing the keyword list building module described above allows system developers to configure functions flexibly according to users' different application requirements, improving development efficiency.
The voice navigation method and system provided by the above embodiments belong to the same inventive concept; for the specific implementation process, reference may be made to the method embodiment, which is not repeated here.
The voice navigation system provided by the embodiment of the present invention performs unified decoding and recognition of the voice signal based on decoding networks of multiple different types, which better solves the problem of recognizing users' personalized voice responses. If the user responds according to the menu option prompt voice, the command word decoding network can recognize the response quickly; if the user does not respond according to the menu option prompt voice, the large-scale language model decoding network and the high-frequency decoding network, together with the keyword matching technique, can recognize the voice signal correctly. With this voice navigation system, recognition performance for users' personalized voice responses can be ensured.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments may refer to each other, and each embodiment focuses on what differs from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively simple, and for relevant parts reference may be made to the description of the method embodiment. The system embodiment described above is only schematic: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (15)

1. A voice navigation method, characterized by comprising:
receiving a voice signal input by a user;
performing unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, wherein the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;
determining the operation corresponding to the text word string;
performing the operation;
wherein the determining the operation corresponding to the text word string comprises:
if the text word string is a decoding result of the command word decoding network, determining the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
otherwise, performing keyword matching between the decoding result and a keyword list to obtain a matching result;
determining the operation corresponding to the text word string according to the semantics corresponding to the matching result.
2. The method according to claim 1, characterized in that the method further comprises: building the large-scale language model decoding network, the building process comprising:
building a navigation domain language model using a corpus;
collecting dialogue speech under a specific navigation scenario, and decoding the dialogue speech using the navigation domain language model to obtain decoded word strings;
training a specific navigation scenario language model using the decoded word strings;
interpolating the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
3. The method according to claim 1, characterized in that the method further comprises: building the command word decoding network, the building process comprising:
collecting menu options under a specific navigation scenario, the menu options including menu names and their aliases;
connecting the menu options in parallel to form the command word decoding network;
setting the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
4. The method according to claim 1, characterized in that the method further comprises: building the high-frequency decoding network, the building process comprising:
collecting high-frequency corpus under a specific navigation scenario;
connecting the high-frequency corpus in parallel to form the high-frequency decoding network;
setting the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
5. The method according to any one of claims 1 to 4, characterized in that the performing unified decoding and recognition of the voice signal based on the decoding networks of multiple different types to obtain a text word string comprises:
decoding and recognizing the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
decoding and recognizing the voice signal based on the command word decoding network to obtain a second score of the decoding result;
decoding and recognizing the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
selecting, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
6. The method according to claim 5, characterized in that the method further comprises:
in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, applying a preset weight gain to that decoding path;
taking the score after the gain as the score of the decoding path.
7. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
organizing service functions into a multi-level menu structure, and establishing a keyword list for each menu level;
wherein the performing keyword matching between the decoding result and the keyword list to obtain a matching result comprises:
determining the menu level corresponding to the current service;
obtaining the keyword lists of the menu level and the levels below it;
performing keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
8. A voice navigation system, characterized by comprising:
a receiving module, configured to receive a voice signal input by a user;
a decoding module, configured to perform unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, wherein the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;
a determining module, configured to determine the operation corresponding to the text word string;
an executing module, configured to perform the operation;
wherein the determining module comprises:
a judging unit, configured to judge whether the text word string is a decoding result of the command word decoding network;
a first determining unit, configured to, after the judging unit judges that the text word string is a decoding result of the command word decoding network, determine the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
a keyword matching unit, configured to, after the judging unit judges that the text word string is not a decoding result of the command word decoding network, perform keyword matching between the decoding result and a keyword list to obtain a matching result;
a second determining unit, configured to determine the operation corresponding to the text word string according to the semantics corresponding to the matching result.
9. The system according to claim 8, characterized in that the system further comprises any two or all three of the following modules:
a first building module, configured to build the large-scale language model decoding network;
a second building module, configured to build the command word decoding network;
a third building module, configured to build the high-frequency decoding network.
10. The system according to claim 9, characterized in that the first building module comprises:
a first language model unit, configured to build a navigation domain language model using a corpus;
a decoding unit, configured to collect dialogue speech under a specific navigation scenario and decode the dialogue speech using the navigation domain language model to obtain decoded word strings;
a second language model unit, configured to train a specific navigation scenario language model using the decoded word strings;
an interpolation unit, configured to interpolate the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
11. The system according to claim 9, characterized in that the second building module comprises:
a menu option unit, configured to collect menu options under a specific navigation scenario, the menu options including menu names and their aliases;
a first paralleling unit, configured to connect the menu options in parallel to form the command word decoding network;
a first weighting unit, configured to set the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
12. The system according to claim 9, characterized in that the third building module comprises:
a high-frequency corpus unit, configured to collect high-frequency corpus under a specific navigation scenario;
a second paralleling unit, configured to connect the high-frequency corpus in parallel to form the high-frequency decoding network;
a second weighting unit, configured to set the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
13. The system according to any one of claims 8 to 12, characterized in that the decoding module comprises:
a first decoding unit, configured to decode and recognize the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
a second decoding unit, configured to decode and recognize the voice signal based on the command word decoding network to obtain a second score of the decoding result;
a third decoding unit, configured to decode and recognize the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
a selecting unit, configured to select, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
14. The system according to claim 13, characterized in that
the first decoding unit is further configured to, in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, apply a preset weight gain to the decoding path and take the score after the gain as the score of the decoding path.
15. The system according to any one of claims 8 to 12, characterized in that the system further comprises:
a keyword list building module, configured to organize service functions into a multi-level menu structure and establish a keyword list for each menu level;
wherein the keyword matching unit comprises:
a menu level determining unit, configured to determine the menu level corresponding to the current service;
a keyword list obtaining unit, configured to obtain the keyword lists of the menu level and the levels below it;
a matching unit, configured to perform keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
CN201310611734.9A 2013-11-26 2013-11-26 Phonetic navigation method and system Active CN103700369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310611734.9A CN103700369B (en) 2013-11-26 2013-11-26 Phonetic navigation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310611734.9A CN103700369B (en) 2013-11-26 2013-11-26 Phonetic navigation method and system

Publications (2)

Publication Number Publication Date
CN103700369A CN103700369A (en) 2014-04-02
CN103700369B true CN103700369B (en) 2016-08-31

Family

ID=50361875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310611734.9A Active CN103700369B (en) 2013-11-26 2013-11-26 Phonetic navigation method and system

Country Status (1)

Country Link
CN (1) CN103700369B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104064184B (en) * 2014-06-24 2017-03-08 科大讯飞股份有限公司 The construction method of isomery decoding network and system, audio recognition method and system
CN105321518B (en) * 2014-08-05 2018-12-04 中国科学院声学研究所 A kind of rejection method for identifying of low-resource Embedded Speech Recognition System
CN104240700B (en) * 2014-08-26 2018-09-07 智歌科技(北京)有限公司 A kind of global voice interactive method and system towards vehicle-mounted terminal equipment
CN104901807B (en) * 2015-04-07 2019-03-26 河南城建学院 A kind of vocal print cryptographic methods can be used for low side chip
CN106981287A (en) * 2016-01-14 2017-07-25 芋头科技(杭州)有限公司 A kind of method and system for improving Application on Voiceprint Recognition speed
CN106971712A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive rapid voiceprint recognition methods and system
CN106971735B (en) * 2016-01-14 2019-12-03 芋头科技(杭州)有限公司 A kind of method and system regularly updating the Application on Voiceprint Recognition of training sentence in caching
CN106971726A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive method for recognizing sound-groove and system based on code book
CN106205611B (en) * 2016-06-29 2020-03-27 北京儒博科技有限公司 Man-machine interaction method and system based on multi-mode historical response result
CN106710591A (en) * 2016-12-13 2017-05-24 云南电网有限责任公司电力科学研究院 Voice customer service system for power terminal
CN107680588B (en) * 2017-05-10 2020-10-20 平安科技(深圳)有限公司 Intelligent voice navigation method, device and storage medium
CN107316635B (en) * 2017-05-19 2020-09-11 科大讯飞股份有限公司 Voice recognition method and device, storage medium and electronic equipment
CN107174418A (en) * 2017-06-28 2017-09-19 歌尔股份有限公司 A kind of intelligent wheel chair and its control method
CN110689881B (en) * 2018-06-20 2022-07-12 深圳市北科瑞声科技股份有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN108510990A (en) * 2018-07-04 2018-09-07 百度在线网络技术(北京)有限公司 Audio recognition method, device, user equipment and storage medium
CN109325097B (en) * 2018-07-13 2022-05-27 海信集团有限公司 Voice guide method and device, electronic equipment and storage medium
CN109410927B (en) * 2018-11-29 2020-04-03 北京蓦然认知科技有限公司 Voice recognition method, device and system combining offline command word and cloud analysis
CN111326147B (en) * 2018-12-12 2023-11-17 北京嘀嘀无限科技发展有限公司 Speech recognition method, device, electronic equipment and storage medium
CN110110294B (en) * 2019-03-26 2021-02-02 北京捷通华声科技股份有限公司 Dynamic reverse decoding method, device and readable storage medium
CN110610700B (en) * 2019-10-16 2022-01-14 科大讯飞股份有限公司 Decoding network construction method, voice recognition method, device, equipment and storage medium
CN113450781B (en) * 2020-03-25 2022-08-09 阿里巴巴集团控股有限公司 Speech processing method, speech encoder, speech decoder and speech recognition system
CN112331207A (en) * 2020-09-30 2021-02-05 音数汇元(上海)智能科技有限公司 Service content monitoring method and device, electronic equipment and storage medium
CN112100339A (en) * 2020-11-04 2020-12-18 北京淇瑀信息科技有限公司 User intention recognition method and device for intelligent voice robot and electronic equipment
CN113327597B (en) * 2021-06-23 2023-08-22 网易(杭州)网络有限公司 Speech recognition method, medium, device and computing equipment
CN114938337A (en) * 2022-04-12 2022-08-23 华为技术有限公司 Model training method and device and electronic equipment
KR102620070B1 (en) * 2022-10-13 2024-01-02 주식회사 타이렐 Autonomous articulation system based on situational awareness
KR102626954B1 (en) * 2023-04-20 2024-01-18 주식회사 덴컴 Speech recognition apparatus for dentist and method using the same
KR102632872B1 (en) * 2023-05-22 2024-02-05 주식회사 포지큐브 Method for correcting error of speech recognition and system thereof
KR102648689B1 (en) * 2023-05-26 2024-03-18 주식회사 액션파워 Method for text error detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945673A (en) * 2012-11-24 2013-02-27 安徽科大讯飞信息科技股份有限公司 Continuous speech recognition method with speech command range changed dynamically
CN103177721A (en) * 2011-12-26 2013-06-26 中国电信股份有限公司 Voice recognition method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE60127398T2 (en) * 2000-05-23 2007-12-13 Thomson Licensing SYNTAX AND SEMANTIC ANALYSIS OF LANGUAGE INSTRUCTIONS
US9224394B2 (en) * 2009-03-24 2015-12-29 Sirius Xm Connected Vehicle Services Inc Service oriented speech recognition for in-vehicle automated interaction and in-vehicle user interfaces requiring minimal cognitive driver processing for same

Also Published As

Publication number Publication date
CN103700369A (en) 2014-04-02

Similar Documents

Publication Publication Date Title
CN103700369B (en) Phonetic navigation method and system
US10540965B2 (en) Semantic re-ranking of NLU results in conversational dialogue applications
CN110377716A (en) Exchange method, device and the computer readable storage medium of dialogue
US9117453B2 (en) Method and system for processing parallel context dependent speech recognition results from a single utterance utilizing a context database
US11164562B2 (en) Entity-level clarification in conversation services
CN110415679A (en) Voice error correction method, device, equipment and storage medium
US20210406473A1 (en) System and method for building chatbot providing intelligent conversational service
CN110265013A (en) The recognition methods of voice and device, computer equipment, storage medium
WO2014183035A1 (en) Method and system for capturing and exploiting user intent in a conversational interaction based information retrieval system
CN104573099A (en) Topic searching method and device
US9720982B2 (en) Method and apparatus for natural language search for variables
CN110266900A (en) Recognition methods, device and the customer service system that client is intended to
KR20080024752A (en) Dialog management apparatus and method for chatting agent
CN112364622B (en) Dialogue text analysis method, device, electronic device and storage medium
CN104469029A (en) Method and device for telephone number query through voice
CN110085217A (en) Phonetic navigation method, device and terminal device
CN109670033A (en) Search method, device, equipment and the storage medium of content
CN106844499A (en) Many wheel session interaction method and devices
CN110175242B (en) Human-computer interaction association method, device and medium based on knowledge graph
CN105323392A (en) Method and apparatus for quickly entering IVR menu
US11790168B2 (en) Natural language and messaging system integrated group assistant
CN103559242A (en) Method for achieving voice input of information and terminal device
US11210462B1 (en) Voice input processing
KR102655058B1 (en) Method and apparatus for providing chatbot service using condition expression generated interface
KR20220155957A (en) Method and apparatus for providing chatbot service based on slot filling

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666, Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant after: Iflytek Co., Ltd.

Address before: No. 666, Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant