CN103700369B - Phonetic navigation method and system - Google Patents

Phonetic navigation method and system

Info

Publication number
CN103700369B
CN103700369B (grant) · CN201310611734.9A / CN201310611734A (application)
Authority
CN
China
Prior art keywords
decoding
decoding network
language model
word
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310611734.9A
Other languages
Chinese (zh)
Other versions
CN103700369A (en)
Inventor
高建清
刘聪
王智国
胡国平
胡郁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201310611734.9A priority Critical patent/CN103700369B/en
Publication of CN103700369A publication Critical patent/CN103700369A/en
Application granted granted Critical
Publication of CN103700369B publication Critical patent/CN103700369B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a voice navigation method and system, belonging to the field of speech processing technology. The method includes: receiving a voice signal input by a user; performing unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, where the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network; determining the operation corresponding to the text word string; and performing the operation. With this voice navigation method and system, recognition performance for users' personalized voice responses can be ensured.

Description

Phonetic navigation method and system
Technical field
The present invention relates to the field of speech processing technology, and in particular to a voice navigation method and system.
Background technology
Voice navigation technology is widely used in today's self-service customer-service systems. In particular, answer-mode voice call navigation systems have become the mainstream voice calling method because of their high classification accuracy.
In the prior art, an answer-mode voice call navigation system usually organizes the service menu into a multi-level structure and sets a system prompt on each menu level to guide the user's selection. For example, when the user says "I want to check my account details", the system enters the second-level menu "account inquiry" and prompts "Do you want to check today's details, historical details, or details for a specified date?". By presenting the menu, the system makes clear which service types are supported in the current application environment and helps the user choose.
Clearly, the system prompt under the answer mode constrains, to some extent, the range of the user's answers. Under the "account inquiry" scenario above, the prompt is "Do you want to check today's details, historical details, or details for a specified date?", so the grammar contains three units: "today's details", "historical details" and "details for a specified date". For this, legacy systems usually parse the user's input speech with command-word based recognition, which achieves good recognition performance and high recognition efficiency when the user cooperates. However, when the user does not follow the system prompt, i.e. does not input a command word — for example, answering "I want to look at my account from this morning" to the prompt "Do you want to check today's details, historical details, or details for a specified date?" — the effect of menu-selection based command control is hard to guarantee. That is, a voice navigation system using the traditional answer mode has difficulty ensuring recognition performance for users' personalized voice responses.
Summary of the invention
Embodiments of the present invention provide a voice navigation method and system, to solve the problem that existing command-word based speech recognition cannot guarantee the recognition effect for users' personalized voice responses.
Embodiments of the present invention provide the following technical solutions:
In one aspect, an embodiment of the present invention provides a voice navigation method, including:
receiving a voice signal input by a user;
performing unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, where the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;
determining the operation corresponding to the text word string;
performing the operation.
Preferably, the method further includes building the large-scale language model decoding network, the building process including:
building a navigation domain language model using a corpus;
collecting dialogue speech under a specific navigation scenario, and decoding the dialogue speech using the navigation domain language model to obtain decoded word strings;
training a specific navigation scenario language model using the decoded word strings;
interpolating the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
Preferably, the method further includes building the command word decoding network, the building process including:
collecting menu options under a specific navigation scenario, the menu options including menu names and their aliases;
connecting the menu options in parallel to form the command word decoding network;
setting the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
Preferably, the method further includes building the high-frequency decoding network, the building process including:
collecting high-frequency corpus under a specific navigation scenario;
connecting the high-frequency corpus in parallel to form the high-frequency decoding network;
setting the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
Preferably, the performing unified decoding and recognition of the voice signal based on the decoding networks of multiple different types to obtain a text word string includes:
decoding and recognizing the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
decoding and recognizing the voice signal based on the command word decoding network to obtain a second score of the decoding result;
decoding and recognizing the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
selecting, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
Preferably, the method further includes:
in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, applying a preset weight gain to that decoding path;
taking the score after the gain as the score of the decoding path.
Preferably, the determining the operation corresponding to the text word string includes:
if the text word string is a decoding result of the command word decoding network, determining the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
otherwise, performing keyword matching between the decoding result and a keyword list to obtain a matching result;
determining the operation corresponding to the text word string according to the semantics corresponding to the matching result.
Preferably, the method further includes:
organizing service functions into a multi-level menu structure, and establishing a keyword list for each menu level;
wherein the performing keyword matching between the decoding result and the keyword list to obtain a matching result includes:
determining the menu level corresponding to the current service;
obtaining the keyword lists of the menu level and the levels below it;
performing keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
In another aspect, an embodiment of the present invention provides a voice navigation system, including:
a receiving module, configured to receive a voice signal input by a user;
a decoding module, configured to perform unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, where the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;
a determining module, configured to determine the operation corresponding to the text word string;
an executing module, configured to perform the operation.
Preferably, the system further includes any two or all three of the following modules:
a first building module, configured to build the large-scale language model decoding network;
a second building module, configured to build the command word decoding network;
a third building module, configured to build the high-frequency decoding network.
Preferably, the first building module includes:
a first language model unit, configured to build a navigation domain language model using a corpus;
a decoding unit, configured to collect dialogue speech under a specific navigation scenario and decode the dialogue speech using the navigation domain language model to obtain decoded word strings;
a second language model unit, configured to train a specific navigation scenario language model using the decoded word strings;
an interpolation unit, configured to interpolate the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
Preferably, the second building module includes:
a menu option unit, configured to collect menu options under a specific navigation scenario, the menu options including menu names and their aliases;
a first paralleling unit, configured to connect the menu options in parallel to form the command word decoding network;
a first weighting unit, configured to set the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
Preferably, the third building module includes:
a high-frequency corpus unit, configured to collect high-frequency corpus under a specific navigation scenario;
a second paralleling unit, configured to connect the high-frequency corpus in parallel to form the high-frequency decoding network;
a second weighting unit, configured to set the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
Preferably, the decoding module includes:
a first decoding unit, configured to decode and recognize the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
a second decoding unit, configured to decode and recognize the voice signal based on the command word decoding network to obtain a second score of the decoding result;
a third decoding unit, configured to decode and recognize the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
a selecting unit, configured to select, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
Preferably, the first decoding unit is further configured to, in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, apply a preset weight gain to that decoding path and take the score after the gain as the score of the decoding path.
Preferably, the determining module includes:
a judging unit, configured to judge whether the text word string is a decoding result of the command word decoding network;
a first determining unit, configured to, after the judging unit judges that the text word string is a decoding result of the command word decoding network, determine the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
a keyword matching unit, configured to, after the judging unit judges that the text word string is not a decoding result of the command word decoding network, perform keyword matching between the decoding result and a keyword list to obtain a matching result;
a second determining unit, configured to determine the operation corresponding to the text word string according to the semantics corresponding to the matching result.
Preferably, the system further includes:
a keyword list building module, configured to organize service functions into a multi-level menu structure and establish a keyword list for each menu level;
wherein the keyword matching unit includes:
a menu level determining unit, configured to determine the menu level corresponding to the current service;
a keyword list obtaining unit, configured to obtain the keyword lists of the menu level and the levels below it;
a matching unit, configured to perform keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
The voice navigation method and system provided by the embodiments of the present invention combine the advantages of decoding networks of multiple different types: the voice signal input by the user is decoded and recognized in a unified manner based on decoding networks of multiple different types, yielding a text word string and the corresponding operation, so that the user's personalized speech response can be recognized. While the reliability of the recognition result is ensured, the flexibility of the user's response is improved. The speech recognition method and system provided by the embodiments of the present invention can therefore substantially improve the user experience.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings.
Fig. 1 is a schematic diagram of a voice navigation process in the prior art;
Fig. 2 is a flow chart of the voice navigation method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the building process of the large-scale language model decoding network provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the building process of the command word decoding network provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the building process of the high-frequency decoding network provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the voice navigation system provided by an embodiment of the present invention.
Detailed description of the invention
In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.
First, the voice navigation process in the prior art is briefly introduced.
Fig. 1 is a schematic diagram of a voice navigation process in the prior art.
In the prior art, the voice signal input by the user is usually decoded and recognized based on a command word decoding network, which specifically includes the following process:
Step 101: receiving a voice signal input by a user;
Step 102: decoding and recognizing the voice signal based on a command word decoding network to obtain a text word string;
Step 103: determining the operation corresponding to the text word string;
Step 104: performing the operation.
Generally, the command word decoding network is built by collecting keywords commonly used under a specific navigation scenario. The common keywords may be menu names and their aliases, which form the menu options; the obtained menu options are connected in parallel to obtain the command word decoding network. After the voice signal input by the user is decoded and recognized using the command word decoding network, a text word string is obtained. The text word string usually corresponds to the user's intention and can serve as the decoding result of the command word decoding network.
After the decoding result is obtained, the confidence of the decoding and recognition, i.e. the reliability of the decoding and recognition, needs to be judged, usually by means of a likelihood ratio test (LRT). Assume that H0 represents correct recognition and H1 represents incorrect recognition; the score of the optimal decoding path is usually denoted p(X|H0), and the sum of the scores of all other paths is used to approximate p(X|H1). Generally, the system selects the log likelihood ratio (LLR) as the confidence score of the recognition result: LLR = log p(X|H0) - log p(X|H1). Whether the recognition result is reliable is judged by whether its confidence score exceeds a preset threshold: if it is greater than the preset threshold, the recognition result is reliable; otherwise, the recognition result is unreliable.
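As an illustration only, this confidence test can be sketched in a few lines of Python; the likelihood values and the threshold below are invented for the example and are not taken from the patent.

```python
import math

def llr_confidence(best_path_likelihood, other_path_likelihoods):
    # LLR = log p(X|H0) - log p(X|H1), with p(X|H1) approximated by the
    # sum of the likelihood scores of all competing paths.
    return math.log(best_path_likelihood) - math.log(sum(other_path_likelihoods))

THRESHOLD = 1.0  # assumed preset threshold; in practice tuned on held-out data
score = llr_confidence(8e-3, [1e-3, 5e-4, 2e-4])            # invented likelihoods
print("reliable" if score > THRESHOLD else "unreliable")    # prints "reliable"
```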
If the recognition result is reliable, the semantic information corresponding to the recognition result is obtained, from which the user intention corresponding to the text word string can be determined; the operation corresponding to that user intention usually corresponds to a menu option. Since in command word decoding each command word corresponds to a determined menu option, the corresponding menu function is easily obtained from the command word. If the recognition result is unreliable, the user is prompted to input the voice signal again, and the voice signal is decoded and recognized again.
Therefore, voice navigation in the prior art is only applicable to command word navigation corresponding to the menu options. For a voice signal outside the menu options, the recognition result obtained is unreliable, so the user is repeatedly prompted to re-input the voice signal and no effective recognition result can be obtained.
To this end, an embodiment of the present invention proposes a voice navigation method that can effectively recognize a voice signal that the user does not input according to the menu prompt.
As shown in Fig. 2, the voice navigation method provided by an embodiment of the present invention comprises the following steps:
Step 201: receiving a voice signal input by a user;
Step 202: performing unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string;
Step 203: determining the operation corresponding to the text word string;
Step 204: performing the operation.
In the voice navigation method provided by the embodiment of the present invention, decoding networks of multiple different types are used to perform unified decoding and recognition of the voice signal input by the user, where the decoding networks of multiple different types may be any two or all three of a large-scale language model decoding network, a command word decoding network and a high-frequency decoding network.
Unlike the prior art, which decodes and recognizes the voice signal purely based on a command word decoding network, the embodiment of the present invention performs unified decoding and recognition of the voice signal based on decoding networks of multiple different types and searches for the optimal path among the different types of decoding networks. This ensures the reliability of the recognition result and makes it possible to recognize speech other than the command words corresponding to the menu options, which a pure command word decoding network cannot recognize. Specifically, after the voice signal input by the user is received, the acoustic feature sequence of the voice signal is extracted, the set of decoding paths of the unified decoding corresponding to the acoustic feature sequence is obtained, and the optimal path is searched from this set.
The building processes of the three different types of decoding networks involved in the embodiments of the present invention are briefly introduced below.
Fig. 3 is a schematic diagram of the building process of the large-scale language model decoding network provided by an embodiment of the present invention. The building process of the large-scale language model decoding network is as follows:
Step 301: building a navigation domain language model using a corpus;
Step 302: collecting dialogue speech under a specific navigation scenario, and decoding the dialogue speech using the navigation domain language model to obtain decoded word strings;
Step 303: training a specific navigation scenario language model using the decoded word strings;
Step 304: interpolating the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
In the embodiment of the present invention, the navigation domain language model is trained on relevant training corpus collected for the navigation domain, where the training method may be a conventional training method in the art and is not described in detail here. The language model of the navigation domain is usually an n-gram language model, for example a trigram (tri-gram) or bigram (bi-gram) language model, which the embodiment of the present invention does not limit.
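Purely as an illustration of what such an n-gram model is (the patent leaves the training method to conventional practice), a maximum-likelihood bigram model can be estimated from counts as in the following Python sketch; the toy corpus is invented.

```python
from collections import Counter

def train_bigram_lm(sentences):
    # Maximum-likelihood bigram estimates: P(w2 | w1) = count(w1 w2) / count(w1).
    unigram, bigram = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigram.update(words)
        bigram.update(zip(words, words[1:]))
    return {pair: count / unigram[pair[0]] for pair, count in bigram.items()}

lm = train_bigram_lm(["activate mobile internet", "cancel mobile internet package"])
print(lm[("mobile", "internet")])  # 1.0 in this toy corpus
```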
Dialogue speech under a certain specific navigation scenario is collected and decoded using the navigation domain language model built in step 301 to obtain decoded word strings, and the decoded word strings are then used to train the language model for this specific navigation scenario. Interpolating the navigation domain language model and the specific navigation scenario language model yields the large-scale language model decoding network. Interpolation is a technical means commonly used in the art and is only briefly introduced here.
Assume that the navigation domain language model is a first language model whose total number of model units (n-grams) is N1, and that the specific navigation scenario language model is a second language model whose total number of model units is N2. Then the total number of n-grams in the interpolated language model is N1 + N2 minus the number of n-grams shared by the first and second language models, and the probability of each n-gram is the weighted sum of the probabilities of the corresponding units in the first and second language models. For example, suppose the first language model contains the n-gram "surf the Internet - activate", i.e. the word "surf the Internet" is followed by "activate", with probability P1, and the second language model also contains the n-gram "surf the Internet - activate" with probability P2; then the probability of this n-gram in the final model is k*P1 + (1-k)*P2, where k is the interpolation weight.
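A minimal sketch of this interpolation step is given below (Python). The probabilities, the weight k and the handling of n-grams that occur in only one model are invented for illustration; the patent only specifies the weighted sum for shared n-grams.

```python
def interpolate_lms(lm_domain, lm_scene, k=0.7):
    # Shared n-grams: k*P1 + (1-k)*P2.  N-grams present in only one model
    # simply keep that model's probability (an assumption made for this sketch).
    merged = {}
    for ngram in set(lm_domain) | set(lm_scene):
        p1, p2 = lm_domain.get(ngram), lm_scene.get(ngram)
        merged[ngram] = k * p1 + (1 - k) * p2 if p1 and p2 else (p1 or p2)
    return merged

domain = {("surf the Internet", "activate"): 0.02, ("check", "bill"): 0.05}
scene = {("surf the Internet", "activate"): 0.10}
print(interpolate_lms(domain, scene)[("surf the Internet", "activate")])  # 0.7*0.02 + 0.3*0.10 = 0.044
```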
Fig. 4 is a schematic diagram of the building process of the command word decoding network provided by an embodiment of the present invention.
The building process of the command word decoding network is as follows:
Step 401: collecting menu options under a specific navigation scenario, where the menu options include menu names and their aliases;
Step 402: connecting the menu options in parallel to form the command word decoding network;
Step 403: setting the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
In the embodiment of the present invention, the menu options under a certain specific navigation scenario, such as menu names and their aliases, are collected and connected in parallel to obtain the command word decoding network. Because the score of the decoding result of the command word decoding network and the score of the decoding result of the large-scale language model decoding network cannot be compared directly, the obtained command word decoding network needs to be optimized so that the scores of the decoding results become comparable. Specifically, the average probability of the unigram (uni-gram) language model in the large-scale language model may be used to set the weight of each word in the command word decoding network, and the weighted score is taken as the score of the decoding result of the command word decoding network. For example, suppose a word sequence in the command word decoding network is "activate mobile-phone Internet access", the acoustic model score of each word is denoted ScoreA, and the average unigram language model probability score in the large-scale language model is denoted ScoreB; then the score of the decoding result of this sequence is: ScoreA(activate) + ScoreB + ScoreA(mobile phone) + ScoreB + ScoreA(Internet access) + ScoreB.
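The additive scoring in this example can be sketched as follows (Python); the acoustic scores and the average unigram value are invented, and only the form ScoreA(word) + ScoreB per word follows the text.

```python
def command_word_score(acoustic_scores, unigram_avg):
    # Each word contributes its acoustic score plus the average unigram
    # language model score ScoreB taken from the large-scale language model.
    return sum(score_a + unigram_avg for score_a in acoustic_scores)

acoustic = {"activate": -12.4, "mobile phone": -9.8, "Internet access": -7.1}  # invented
SCORE_B = -4.2  # assumed average unigram language model score
print(command_word_score(acoustic.values(), SCORE_B))  # ≈ -41.9
```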
Fig. 5 is a schematic diagram of the building process of the high-frequency decoding network provided by an embodiment of the present invention.
The building process of the high-frequency decoding network is as follows:
Step 501: collecting high-frequency corpus under a specific navigation scenario;
Step 502: connecting the high-frequency corpus in parallel to form the high-frequency decoding network;
Step 503: setting the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
In the embodiment of the present invention, the building process of the high-frequency decoding network is similar to that of the command word decoding network described above; the difference is that the high-frequency corpus collected under the specific navigation scenario is different from the menu names and may, for example, consist of complete sentences. After the high-frequency corpus is connected in parallel to form the high-frequency decoding network, the optimization of the high-frequency decoding network follows the optimization process of the command word decoding network described above and is not repeated here.
In the embodiment of the present invention, the voice signal is first decoded and recognized in a unified manner based on the decoding networks of multiple different types, and the optimal path is then searched among the different types of decoding networks. The optimal path is selected by comparing the scores of the decoding results of the different types of decoding networks: the decoding network with the highest score is taken as the optimal path, and the decoding result corresponding to that decoding network is taken as the text word string. For example, the voice signal is decoded and recognized based on the large-scale language model decoding network to obtain a first score of the decoding result; the voice signal is decoded and recognized based on the command word decoding network to obtain a second score of the decoding result; the voice signal is decoded and recognized based on the high-frequency decoding network to obtain a third score of the decoding result; and the decoding result corresponding to the largest of the first, second and third scores is selected as the result of the unified decoding and recognition, i.e. the text word string described in step 202 above.
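A minimal sketch of this selection step (Python); the decoder outputs and scores below are invented stand-ins for the three decoding networks.

```python
def pick_best(results):
    # results: (decoder_name, decoded_text, score); keep the highest-scoring decode.
    return max(results, key=lambda item: item[2])

results = [
    ("large_scale_lm", "I want to check this morning's bill", -310.5),
    ("command_word", "check bill", -355.2),
    ("high_frequency", "check today's bill", -298.7),
]
best = pick_best(results)
print(best[0], "->", best[1])  # high_frequency -> check today's bill
```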
In the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, a preset weight gain may further be applied to this decoding path in order to raise the score of the effective decoding path and ensure that it wins the decoding competition, and the score after the gain is used as the score of this decoding path. Specifically, the gain method is as follows (see the sketch after this list):
(1) determining the preset semantic keyword or expansion word of the current path, and obtaining its weighted score p(x) in the language model;
(2) applying the preset weight gain to the weighted score p(x), for example p'(x) = p(x) * a, where a > 1 is the preset weight gain;
(3) taking the score after the gain as the score of the decoding path.
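One possible reading of steps (1) to (3) is sketched below (Python); the keyword set, the per-word scores and the gain factor a are all invented for the example.

```python
PRESET_KEYWORDS = {"bill", "package", "activate"}  # assumed semantic keywords / expansion words
GAIN_A = 1.2                                       # assumed preset weight gain, a > 1

def path_score_with_gain(path_words, lm_scores):
    # Boost the language-model weight p(x) of a preset keyword on the path:
    # p'(x) = p(x) * a; other words keep their original weight.
    return sum(lm_scores[w] * GAIN_A if w in PRESET_KEYWORDS else lm_scores[w]
               for w in path_words)

print(path_score_with_gain(["check", "bill"], {"check": 0.04, "bill": 0.02}))  # ≈ 0.064
```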
In the embodiment of the present invention, after the voice signal is decoded and recognized in a unified manner and the text word string is obtained, the operation corresponding to the text word string needs to be determined before the corresponding operation can be performed.
How to determine the operation corresponding to the text word string is briefly introduced below:
First, it is judged whether the text word string is a decoding result of the command word decoding network. If it is, the operation corresponding to the text word string is determined according to the semantics corresponding to the decoding result; otherwise, keyword matching is performed between the decoding result and a keyword list to obtain a matching result, and the operation corresponding to the text word string is determined according to the semantics corresponding to the matching result.
When the text word string is a decoding result of the command word decoding network, it means that the user input the voice signal according to the menu prompt; after the voice signal is decoded and recognized, a decoding result corresponding to a menu option can be obtained, and the operation corresponding to the text word string can then be determined according to the semantics corresponding to the decoding result.
When the text word string is not a decoding result of the command word decoding network, i.e. the text word string is a decoding result of the large-scale language model decoding network or of the high-frequency decoding network, it means that the user did not input the voice signal according to the menu prompt; after the voice signal is decoded and recognized, no decoding result corresponding to a menu option can be obtained, so keyword matching needs to be performed on the decoding result using the keyword list.
In the embodiment of the present invention, the keyword list may be built in advance, and the corresponding keyword list is obtained according to the specific dialogue scenario. Determining the menu option corresponding to the text word string generally requires determining its operation instruction and/or parameters. Since the decoding result corresponds to the text word string, keyword matching between the text word string and the keyword list yields a matching result; this matching result generally contains an operation instruction and/or parameters, and the operation corresponding to the text word string can be determined according to the semantics corresponding to the matching result.
In practical applications, the keywords corresponding to different service functions may be placed in one keyword list, which is invoked directly for keyword matching. Since service applications usually have a hierarchical relationship, the service functions may be organized into a multi-level menu structure in order to further improve matching efficiency, with a local keyword list established for each menu level. Correspondingly, when keyword matching is performed, the menu level corresponding to the current service is first determined, the keyword lists of this menu level and the levels below it are obtained, and keyword matching is then performed level by level between the text word string and the keyword lists to obtain the matching result. The matching result may contain an operation instruction and/or parameters, from which the operation corresponding to the text word string is obtained.
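A minimal sketch of the level-by-level matching described above (Python); the menu tree, the keyword lists and the helper name are all hypothetical.

```python
MENU_KEYWORDS = {  # hypothetical two-level menu with a local keyword list per level
    "mobile_internet": ["activate", "cancel", "effective time"],
    "package_type": ["five yuan", "ten yuan", "twenty yuan"],
}
MENU_CHILDREN = {"mobile_internet": ["package_type"], "package_type": []}

def match_keywords_by_level(text, current_menu):
    # Match the current menu level first, then its lower levels, collecting hits in order.
    hits, level = [], [current_menu]
    while level:
        next_level = []
        for menu in level:
            hits += [kw for kw in MENU_KEYWORDS[menu] if kw in text]
            next_level += MENU_CHILDREN[menu]
        level = next_level
    return hits

print(match_keywords_by_level("I want to cancel the ten yuan package", "mobile_internet"))
# -> ['cancel', 'ten yuan']
```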
The keyword matching technique in the embodiment of the present invention is described in detail below, taking the "mobile Internet access" service as an example:
For the "mobile Internet access" service, the corresponding operations are "activate", "cancel", "query effective time" and so on, and the corresponding parameters are "ten-yuan package", "twenty-yuan package" and so on.
The keyword matching process is as follows:
(1) Obtaining the corresponding keyword list according to the dialogue scenario.
For example, under the "mobile Internet access" service scenario, the prompt is "Would you like to activate or cancel a package?", and its keyword list is "activate", "cancel", "effective time" and so on; under the "package type" service scenario, the prompt is "OK, activating an Internet package. We have five-yuan, ten-yuan, twenty-yuan and other packages; which one would you like?", and its keyword list is "five yuan", "ten yuan", "twenty yuan" and so on.
(2) Matching the text word string against the keyword list to obtain the matching result.
For example, if the text word string is "I want to activate the five-yuan Internet package" and its keyword list is "activate", "cancel", "effective time" and so on, then "activate" is matched first, i.e. the corresponding dialogue scenario is "activate an Internet package".
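A toy first-match scan over this example (Python); the strings are English renderings of the Chinese example above and are used purely for illustration.

```python
utterance = "I want to activate the five-yuan Internet package"
scenario_keywords = ["activate", "cancel", "effective time"]  # keyword list of the current scenario
first_hit = next((kw for kw in scenario_keywords if kw in utterance), None)
print(first_hit)  # -> "activate", so the dialogue moves on to "activate an Internet package"
```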
The voice navigation method provided by the embodiment of the present invention performs unified decoding and recognition of the voice signal based on decoding networks of multiple different types, which better solves the problem of recognizing users' personalized voice responses. If the user responds according to the prompt voice, the command word decoding network can recognize the response quickly; if the user does not respond according to the prompt voice, the large-scale language model decoding network and/or the high-frequency decoding network, together with the keyword matching technique, can recognize the voice signal correctly. With this voice navigation method, recognition performance for users' personalized voice responses can be ensured.
Correspondingly, an embodiment of the present invention also provides a voice navigation system. Fig. 6 is a schematic structural diagram of the voice navigation system provided by an embodiment of the present invention.
In the embodiment of the present invention, the voice navigation system includes:
a receiving module 601, configured to receive a voice signal input by a user;
a decoding module 602, configured to perform unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string;
a determining module 603, configured to determine the operation corresponding to the text word string;
an executing module 604, configured to perform the operation.
The decoding networks of multiple different types may include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network.
Correspondingly, the voice navigation system of the embodiment of the present invention may also include modules for building the above decoding networks, namely a first building module, a second building module and a third building module, wherein:
the first building module is configured to build the large-scale language model decoding network and may include: a first language model unit, configured to build a navigation domain language model using a corpus; a decoding unit, configured to collect dialogue speech under a specific navigation scenario and decode the dialogue speech using the navigation domain language model to obtain decoded word strings; a second language model unit, configured to train a specific navigation scenario language model using the decoded word strings; and an interpolation unit, configured to interpolate the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network;
the second building module is configured to build the command word decoding network and may include: a menu option unit, configured to collect menu options under a specific navigation scenario, the menu options including menu names and their aliases; a first paralleling unit, configured to connect the menu options in parallel to form the command word decoding network; and a first weighting unit, configured to set the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model;
the third building module is configured to build the high-frequency decoding network and may include: a high-frequency corpus unit, configured to collect high-frequency corpus under a specific navigation scenario; a second paralleling unit, configured to connect the high-frequency corpus in parallel to form the high-frequency decoding network; and a second weighting unit, configured to set the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
In practical applications, the decoding module 602 may include:
a first decoding unit, configured to decode and recognize the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
a second decoding unit, configured to decode and recognize the voice signal based on the command word decoding network to obtain a second score of the decoding result;
a third decoding unit, configured to decode and recognize the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
a selecting unit, configured to select, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
In the embodiment of the present invention, the first decoding unit is further configured to, in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, apply a preset weight gain to the decoding path and take the score after the gain as the score of the decoding path.
It should be noted that in practical applications, depending on the needs of the application, the decoding module 602 may contain only any two of the first decoding unit, the second decoding unit and the third decoding unit described above.
In the embodiment of the present invention, the determining module 603 may include:
a judging unit, configured to judge whether the text word string is a decoding result of the command word decoding network;
a first determining unit, configured to, after the judging unit judges that the text word string is a decoding result of the command word decoding network, determine the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
a keyword matching unit, configured to, after the judging unit judges that the text word string is not a decoding result of the command word decoding network, perform keyword matching between the decoding result and a keyword list to obtain a matching result;
a second determining unit, configured to determine the operation corresponding to the text word string according to the semantics corresponding to the matching result.
It should be noted that in practical applications, the keywords corresponding to different service functions may be placed in one keyword list, which the keyword matching unit invokes directly for keyword matching. In addition, to further improve matching efficiency, multi-level keyword lists corresponding to the menu levels may also be established; in that case the keyword matching unit performs the keyword matching level by level to obtain the matching result.
For example, in another embodiment of the system of the present invention, the system may further include:
a keyword list building module, configured to organize service functions into a multi-level menu structure and establish a keyword list for each menu level.
Correspondingly, in this embodiment, the keyword matching unit may include:
a menu level determining unit, configured to determine the menu level corresponding to the current service;
a keyword list obtaining unit, configured to obtain the keyword lists of the menu level and the levels below it;
a matching unit, configured to perform keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
Providing the keyword list building module described above allows system developers to configure functions flexibly according to users' different application requirements, improving development efficiency.
The voice navigation method and system provided by the above embodiments belong to the same inventive concept; for the specific implementation process, reference may be made to the method embodiment, which is not repeated here.
The voice navigation system provided by the embodiment of the present invention performs unified decoding and recognition of the voice signal based on decoding networks of multiple different types, which better solves the problem of recognizing users' personalized voice responses. If the user responds according to the menu option prompt voice, the command word decoding network can recognize the response quickly; if the user does not respond according to the menu option prompt voice, the large-scale language model decoding network and the high-frequency decoding network, together with the keyword matching technique, can recognize the voice signal correctly. With this voice navigation system, recognition performance for users' personalized voice responses can be ensured.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments may refer to each other, and each embodiment focuses on what differs from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively simple, and for relevant parts reference may be made to the description of the method embodiment. The system embodiment described above is only schematic: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (15)

1. A voice navigation method, characterized by comprising:
receiving a voice signal input by a user;
performing unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, wherein the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;
determining the operation corresponding to the text word string;
performing the operation;
wherein the determining the operation corresponding to the text word string comprises:
if the text word string is a decoding result of the command word decoding network, determining the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
otherwise, performing keyword matching between the decoding result and a keyword list to obtain a matching result;
determining the operation corresponding to the text word string according to the semantics corresponding to the matching result.
2. The method according to claim 1, characterized in that the method further comprises: building the large-scale language model decoding network, the building process comprising:
building a navigation domain language model using a corpus;
collecting dialogue speech under a specific navigation scenario, and decoding the dialogue speech using the navigation domain language model to obtain decoded word strings;
training a specific navigation scenario language model using the decoded word strings;
interpolating the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
3. The method according to claim 1, characterized in that the method further comprises: building the command word decoding network, the building process comprising:
collecting menu options under a specific navigation scenario, the menu options including menu names and their aliases;
connecting the menu options in parallel to form the command word decoding network;
setting the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
4. The method according to claim 1, characterized in that the method further comprises: building the high-frequency decoding network, the building process comprising:
collecting high-frequency corpus under a specific navigation scenario;
connecting the high-frequency corpus in parallel to form the high-frequency decoding network;
setting the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
5. The method according to any one of claims 1 to 4, characterized in that the performing unified decoding and recognition of the voice signal based on the decoding networks of multiple different types to obtain a text word string comprises:
decoding and recognizing the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
decoding and recognizing the voice signal based on the command word decoding network to obtain a second score of the decoding result;
decoding and recognizing the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
selecting, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
6. The method according to claim 5, characterized in that the method further comprises:
in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, applying a preset weight gain to that decoding path;
taking the score after the gain as the score of the decoding path.
7. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
organizing service functions into a multi-level menu structure, and establishing a keyword list for each menu level;
wherein the performing keyword matching between the decoding result and the keyword list to obtain a matching result comprises:
determining the menu level corresponding to the current service;
obtaining the keyword lists of the menu level and the levels below it;
performing keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
8. A voice navigation system, characterized by comprising:
a receiving module, configured to receive a voice signal input by a user;
a decoding module, configured to perform unified decoding and recognition of the voice signal based on decoding networks of multiple different types to obtain a text word string, wherein the decoding networks of multiple different types include any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;
a determining module, configured to determine the operation corresponding to the text word string;
an executing module, configured to perform the operation;
wherein the determining module comprises:
a judging unit, configured to judge whether the text word string is a decoding result of the command word decoding network;
a first determining unit, configured to, after the judging unit judges that the text word string is a decoding result of the command word decoding network, determine the operation corresponding to the text word string according to the semantics corresponding to the decoding result;
a keyword matching unit, configured to, after the judging unit judges that the text word string is not a decoding result of the command word decoding network, perform keyword matching between the decoding result and a keyword list to obtain a matching result;
a second determining unit, configured to determine the operation corresponding to the text word string according to the semantics corresponding to the matching result.
9. The system according to claim 8, characterized in that the system further comprises any two or all three of the following modules:
a first building module, configured to build the large-scale language model decoding network;
a second building module, configured to build the command word decoding network;
a third building module, configured to build the high-frequency decoding network.
10. The system according to claim 9, characterized in that the first building module comprises:
a first language model unit, configured to build a navigation domain language model using a corpus;
a decoding unit, configured to collect dialogue speech under a specific navigation scenario and decode the dialogue speech using the navigation domain language model to obtain decoded word strings;
a second language model unit, configured to train a specific navigation scenario language model using the decoded word strings;
an interpolation unit, configured to interpolate the navigation domain language model and the specific navigation scenario language model to obtain the large-scale language model decoding network.
11. The system according to claim 9, characterized in that the second building module comprises:
a menu option unit, configured to collect menu options under a specific navigation scenario, the menu options including menu names and their aliases;
a first paralleling unit, configured to connect the menu options in parallel to form the command word decoding network;
a first weighting unit, configured to set the weight of each word in the command word decoding network using the average unigram language model probability in the large-scale language model.
12. The system according to claim 9, characterized in that the third building module comprises:
a high-frequency corpus unit, configured to collect high-frequency corpus under a specific navigation scenario;
a second paralleling unit, configured to connect the high-frequency corpus in parallel to form the high-frequency decoding network;
a second weighting unit, configured to set the weight of each word in the high-frequency decoding network using the average unigram language model probability in the large-scale language model.
13. The system according to any one of claims 8 to 12, characterized in that the decoding module comprises:
a first decoding unit, configured to decode and recognize the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;
a second decoding unit, configured to decode and recognize the voice signal based on the command word decoding network to obtain a second score of the decoding result;
a third decoding unit, configured to decode and recognize the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;
a selecting unit, configured to select, as the text word string, the decoding result corresponding to the largest of the first score, the second score and the third score.
14. The system according to claim 13, characterized in that
the first decoding unit is further configured to, in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, if a preset semantic keyword or expansion word appears on a path, apply a preset weight gain to the decoding path and take the score after the gain as the score of the decoding path.
15. The system according to any one of claims 8 to 12, characterized in that the system further comprises:
a keyword list building module, configured to organize service functions into a multi-level menu structure and establish a keyword list for each menu level;
wherein the keyword matching unit comprises:
a menu level determining unit, configured to determine the menu level corresponding to the current service;
a keyword list obtaining unit, configured to obtain the keyword lists of the menu level and the levels below it;
a matching unit, configured to perform keyword matching level by level between the decoding result and the obtained keyword lists to obtain the matching result.
CN201310611734.9A 2013-11-26 2013-11-26 Phonetic navigation method and system Active CN103700369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310611734.9A CN103700369B (en) 2013-11-26 2013-11-26 Phonetic navigation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310611734.9A CN103700369B (en) 2013-11-26 2013-11-26 Phonetic navigation method and system

Publications (2)

Publication Number Publication Date
CN103700369A CN103700369A (en) 2014-04-02
CN103700369B true CN103700369B (en) 2016-08-31

Family

ID=50361875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310611734.9A Active CN103700369B (en) 2013-11-26 2013-11-26 Phonetic navigation method and system

Country Status (1)

Country Link
CN (1) CN103700369B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104064184B (en) * 2014-06-24 2017-03-08 科大讯飞股份有限公司 The construction method of isomery decoding network and system, audio recognition method and system
CN105321518B (en) * 2014-08-05 2018-12-04 中国科学院声学研究所 A kind of rejection method for identifying of low-resource Embedded Speech Recognition System
CN104240700B (en) * 2014-08-26 2018-09-07 智歌科技(北京)有限公司 A kind of global voice interactive method and system towards vehicle-mounted terminal equipment
CN104901807B (en) * 2015-04-07 2019-03-26 河南城建学院 A kind of vocal print cryptographic methods can be used for low side chip
CN106981287A (en) * 2016-01-14 2017-07-25 芋头科技(杭州)有限公司 A kind of method and system for improving Application on Voiceprint Recognition speed
CN106971712A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive rapid voiceprint recognition methods and system
CN106971735B (en) * 2016-01-14 2019-12-03 芋头科技(杭州)有限公司 A kind of method and system regularly updating the Application on Voiceprint Recognition of training sentence in caching
CN106971726A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive method for recognizing sound-groove and system based on code book
CN106205611B (en) * 2016-06-29 2020-03-27 北京儒博科技有限公司 Man-machine interaction method and system based on multi-mode historical response result
CN106710591A (en) * 2016-12-13 2017-05-24 云南电网有限责任公司电力科学研究院 Voice customer service system for power terminal
CN107680588B (en) * 2017-05-10 2020-10-20 平安科技(深圳)有限公司 Intelligent voice navigation method, device and storage medium
CN107316635B (en) * 2017-05-19 2020-09-11 科大讯飞股份有限公司 Voice recognition method and device, storage medium and electronic equipment
CN107174418A (en) * 2017-06-28 2017-09-19 歌尔股份有限公司 A kind of intelligent wheel chair and its control method
CN110689881B (en) * 2018-06-20 2022-07-12 深圳市北科瑞声科技股份有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN108510990A (en) * 2018-07-04 2018-09-07 百度在线网络技术(北京)有限公司 Audio recognition method, device, user equipment and storage medium
CN109325097B (en) * 2018-07-13 2022-05-27 海信集团有限公司 Voice guide method and device, electronic equipment and storage medium
CN109410927B (en) * 2018-11-29 2020-04-03 北京蓦然认知科技有限公司 Voice recognition method, device and system combining offline command word and cloud analysis
CN111326147B (en) * 2018-12-12 2023-11-17 北京嘀嘀无限科技发展有限公司 Speech recognition method, device, electronic equipment and storage medium
CN110110294B (en) * 2019-03-26 2021-02-02 北京捷通华声科技股份有限公司 Dynamic reverse decoding method, device and readable storage medium
CN110610700B (en) * 2019-10-16 2022-01-14 科大讯飞股份有限公司 Decoding network construction method, voice recognition method, device, equipment and storage medium
CN113450781B (en) * 2020-03-25 2022-08-09 阿里巴巴集团控股有限公司 Speech processing method, speech encoder, speech decoder and speech recognition system
CN112331207A (en) * 2020-09-30 2021-02-05 音数汇元(上海)智能科技有限公司 Service content monitoring method and device, electronic equipment and storage medium
CN112100339A (en) * 2020-11-04 2020-12-18 北京淇瑀信息科技有限公司 User intention recognition method and device for intelligent voice robot and electronic equipment
CN113327597B (en) * 2021-06-23 2023-08-22 网易(杭州)网络有限公司 Speech recognition method, medium, device and computing equipment
CN114938337A (en) * 2022-04-12 2022-08-23 华为技术有限公司 Model training method and device and electronic equipment
KR102620070B1 (en) * 2022-10-13 2024-01-02 주식회사 타이렐 Autonomous articulation system based on situational awareness
KR102626954B1 (en) * 2023-04-20 2024-01-18 주식회사 덴컴 Speech recognition apparatus for dentist and method using the same
KR102632872B1 (en) * 2023-05-22 2024-02-05 주식회사 포지큐브 Method for correcting error of speech recognition and system thereof
KR102648689B1 (en) * 2023-05-26 2024-03-18 주식회사 액션파워 Method for text error detection

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945673A (en) * 2012-11-24 2013-02-27 安徽科大讯飞信息科技股份有限公司 Continuous speech recognition method with speech command range changed dynamically
CN103177721A (en) * 2011-12-26 2013-06-26 中国电信股份有限公司 Voice recognition method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE60127398T2 (en) * 2000-05-23 2007-12-13 Thomson Licensing SYNTAX AND SEMANTIC ANALYSIS OF LANGUAGE INSTRUCTIONS
US9224394B2 (en) * 2009-03-24 2015-12-29 Sirius Xm Connected Vehicle Services Inc Service oriented speech recognition for in-vehicle automated interaction and in-vehicle user interfaces requiring minimal cognitive driver processing for same

Also Published As

Publication number Publication date
CN103700369A (en) 2014-04-02

Similar Documents

Publication Publication Date Title
CN103700369B (en) Phonetic navigation method and system
US10540965B2 (en) Semantic re-ranking of NLU results in conversational dialogue applications
CN110377716A (en) Exchange method, device and the computer readable storage medium of dialogue
US9117453B2 (en) Method and system for processing parallel context dependent speech recognition results from a single utterance utilizing a context database
US11164562B2 (en) Entity-level clarification in conversation services
CN110415679A (en) Voice error correction method, device, equipment and storage medium
US20210406473A1 (en) System and method for building chatbot providing intelligent conversational service
CN110265013A (en) The recognition methods of voice and device, computer equipment, storage medium
WO2014183035A1 (en) Method and system for capturing and exploiting user intent in a conversational interaction based information retrieval system
CN104573099A (en) Topic searching method and device
US9720982B2 (en) Method and apparatus for natural language search for variables
CN110266900A (en) Recognition methods, device and the customer service system that client is intended to
KR20080024752A (en) Dialog management apparatus and method for chatting agent
CN112364622B (en) Dialogue text analysis method, device, electronic device and storage medium
CN104469029A (en) Method and device for telephone number query through voice
CN110085217A (en) Phonetic navigation method, device and terminal device
CN109670033A (en) Search method, device, equipment and the storage medium of content
CN106844499A (en) Many wheel session interaction method and devices
CN110175242B (en) Human-computer interaction association method, device and medium based on knowledge graph
CN105323392A (en) Method and apparatus for quickly entering IVR menu
US11790168B2 (en) Natural language and messaging system integrated group assistant
CN103559242A (en) Method for achieving voice input of information and terminal device
US11210462B1 (en) Voice input processing
KR102655058B1 (en) Method and apparatus for providing chatbot service using condition expression generated interface
KR20220155957A (en) Method and apparatus for providing chatbot service based on slot filling

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 666, Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant after: Iflytek Co., Ltd.

Address before: No. 666, Wangjiang Road, High-tech Development Zone, Hefei, Anhui Province, 230088

Applicant before: Anhui USTC iFLYTEK Co., Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant