CN107526826B - Voice search processing method and device and server


Info

Publication number
CN107526826B
Authority
CN
China
Prior art keywords
search
language
type
determining
statement
Prior art date
Legal status
Active
Application number
CN201710773346.9A
Other languages
Chinese (zh)
Other versions
CN107526826A (en)
Inventor
杜念冬
马赛
谢延
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd filed Critical Baidu Online Network Technology Beijing Co Ltd
Priority to CN201710773346.9A
Publication of CN107526826A
Application granted
Publication of CN107526826B
Legal status: Active

Classifications

    • G06F16/3343 Information retrieval of unstructured textual data; query processing; query execution using phonetics
    • G06F16/3329 Information retrieval of unstructured textual data; query formulation; natural language query formulation or dialogue systems
    • G06F16/951 Retrieval from the web; indexing; web crawling techniques
    • G10L15/183 Speech recognition; speech classification or search using natural language modelling with context dependencies, e.g. language models
    • G10L25/54 Speech or voice analysis techniques specially adapted for comparison or discrimination, for retrieval

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a voice search processing method, a voice search processing device, and a server, wherein the method includes: acquiring a voice search statement; recognizing the search statement with each of N language models while judging the language type to which the search statement belongs, wherein each language model corresponds to one type of language and N is a positive integer greater than 1; when the search statement is determined to belong to the ith type of language, acquiring the recognition result corresponding to the ith type of language model; and searching according to the recognition result. Recognition and search of voice search statements are thus achieved, the efficiency of voice search processing is improved, the waiting time of the user is reduced, and the user experience is improved.

Description

Voice search processing method and device and server
Technical Field
The invention relates to the technical field of computers, in particular to a voice search processing method, a voice search processing device and a server.
Background
With the development of the Internet and information technology, more and more users search for all kinds of information through the Internet.
When a user performs a search, an existing search engine with a multi-language search function generally recognizes the search statement according to a commonly used language type after the search statement is obtained, then evaluates the accuracy of the recognition result; if the accuracy is low, it switches the language type and recognizes the search statement again, repeating this until a recognition result with sufficiently high accuracy is obtained, and only then searches according to that recognition result.
In this search mode, the process of judging the language type of the search statement is complex and time-consuming, so the search processing takes a long time, the search efficiency is low, and the user experience is poor.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, the invention provides a voice search processing method that achieves recognition and search of voice search statements, improves the efficiency of voice search processing, reduces the waiting time of the user, and improves the user experience.
The invention also provides a voice search processing device.
The invention also provides a server.
The invention also provides a computer readable storage medium.
An embodiment of a first aspect of the present invention provides a voice search processing method, including: acquiring a voice search statement; recognizing the search statement with each of N language models while judging the language type to which the search statement belongs, wherein each language model corresponds to one type of language, and N is a positive integer greater than 1; when the search statement is determined to belong to the ith type of language, acquiring the recognition result corresponding to the ith type of language model; and searching according to the recognition result.
In the voice search processing method of the embodiment of the invention, a voice search statement is first acquired; the search statement is then recognized with each of the N language models while the language type to which it belongs is judged; when the search statement is determined to belong to the ith type of language, the recognition result corresponding to the ith type of language model is acquired; and finally a search is performed according to the recognition result. Recognition and search of voice search statements are thus achieved, the efficiency of voice search processing is improved, the waiting time of the user is reduced, and the user experience is improved.
An embodiment of a second aspect of the present invention provides a voice search processing apparatus, including: a first obtaining module, configured to obtain a voice search statement; a judging module, configured to recognize the search statement with each of N language models while judging the language type to which the search statement belongs, wherein each language model corresponds to one type of language, and N is a positive integer greater than 1; a second obtaining module, configured to obtain, when the search statement is determined to belong to the ith type of language, the recognition result corresponding to the ith type of language model; and a searching module, configured to search according to the recognition result.
The voice search processing apparatus of the embodiment of the invention first obtains a voice search statement, then recognizes the search statement with each of the N language models while judging the language type to which it belongs, obtains the recognition result corresponding to the ith type of language model when the search statement is determined to belong to the ith type of language, and finally searches according to the recognition result. Recognition and search of voice search statements are thus achieved, the efficiency of voice search processing is improved, the waiting time of the user is reduced, and the user experience is improved.
An embodiment of a third aspect of the present invention provides a server, including:
a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the voice search processing method according to the first aspect.
An embodiment of a fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice search processing method according to the first aspect.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a voice search processing method of one embodiment of the present invention;
FIG. 2 is a flow diagram of a voice search processing method according to another embodiment of the invention;
FIG. 3 is a schematic structural diagram of a speech search processing apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a speech search processing apparatus according to another embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
When a user performs a search, an existing search engine with a multi-language search function generally recognizes the search statement according to a commonly used language type after the search statement is obtained, then evaluates the accuracy of the recognition result; if the accuracy is low, it switches the language type and recognizes the search statement again, repeating this until a recognition result with sufficiently high accuracy is obtained, and only then searches according to that recognition result. In this search mode, the process of judging the language type of the search statement is complex and time-consuming, so the search processing takes a long time, the search efficiency is low, and the user experience is poor.
In order to solve the above problems, embodiments of the present invention provide a voice search processing method: after a voice search statement is obtained, the search statement is recognized with each of multiple language models while the language type to which it belongs is judged, wherein each language model corresponds to one type of language; once the language to which the search statement belongs is determined, the recognition result of the language model corresponding to that language type is obtained, and a search is performed according to the recognition result. Recognition and search of voice search statements are thus achieved, the efficiency of voice search processing is improved, the waiting time of the user is reduced, and the user experience is improved.
A voice search processing method, apparatus, and server according to embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a voice search processing method according to an embodiment of the present invention.
As shown in FIG. 1, the voice search processing method includes:
Step 101, obtaining a voice search statement.
The execution body of the voice search processing method provided by the embodiment of the invention is the voice search processing apparatus provided by the embodiment of the invention; the apparatus may be configured in any server with a search function to search the acquired voice search statements.
Specifically, a voice input device such as a microphone may be preset in the terminal, so that when the user needs to search for information, the terminal may obtain a voice search statement input by the user through the voice input device, and send the search statement to the voice search processing apparatus.
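For illustration only, the following minimal client-side sketch shows how a terminal might forward a captured voice search statement to the voice search processing apparatus over HTTP; the /voice_search endpoint, the audio field name, and the JSON response format are assumptions, not part of the disclosure.

```python
# Hypothetical terminal-side upload of a recorded voice search statement.
import requests

def send_voice_search(wav_path, server_url):
    """POST the recorded audio to the (assumed) voice search endpoint."""
    with open(wav_path, "rb") as f:
        resp = requests.post(
            f"{server_url}/voice_search",                      # endpoint name assumed
            files={"audio": ("query.wav", f, "audio/wav")},
        )
    resp.raise_for_status()
    return resp.json()                                         # response format assumed

# Example usage:
# results = send_voice_search("query.wav", "https://search.example.com")
```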
Step 102, recognizing the search statement with each of N language models while judging the language type to which the search statement belongs, wherein each language model corresponds to one type of language, and N is a positive integer greater than 1.
The N language models may include language models corresponding to all types of languages, or may include multiple language models determined according to needs, which is not limited herein.
It will be appreciated that the N language models need to be determined before the search statement is recognized with them. Specifically, the N language models may be determined in a variety of ways.
For example, the determination may be made according to a history search log; that is, before step 102, the method may further include: determining the N language models according to the history search log.
The history search log may be the history search records generated when the user performs searches using the terminal, or other history search records, which is not limited herein.
Specifically, the history search records may be used to determine which language types the user of the terminal searches in most often, so that the N language models to be used for recognizing the search statement are determined according to the search frequency corresponding to each language type.
In a specific implementation, the value of N may be preset; after the search frequencies corresponding to search statements of different language types are determined, the language types may be sorted from high to low by frequency, and the language models corresponding to the top N language types are determined as the N language models to be used for recognizing the search statement.
For example, assume that N is 2 and that, according to the history search records of a certain period, the user of the terminal is determined to search in Chinese and English far more often than in Korean (for instance, on the order of 200 searches versus 10). According to the search frequencies corresponding to Chinese, English, and Korean, the language models of the Chinese and English types may then be determined as the 2 language models to be used for recognizing the search statement.
Alternatively, a search frequency threshold may be preset, so that after the search frequencies corresponding to search statements of different language types are determined, the language models corresponding to the language types whose search frequency exceeds the preset threshold are determined as the language models to be used for recognizing the search statement. A minimal sketch of both selection strategies is given below.
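The following sketch illustrates the two frequency-based selection strategies described above under stated assumptions: the history search log has already been reduced to a list of per-search language tags, and the counts, threshold, and function names are illustrative only.

```python
from collections import Counter

def select_models_top_n(history_langs, n=2):
    """Pick the N language types searched most often in the history search log."""
    freq = Counter(history_langs)                     # e.g. Counter({'zh': 200, 'en': 180, 'ko': 10})
    return [lang for lang, _ in freq.most_common(n)]  # top-N by search frequency

def select_models_by_threshold(history_langs, min_count=50):
    """Pick every language type whose search frequency exceeds a preset threshold."""
    freq = Counter(history_langs)
    return [lang for lang, count in freq.items() if count >= min_count]

# Illustrative log: Chinese and English dominate, Korean is rare.
log = ["zh"] * 200 + ["en"] * 180 + ["ko"] * 10
print(select_models_top_n(log, n=2))             # ['zh', 'en']
print(select_models_by_threshold(log, 50))       # ['zh', 'en']
```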
In addition, the N language models to be used for recognizing the search statement may also be determined based on the historical usage information of the terminal. The historical usage information may include the user's usage of each application in the terminal, the locations where the terminal has frequently been during a period of time, and the like.
For example, if it is determined from the location information of the terminal that the user frequently travels between the United States and China, and the common language types corresponding to the United States and China are English and Chinese, the language models corresponding to Chinese and English may be determined as the language models to be used for recognizing the search statement.
In a specific implementation, the language type to which the search statement belongs can be determined through the following steps 102a-102b.
Step 102a, determining a feature vector of a search statement.
The feature vector is used for representing the features of the acquired voice search statement.
Specifically, after the voice search processing apparatus obtains the voice search statement, the feature vector of the statement may be determined by a variety of methods, such as Mel-frequency cepstral coefficients, linear prediction cepstral coefficients, or the Multimedia Content Description Interface (MPEG-7 audio features).
Step 102b, determining the language type to which the search statement belongs according to the degree of matching between the feature vector and each preset language type model.
Specifically, each language type model may be trained in advance on a large amount of historical corpora of the corresponding language, so that after the feature vector of the voice search statement is determined, the feature vector can be input into each language type model for scoring, and the language type corresponding to the highest-scoring model, i.e., the language type model that best matches the feature vector, is determined as the language type to which the search statement belongs.
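As an illustration of steps 102a-102b, the following sketch extracts MFCC features (one of the example methods above) with the librosa library and scores them against a set of pre-trained per-language classifiers. The existence of such pre-trained models and their score() interface are assumptions for illustration, not part of the disclosure.

```python
# Sketch of steps 102a-102b. Assumptions: librosa is available for MFCC
# extraction, and lang_models maps each language type to a pre-trained
# scorer (for example a Gaussian mixture model) exposing score(vector).
import librosa

def feature_vector(audio, sr):
    """Step 102a: represent the voice search statement as a fixed-size vector."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)  # Mel cepstrum features
    return mfcc.mean(axis=1)                                # average over time frames

def judge_language(audio, sr, lang_models):
    """Step 102b: pick the language type whose model matches the vector best."""
    vec = feature_vector(audio, sr)
    scores = {lang: model.score(vec) for lang, model in lang_models.items()}
    return max(scores, key=scores.get)                      # highest-scoring language type
```

In practice each per-language scorer could be, for example, a Gaussian mixture model trained on that language's historical corpus.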
Step 103, when the search statement is determined to belong to the ith type of language, acquiring the recognition result corresponding to the ith type of language model.
Step 104, searching according to the recognition result.
Specifically, different language types may be preset to correspond to different resource pools, so that after the recognition result of the language model corresponding to the language type to which the search statement belongs is obtained, the search can be performed in the resource pool corresponding to that language type.
It should be noted that the search in the resource pool corresponding to a language type may be a general search, i.e., a search in the unstructured resource pool of that language type, or a vertical search, i.e., a search in the structured resource pool of that language type, as sketched below.
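A minimal sketch of the per-language resource pool routing just described; the pool names, the RESOURCE_POOLS mapping, and the general/vertical split are illustrative assumptions rather than the patented implementation.

```python
# Illustrative mapping from language type to its resource pools (names assumed).
RESOURCE_POOLS = {
    "zh": {"general": "zh_web_index", "vertical": "zh_structured_db"},
    "en": {"general": "en_web_index", "vertical": "en_structured_db"},
}

def search_in_pool(recognized_text, language, vertical=False):
    """Route the recognition result to the resource pool of its language type."""
    pools = RESOURCE_POOLS[language]
    pool = pools["vertical"] if vertical else pools["general"]
    # A real system would query the chosen index here; we only report the routing.
    return f"searching {recognized_text!r} in {pool}"

# Example usage:
# search_in_pool("weather today", "en")  -> general (unstructured) pool of the English type
```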
It can be understood that, in the voice search processing method provided by the embodiment of the present invention, the process of judging the language type to which the search statement belongs is performed simultaneously with the process of recognizing the search statement, which improves the efficiency of voice search processing and reduces the waiting time of the user. Moreover, since the search statement is recognized with all N language models simultaneously and, after the language type is determined, the recognition result of the corresponding language model is selected from the multiple recognition results, the accuracy and reliability of the recognition result of the voice search statement can be ensured.
In addition, in an embodiment of the present invention, the language type to which the search statement belongs may instead be determined first, and the search statement may then be recognized with the language model corresponding to the determined language type to obtain the recognition result, so that the search is performed according to that recognition result.
In the voice search processing method of the embodiment of the invention, a voice search statement is first acquired; the search statement is then recognized with each of the N language models while the language type to which it belongs is judged; when the search statement is determined to belong to the ith type of language, the recognition result corresponding to the ith type of language model is acquired; and finally a search is performed according to the recognition result. Recognition and search of voice search statements are thus achieved, the efficiency of voice search processing is improved, the waiting time of the user is reduced, and the user experience is improved.
As analyzed above, after the voice search statement is obtained, the search statement is recognized with the multiple language models while the language type to which it belongs is judged; once the language of the search statement is determined, the recognition result of the language model corresponding to that language type is obtained, and the search is performed according to it. In actual use, the language type to which the search statement belongs may be judged from only a partial segment of the search statement; this case is described in detail below with reference to FIG. 2.
Fig. 2 is a flowchart of a voice search processing method according to another embodiment of the present invention.
As shown in FIG. 2, the method includes:
Step 201, obtaining a voice search statement.
The detailed implementation process and principle of step 201 may refer to the detailed description of the above embodiments, which is not repeated herein.
Step 202, according to a preset rule, a segment with a preset length is intercepted from the search statement.
The preset rule specifies how a segment of the preset length is intercepted from the search statement.
Step 203, recognizing the search statement with each of the N language models, and judging the language type to which the search statement belongs according to the segment of preset length.
Each language model corresponds to one type of language, and N is a positive integer greater than 1.
The preset length can be set as required, as long as the language type to which the search statement belongs can be judged from a segment of that length. Specifically, the preset length may be a fixed length, such as 3 seconds (s) or 4 s, or it may be set according to the length of the search statement, for example 1/3 of the length of the search statement, which is not limited herein. A sketch of both rules follows.
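The sketch below illustrates step 202 under assumptions: the search statement is held as an array of audio samples, and the fixed-length and proportional rules above are selected by an illustrative rule parameter.

```python
# Sketch of step 202: cut a preset-length segment from the voice search statement.
def intercept_segment(samples, sample_rate, rule="fixed", seconds=3.0, fraction=1/3):
    """Return the leading segment used for judging the language type."""
    if rule == "fixed":
        n = int(seconds * sample_rate)        # e.g. the first 3 s of audio
    else:
        n = int(len(samples) * fraction)      # e.g. the first 1/3 of the statement
    return samples[:n]

# Example usage (16 kHz audio): the first 3 seconds -> 48000 samples.
# segment = intercept_segment(samples, 16000, rule="fixed", seconds=3.0)
```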
Specifically, the language type to which the search statement belongs may be determined from the segment of preset length through the following steps 203a-203b.
Step 203a, determining a feature vector of the search statement according to the segment with the preset length.
Specifically, after the voice search processing apparatus obtains the segment of preset length, the feature vector of the voice search statement may be determined from it by a variety of methods, such as Mel-frequency cepstral coefficients, linear prediction cepstral coefficients, or the Multimedia Content Description Interface (MPEG-7 audio features).
Step 203b, determining the language type to which the search statement belongs according to the degree of matching between the feature vector and each preset language type model.
Specifically, each language type model may be trained in advance on a large amount of historical corpora of the corresponding language, so that after the feature vector of the voice search statement is determined, the feature vector can be input into each language type model for scoring, and the language type corresponding to the highest-scoring model, i.e., the language type model that best matches the feature vector, is determined as the language type to which the search statement belongs.
It should be noted that the process of recognizing the search statement with the N language models may refer to the detailed description of the foregoing embodiment and is not repeated here. In addition, step 202 and step 203 may be performed simultaneously.
Step 204, when the search statement is determined to belong to the ith type of language, acquiring the recognition result corresponding to the ith type of language model.
Step 205, searching according to the recognition result.
The detailed implementation process and principle of steps 204-205 may refer to the detailed description of the above embodiments and are not repeated here.
It can be understood that recognizing the search statement with a language model requires the complete statement, whereas judging the language type uses only the segment of preset length intercepted from the search statement; the judgment process therefore takes little time, and the language type of the search statement can be judged before the language models finish recognizing it. Consequently, in the embodiment of the present invention, after the language type to which the search statement belongs is determined, the recognition processes of the language models corresponding to the other language types may be stopped. That is, after it is determined in step 204 that the search statement belongs to the ith type of language, the method may further include:
ending the process of recognizing the search statement with the other N-1 language models.
Specifically, after the language type to which the search statement belongs is determined, stopping the recognition processes of the language models corresponding to the other language types reduces the waste of computing resources.
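The following sketch ties the embodiment together under stated assumptions: the N recognizers and the language-type judgment run concurrently, and once the language type is known, only the matching recognizer's result is kept while the remaining jobs are abandoned. The recognize and classify callables, and the search_in_pool helper from the earlier sketch, are placeholders for real engines, not the patented implementation.

```python
# Sketch of the FIG. 2 flow. Assumptions: recognizers maps each language type
# to a recognize(audio) callable, classify(segment) returns the language type,
# and search_in_pool(text, lang) performs the final search; all are placeholders.
from concurrent.futures import ThreadPoolExecutor

def voice_search(audio, segment, recognizers, classify, search_in_pool):
    with ThreadPoolExecutor(max_workers=len(recognizers) + 1) as pool:
        # Start recognition with all N language models in parallel (step 203).
        rec_futures = {lang: pool.submit(rec, audio) for lang, rec in recognizers.items()}
        # Judge the language type from the preset-length segment at the same time.
        lang = pool.submit(classify, segment).result()
        # Keep the matching model's result; abandon the other N-1 recognition jobs.
        for other, fut in rec_futures.items():
            if other != lang:
                fut.cancel()                  # best effort: only unstarted jobs can be cancelled
        text = rec_futures[lang].result()     # step 204: recognition result of the ith model
    return search_in_pool(text, lang)         # step 205: search in the matching resource pool
```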
According to the voice search processing method of the embodiment of the invention, after the voice search statement is obtained, a segment of preset length can be intercepted from the search statement according to a preset rule; the search statement is then recognized with each of the N language models while the language type to which it belongs is judged from the segment of preset length, so that when the search statement is determined to belong to the ith type of language, the recognition result corresponding to the ith type of language model can be obtained and the search can be performed according to it. Recognition and search of voice search statements are thus achieved, the efficiency of voice search processing is improved, the waiting time of the user is reduced, and the user experience is improved.
Fig. 3 is a schematic structural diagram of a speech search processing apparatus according to an embodiment of the present invention.
As shown in FIG. 3, the voice search processing apparatus includes:
a first obtaining module 31, configured to obtain a voice search statement;
the judging module 32 is configured to identify the search statement and judge the language type to which the search statement belongs according to N language models, where each language model corresponds to one type of language, and N is a positive integer greater than 1;
the second obtaining module 33 is configured to obtain, when it is determined that the search statement belongs to the ith type of language, a recognition result corresponding to the ith type of language model;
and the searching module 34 is used for searching according to the identification result.
Specifically, the voice search processing apparatus provided in this embodiment may be configured in any server with a search function, and is configured to execute the voice search processing method described in the above embodiments to search the acquired voice search statement.
In a possible implementation form of the embodiment of the present application, the determining module 32 includes:
a first determination unit, configured to determine a feature vector of the search statement;
and the second determining unit is used for determining the language type of the search statement according to the matching degree of the feature vector and each preset language type model.
In another possible implementation form of the embodiment of the application, the first determining unit is specifically configured to:
intercepting a segment with a preset length from a search statement according to a preset rule;
and determining the characteristic vector of the search statement according to the segment with the preset length.
It should be noted that the foregoing explanation of the embodiment of the speech search processing method is also applicable to the speech search processing apparatus of this embodiment, and is not repeated here.
The voice search processing apparatus of the embodiment of the invention first obtains a voice search statement, then recognizes the search statement with each of the N language models while judging the language type to which it belongs, obtains the recognition result corresponding to the ith type of language model when the search statement is determined to belong to the ith type of language, and finally searches according to the recognition result. Recognition and search of voice search statements are thus achieved, the efficiency of voice search processing is improved, the waiting time of the user is reduced, and the user experience is improved.
Fig. 4 is a schematic structural diagram of a speech search processing apparatus according to another embodiment of the present invention.
As shown in FIG. 4, on the basis of the structure shown in FIG. 3, the speech search processing apparatus further includes:
and a determining module 41, configured to determine the N language models according to the history search log.
And an ending module 42, configured to end the process of recognizing the search statement according to the other N-1 language models.
It should be noted that the foregoing explanation of the embodiment of the speech search processing method is also applicable to the speech search processing apparatus of this embodiment, and is not repeated here.
The voice search processing apparatus of the embodiment of the invention first obtains a voice search statement, then recognizes the search statement with each of the N language models while judging the language type to which it belongs, obtains the recognition result corresponding to the ith type of language model when the search statement is determined to belong to the ith type of language, and finally searches according to the recognition result. Recognition and search of voice search statements are thus achieved, the efficiency of voice search processing is improved, the waiting time of the user is reduced, and the user experience is improved.
An embodiment of a third aspect of the present invention provides a server, including:
a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the voice search processing method as in the previous embodiments.
An embodiment of a fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice search processing method as in the previous embodiments.
An embodiment of a fifth aspect of the present invention provides a computer program product, wherein when the instructions in the computer program product are executed by a processor, the method for processing a voice search in the foregoing embodiments is performed.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method of processing a voice search, comprising:
acquiring a voice search statement;
recognizing the search statement with each of N language models, and judging the language type to which the search statement belongs according to preset language type models and a segment of preset length intercepted from the search statement, wherein each language model corresponds to one type of language, and N is a positive integer greater than 1;
when the search statement is determined to belong to the ith type of language, acquiring a recognition result corresponding to the ith type of language model, wherein the ith type of language is one of N types of languages corresponding to the N types of language models;
ending the process of recognizing the search statement with the other N-1 language models;
and searching according to the recognition result.
2. The method of claim 1, wherein, prior to recognizing the search statement with the N language models, the method further comprises:
determining the N language models according to a history search log.
3. The method of claim 1, wherein said determining the language type to which the search statement belongs comprises:
determining a feature vector of the search statement;
and determining the language type to which the search statement belongs according to the matching degree of the feature vector and each preset language type model.
4. The method of claim 3, wherein the determining the feature vector for the search statement comprises:
intercepting a segment with a preset length from the search statement according to a preset rule;
and determining the feature vector of the search statement according to the segment with the preset length.
5. A speech search processing apparatus, comprising:
the first acquisition module is used for acquiring a voice search statement;
the judging module is used for recognizing the search statement with each of N language models, and judging the language type to which the search statement belongs according to preset language type models and a segment of preset length intercepted from the search statement, wherein each language model corresponds to one type of language, and N is a positive integer greater than 1;
a second obtaining module, configured to obtain, when it is determined that the search statement belongs to an ith type of language, a recognition result corresponding to the ith type of language model, where the ith type of language is one of N types of languages corresponding to the N types of language models;
the ending module is used for ending the process of recognizing the search statement with the other N-1 language models;
and the searching module is used for searching according to the recognition result.
6. The apparatus of claim 5, further comprising:
and the determining module is used for determining the N language models according to the historical search logs.
7. The apparatus of claim 5, wherein the determining module comprises:
a first determination unit configured to determine a feature vector of the search statement;
and the second determining unit is used for determining the language type to which the search statement belongs according to the matching degree of the feature vector and each preset language type model.
8. The apparatus of claim 7, wherein the first determining unit is specifically configured to:
intercepting a segment with a preset length from the search statement according to a preset rule;
and determining the feature vector of the search statement according to the segment with the preset length.
9. A server, comprising:
memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the speech search processing method according to any of claims 1-4 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the voice search processing method according to any one of claims 1-4.
CN201710773346.9A 2017-08-31 2017-08-31 Voice search processing method and device and server Active CN107526826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710773346.9A CN107526826B (en) 2017-08-31 2017-08-31 Voice search processing method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710773346.9A CN107526826B (en) 2017-08-31 2017-08-31 Voice search processing method and device and server

Publications (2)

Publication Number Publication Date
CN107526826A CN107526826A (en) 2017-12-29
CN107526826B (en) 2021-09-17

Family

ID=60683057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710773346.9A Active CN107526826B (en) 2017-08-31 2017-08-31 Voice search processing method and device and server

Country Status (1)

Country Link
CN (1) CN107526826B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108428446B (en) 2018-03-06 2020-12-25 北京百度网讯科技有限公司 Speech recognition method and device
CN110853647A (en) * 2018-07-27 2020-02-28 Tcl集团股份有限公司 Video searching method, video playing terminal and storage medium
CN108899035B (en) * 2018-08-02 2021-08-17 科大讯飞股份有限公司 Message processing method and device
CN111161706A (en) * 2018-10-22 2020-05-15 阿里巴巴集团控股有限公司 Interaction method, device, equipment and system
CN111198936B (en) * 2018-11-20 2023-09-15 北京嘀嘀无限科技发展有限公司 Voice search method and device, electronic equipment and storage medium
CN111259170A (en) * 2018-11-30 2020-06-09 北京嘀嘀无限科技发展有限公司 Voice search method and device, electronic equipment and storage medium
CN111369978B (en) * 2018-12-26 2024-05-17 北京搜狗科技发展有限公司 Data processing method and device for data processing
CN111914070A (en) * 2019-05-09 2020-11-10 上海触乐信息科技有限公司 Intelligent information prompting assistant system, information prompting method and terminal equipment
CN112133283A (en) * 2019-06-24 2020-12-25 武汉慧人信息科技有限公司 Voice response system design in multi-language environment
CN112925889B (en) * 2021-02-26 2024-04-30 北京声智科技有限公司 Natural language processing method, device, electronic equipment and storage medium
CN113380224A (en) * 2021-06-04 2021-09-10 北京字跳网络技术有限公司 Language determination method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105981099A (en) * 2014-02-06 2016-09-28 三菱电机株式会社 Speech search device and speech search method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043833B (en) * 2010-11-25 2013-12-25 北京搜狗科技发展有限公司 Search method and device based on query word
CN104239459B (en) * 2014-09-02 2018-03-09 百度在线网络技术(北京)有限公司 voice search method, device and system
CN106407332B (en) * 2016-09-05 2020-01-07 北京百度网讯科技有限公司 Search method and device based on artificial intelligence


Also Published As

Publication number Publication date
CN107526826A (en) 2017-12-29

Similar Documents

Publication Publication Date Title
CN107526826B (en) Voice search processing method and device and server
CN107919130B (en) Cloud-based voice processing method and device
CN108829893B (en) Method and device for determining video label, storage medium and terminal equipment
US9442910B2 (en) Method and system for adding punctuation to voice files
CN106897439B (en) Text emotion recognition method, device, server and storage medium
CN108549656B (en) Statement analysis method and device, computer equipment and readable medium
CN111797632B (en) Information processing method and device and electronic equipment
WO2017084334A1 (en) Language recognition method, apparatus and device and computer storage medium
CN105279227B (en) Method and device for processing voice search of homophone
CN110910863B (en) Method, device and equipment for extracting audio segment from audio file and storage medium
CN110795532A (en) Voice information processing method and device, intelligent terminal and storage medium
CN109710087B (en) Input method model generation method and device
CN108039175B (en) Voice recognition method and device and server
CN113053390B (en) Text processing method and device based on voice recognition, electronic equipment and medium
CN110875059A (en) Method and device for judging reception end and storage device
CN109859747B (en) Voice interaction method, device and storage medium
CN105260396B (en) Word retrieval method and device
CN111079428B (en) Word segmentation and industry dictionary construction method and device and readable storage medium
Gandhe et al. Using web text to improve keyword spotting in speech
CN112259084B (en) Speech recognition method, device and storage medium
CN113434631A (en) Emotion analysis method and device based on event, computer equipment and storage medium
CN112287077A (en) Statement extraction method and device for combining RPA and AI for document, storage medium and electronic equipment
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN109508390B (en) Input prediction method and device based on knowledge graph and electronic equipment
CN111210817A (en) Data processing method and device

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant