CN107832476A - Search query understanding method, apparatus, device and storage medium - Google Patents

Search query understanding method, apparatus, device and storage medium

Info

Publication number
CN107832476A
CN107832476A
Authority
CN
China
Prior art keywords
search query
layer parameter
word vector
identification model
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711248658.4A
Other languages
Chinese (zh)
Other versions
CN107832476B (en)
Inventor
王硕寰
孙宇
于佃海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201711248658.4A priority Critical patent/CN107832476B/en
Publication of CN107832476A publication Critical patent/CN107832476A/en
Application granted granted Critical
Publication of CN107832476B publication Critical patent/CN107832476B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G06F16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/955 - Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 - URL specific, e.g. using aliases, detecting broken or misspelled links
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose a search query understanding method, apparatus, device and storage medium. The method includes: determining the word vector of each word contained in an annotated search query; taking the hidden-layer parameters, convolution-layer parameters and pooling-layer parameters of a search query CNN model, trained in advance from each URL site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer parameters, convolution-layer parameters and pooling-layer parameters of an initial domain identification model; and training the initial domain identification model according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, so as to determine the fully-connected-layer parameters of the initial domain identification model and obtain a domain identification model. The scheme can improve model quality and generalization with only a small number of samples, optimize the trained model, and improve search query understanding.

Description

Search query understanding method, apparatus, device and storage medium
Technical field
Embodiments of the present invention relate to the field of information processing technology, and in particular to a search query understanding method, apparatus, device and storage medium.
Background technology
With the rapid development of artificial intelligence (AI) technology, more and more products and applications, such as intelligent customer service, intelligent assistants, in-vehicle navigation and smart homes, have begun to introduce conversational human-machine interaction. However, for most developers, building a dialogue system is very difficult in practice, and one of the technical difficulties is search query (Query) understanding. The core task of Query understanding is to convert natural language into a formal language that machines can process, establishing the connection between natural language and resources or services.
Query understanding can be decomposed into three tasks: domain (Domain) identification (judging whether the Query belongs to the supported domain; if not, it is not parsed), intent (Intent) classification (judging the fine-grained intent of the Query within that Domain), and slot (Slot) tagging (under that Intent, labeling the parameter information of interest in the Query). At present, Domain identification is mainly done with a convolutional neural network (CNN) model structure trained on labeled samples of the target domain, while Intent/Slot joint parsing uses a recurrent neural network (RNN) or recurrent neural network-conditional random field (RNN-CRF) model structure.
However, the prior art has the following problems. 1) Labeling data is expensive: developers need to label a large amount of data for model training in order to obtain good Query understanding; when the amount of labeled data is small, model quality suffers. 2) The generalization ability of Query understanding models is weak: if a new Query is literally quite different from every Query in the training set, it may not be parsed. For example, a developer building a Query understanding service for a snack vending machine labels "give me a bottle of cola", where the intent is "purchase", the unit is "one" and the commodity is "cola" (a hypothetical structured parse of this example is sketched below). For a new Query "Sprite, 2 cans", since none of its words have been seen, it is hard to judge that its intent is also "purchase"; unless the user collects and supplies a domain-specific proper-noun dictionary, it is hard to discover that "Sprite", like "cola", is a commodity. 3) Besides the labeled corpus, developers usually also have a large amount of unlabeled data; this corpus implicitly contains domain knowledge and common syntactic structures, but existing techniques cannot use it. 4) There already exist Query understanding corpora in many other domains, and corpora from different domains have a certain similarity; current techniques cannot transfer the labeled corpora of other domains to optimize Query understanding in a new, unlabeled domain.
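For illustration only, the structured result that Query understanding should produce for the labeled vending-machine example above might look like the sketch below; the field names are hypothetical, since the patent does not prescribe an output format.

```python
# Hypothetical structured parse of the labeled example Query above.
parsed_query = {
    "query":  "give me a bottle of cola",
    "domain": "vending",        # Domain identification: does the Query belong here?
    "intent": "purchase",       # Intent classification within the Domain
    "slots": {                  # Slot tagging under the Intent
        "unit":      "one",
        "commodity": "cola",
    },
}
# The generalization problem: for "Sprite, 2 cans" none of the words were seen
# during training, so mapping it to the same intent/slots is hard without help.
```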
The content of the invention
The present invention provides a search query understanding method, apparatus, device and storage medium, which can improve model quality and generalization with a small number of samples, optimize the trained model, and improve Query understanding.
In a first aspect, an embodiment of the invention provides a search query understanding method, including:
determining the word vector of each word contained in an annotated search query;
taking the hidden-layer parameters, convolution-layer parameters and pooling-layer parameters of a search query convolutional neural network model, trained in advance from each URL (Uniform Resource Locator) site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer parameters, convolution-layer parameters and pooling-layer parameters of an initial domain identification model;
training the initial domain identification model according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, so as to determine the fully-connected-layer parameters of the initial domain identification model and obtain a domain identification model.
In a second aspect, an embodiment of the invention further provides a search query understanding apparatus, including:
a word vector determining module, configured to determine the word vector of each word contained in an annotated search query;
a model parameter module, configured to take the hidden-layer, convolution-layer and pooling-layer parameters of a search query CNN model, trained in advance from each URL site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer, convolution-layer and pooling-layer parameters of an initial domain identification model;
a domain identification model module, configured to train the initial domain identification model according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, so as to determine the fully-connected-layer parameters of the initial domain identification model and obtain a domain identification model.
In a third aspect, an embodiment of the invention further provides a device, including:
one or more processors;
a storage apparatus for storing one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the search query understanding method described above.
In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the search query understanding method described above.
In embodiments of the invention, the bottom-layer parameters of the CNN-based domain identification model and of the RNN-based intent/slot recognition model are determined from a large number of search Queries and their corresponding click results, and the upper-layer parameters of the models are then determined from a small amount of labeled data. Because the bottom-layer parameters of the CNN and RNN models are large in scale, they are pre-trained by introducing unsupervised data without annotation results, and the upper-layer model parameters are then trained with a small amount of annotated data. Model training can thus be achieved with little labeled data, improving model quality and generalization in the small-sample case, optimizing the trained model, and improving Query understanding.
Brief description of the drawings
Fig. 1 is a flowchart of a search query understanding method in embodiment one of the present invention;
Fig. 1a is a schematic diagram of the domain identification model in embodiment one;
Fig. 2 is a flowchart of a search query understanding method in embodiment two;
Fig. 2a is a schematic diagram of domain identification model pre-training in embodiment two;
Fig. 3 is a flowchart of a search query understanding method in embodiment three;
Fig. 3a is a schematic diagram of the overall flow of a search query understanding method in embodiment three;
Fig. 4 is a flowchart of a search query understanding method in embodiment four;
Fig. 4a is a schematic diagram of intent/slot recognition model pre-training in embodiment four;
Fig. 4b is a schematic diagram of the intent/slot recognition model in embodiment four;
Fig. 5 is a schematic structural diagram of a search query understanding apparatus in embodiment five;
Fig. 6 is a schematic structural diagram of the device in embodiment six.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a search query understanding method in embodiment one of the present invention. This embodiment is applicable to understanding search queries in a specific domain, and the method can be performed by a search query understanding apparatus. It specifically comprises the following steps.
Step 110: determine the word vector of each word contained in the annotated search query.
In this embodiment, an annotated search query is a search query that has been manually labeled. Specifically, for a given domain, the domain label of a search query can be the name of that domain, such as the film domain or the transportation domain.
A word vector can be obtained by one-hot encoding, which represents a word as a very long vector whose dimension equals the vocabulary size; most elements are 0, and only one dimension has the value 1, that dimension identifying the current word. In deep learning, however, word vectors are generally produced with distributed representation, which represents a word as a low-dimensional real-valued vector; its advantage is that similar words lie closer together, so correlations between different words, and hence the dependencies between words, can be captured. This embodiment uses the distributed representation for word vectors.
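As a small numeric illustration of the two representations, under an invented toy vocabulary and embedding size:

```python
import numpy as np

vocab = ["cola", "sprite", "buy", "bottle"]          # toy vocabulary

# One-hot: a |V|-dimensional vector, all zeros except one position.
one_hot = np.zeros(len(vocab))
one_hot[vocab.index("cola")] = 1.0                   # [1, 0, 0, 0]

# Distributed: a low-dimensional dense real vector; related words end up close
# after training (here the table is random and untrained, for shape only).
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))   # 8-dim embeddings
dense = embedding_table[vocab.index("cola")]

# Similarity between words is only meaningful for the distributed form.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```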
Step 120: take the hidden-layer, convolution-layer and pooling-layer parameters of the search query CNN model, trained in advance from each URL site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer, convolution-layer and pooling-layer parameters of the initial domain identification model.
From search data, the click behavior between Queries and URLs is recorded, and the Queries that can recall each URL are counted. If a user searched a Query, the URL was presented and the user clicked it, the Query is recorded as a clicked Query for that URL (it shows the user's search intent); if the user did not click the URL, the Query is recorded as a non-clicked Query. In addition, other random search queries in the search log can also serve as non-clicked search queries.
The initial domain identification model is built on a CNN model and comprises, in order, an input layer, a hidden layer, a convolution layer, a pooling layer, a dropout layer, a fully-connected layer and an output layer; the parameters of the hidden, convolution and pooling layers are determined, while the fully-connected-layer parameters are unknown.
Specifically, in this embodiment it can be considered that if the URLs or URL site names clicked for two different Queries are textually similar, the two Queries are probably related. By training a CNN model on the URL site names of other domains and the clicked and non-clicked search queries of each URL site name, the bottom-layer parameters (the hidden-layer, convolution-layer and pooling-layer parameters) are obtained and used as the bottom-layer parameters of the initial domain identification model. The bottom-layer parameters of a CNN model are large in scale: each word is represented by a vector of several hundred dimensions, so with 100,000 words the bottom-layer parameters can exceed one hundred million, whereas the upper-layer parameters, those of the fully-connected layer, typically form only a matrix of a few hundred by a few hundred dimensions; the parameter count drops sharply and can be learned from a small amount of labeled data.
Step 130: according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, train the initial domain identification model to determine the fully-connected-layer parameters of the initial domain identification model, so as to obtain the domain identification model.
Specifically, referring to Fig. 1a, the word vectors of the words contained in the labeled search query serve as the input of the initial domain identification model. After the bottom-layer processing of the hidden layer (Hidden Layer), convolution layer (Convolution Layer) and pooling layer (Pooling Layer), the word vectors pass through a dropout layer (Dropout Layer) transformation, i.e. N vectors are randomly selected (for example, half of a 256-dimensional vector), and then through a fully-connected layer (Full Connect Layer, FCL) transformation. The result of the fully-connected layer is compared with the domain label of the search query, the FCL parameters are adjusted according to the comparison, and once the iteration stopping condition is met the fully-connected-layer parameters are obtained, i.e. the training of the domain identification model is completed.
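As an illustration of steps 120 and 130, the following is a minimal PyTorch sketch of a domain identification model in which the transferred bottom layers are frozen and only the fully-connected layer is trained; all layer sizes, and the names of the pretrained state dictionaries, are assumptions rather than values from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainModel(nn.Module):
    """Embedding/hidden -> convolution -> pooling -> dropout -> fully-connected."""
    def __init__(self, vocab_size=100_000, emb_dim=256, conv_dim=256, n_domains=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)        # hidden-layer params
        self.conv = nn.Conv1d(emb_dim, conv_dim, kernel_size=3, padding=1)
        self.dropout = nn.Dropout(0.5)                        # dropout layer
        self.fc = nn.Linear(conv_dim, n_domains)              # only layer trained here

    def forward(self, token_ids):                             # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)             # (batch, emb, seq)
        x = F.relu(self.conv(x))
        x = F.max_pool1d(x, x.size(2)).squeeze(2)             # pooling layer
        return self.fc(self.dropout(x))

model = DomainModel()
# Transfer: copy the pretrained bottom-layer parameters, then freeze them so the
# small labeled set only has to fit the fully-connected layer.
# model.embed.load_state_dict(pretrained_embed_state)  # hypothetical, from the
# model.conv.load_state_dict(pretrained_conv_state)    # search query CNN model
for p in list(model.embed.parameters()) + list(model.conv.parameters()):
    p.requires_grad = False
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```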
The bottom-layer parameters of the domain identification model are determined by step 120, and the fully-connected-layer parameters of the model by step 130; all parameters of the domain identification model are thus determined, the domain identification model is obtained, and domain identification can be performed on search queries.
In this embodiment, the bottom-layer parameters of the CNN-based domain identification model are determined from a large number of search Queries and their corresponding click results, and the fully-connected-layer parameters of the domain identification model are then determined from a small amount of labeled data. Because the bottom-layer parameters of the CNN model are large in scale, they are pre-trained by introducing unsupervised data without annotation results, and the upper-layer model parameters are then trained with a small amount of annotated data; model training can thus be achieved with little labeled data, improving model quality and generalization in the small-sample case, optimizing the trained model, and improving Query understanding.
Embodiment two
Fig. 2 is a flowchart of a search query understanding method in embodiment two of the present invention. On the basis of the above embodiment, this embodiment further optimizes the search query understanding method. Accordingly, as shown in Fig. 2, the method of this embodiment specifically includes:
Step 210: determine the word vector of each word contained in the annotated search query.
Step 220: obtain each URL site name and the clicked and non-clicked search queries of each URL site name.
A URL site name is the combination of the server name and the domain name in the URL. For example, for the URL http://flights.ctrip.com/fuzzy/#ctm_ref=ctr_nav_flt_fz_pgs, the server name is flights, the domain name is ctrip.com, and the site name is flights.ctrip.com; alternatively, the page title of flights.ctrip.com can be used as the site name (see the parsing sketch below).
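One way to extract such a site name with Python's standard library, using the example URL above:

```python
from urllib.parse import urlparse

url = "http://flights.ctrip.com/fuzzy/#ctm_ref=ctr_nav_flt_fz_pgs"
site_name = urlparse(url).netloc  # "flights.ctrip.com": server name + domain name
```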
Specifically, all URLs are traversed to obtain each URL site name and the clicked and non-clicked search queries of each URL site name.
Step 230: determine the word vector of each word contained in the clicked search query, the word vector of each word contained in the non-clicked search query, and the word vector of each word contained in the URL site name.
In this embodiment, the word vectors of the words contained in a search query or URL site name can be determined as follows: segment the search query or URL site name to obtain the words contained in it; then perform word, part-of-speech and named-entity recognition on each of those words to obtain the word vectors of the words contained in the search query or URL site name. This embodiment determines word vectors by fusing features such as the word itself, its part of speech and named entities, as sketched below.
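One plausible way to fuse word, part-of-speech and named-entity features into a single word vector is to concatenate separate embedding tables; the feature inventories and dimensions below are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

class FusedWordVector(nn.Module):
    """Word vector = concat(word embedding, POS embedding, NER embedding)."""
    def __init__(self, n_words=100_000, n_pos=30, n_ner=10):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, 200)
        self.pos_emb = nn.Embedding(n_pos, 32)
        self.ner_emb = nn.Embedding(n_ner, 24)

    def forward(self, word_ids, pos_ids, ner_ids):
        # Concatenate the three feature embeddings per token -> 256-dim vector.
        return torch.cat([self.word_emb(word_ids),
                          self.pos_emb(pos_ids),
                          self.ner_emb(ner_ids)], dim=-1)
```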
Step 240: use a first CNN model to determine a clicked-query vector from the word vectors of the words contained in the clicked search query, use the first CNN model to determine a non-clicked-query vector from the word vectors of the words contained in the non-clicked search query, and use a second CNN model to determine a site-name vector from the word vectors of the words contained in the URL site name.
Specifically, referring to Fig. 2a, the clicked search query QueryA and the non-clicked search query QueryB can share the first CNN model for training, yielding the clicked-query vector and the non-clicked-query vector respectively; the URL site name is trained with the separate second CNN model, yielding the site-name vector.
Step 250: according to a first similarity between the clicked-query vector and the site-name vector, and a second similarity between the non-clicked-query vector and the site-name vector, optimize the first CNN model and the second CNN model, and take the optimized first CNN model as the search query CNN model.
Specifically, referring to Fig. 2a, the similarities between the site-name vector and the clicked-query vector and the non-clicked-query vector are computed respectively, giving the first similarity Similar_Score(QueryA, URL) and the second similarity Similar_Score(QueryB, URL). The loss (Loss) function is then minimized with the back-propagation (BP) algorithm to optimize the first CNN model and the second CNN model, and the optimized first CNN model is taken as the search query CNN model.
The Loss function can be expressed as:
Loss = max(0, margin - Similar(V_clickQ, V_T) + Similar(V_noclickQ, V_T))
where Similar(V_clickQ, V_T) is the first similarity, Similar(V_noclickQ, V_T) is the second similarity, and margin is a constant.
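A minimal sketch of this pairwise margin loss, assuming the two CNN towers have already produced the three vectors, and using cosine similarity and a margin value chosen only for illustration:

```python
import torch
import torch.nn.functional as F

def pairwise_hinge_loss(v_click_q, v_noclick_q, v_site, margin=0.1):
    """Loss = max(0, margin - Sim(clicked, site) + Sim(non-clicked, site))."""
    sim_pos = F.cosine_similarity(v_click_q, v_site)      # first similarity
    sim_neg = F.cosine_similarity(v_noclick_q, v_site)    # second similarity
    return torch.clamp(margin - sim_pos + sim_neg, min=0.0).mean()
```

The loss is zero once every clicked query is closer to its site name than the non-clicked query by at least the margin, which is exactly the ranking the pre-training is meant to enforce.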
Step 260: take the hidden-layer, convolution-layer and pooling-layer parameters of the search query CNN model, trained in advance from each URL site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer, convolution-layer and pooling-layer parameters of the initial domain identification model.
Specifically, the hidden-layer, convolution-layer and pooling-layer parameters of the search query CNN model determined in step 250 are used as the hidden-layer, convolution-layer and pooling-layer parameters of the initial domain identification model, i.e. the bottom-layer parameters of the initial domain identification model are determined.
Step 270: according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, train the initial domain identification model to determine the fully-connected-layer parameters of the initial domain identification model, so as to obtain the domain identification model.
Specifically, the bottom-layer parameters of the initial domain identification model migrated from step 250 can serve as the bottom-layer parameters of the domain identification model. According to the domain label of the annotated search query, the word vectors of the words contained in the annotated search query, and the word vectors of the words contained in unannotated search queries, virtual adversarial training is performed on the initial domain identification model: the dropout-layer transformation is applied first, and the fully-connected-layer parameters can then be determined after the fully-connected-layer transformation. The bottom-layer parameters and the fully-connected-layer parameters of the domain identification model are thus obtained, i.e. the domain identification model is obtained, and domain identification can be performed on search queries.
In this embodiment, the bottom-layer parameters of the CNN-based domain identification model are determined from a large number of search Queries and their corresponding click results, and the fully-connected-layer parameters of the domain identification model are then determined from a small amount of labeled data. Because the bottom-layer parameters of the CNN model are large in scale, they are pre-trained by introducing unsupervised data without annotation results, and the upper-layer model parameters are then trained with a small amount of annotated data; model training can thus be achieved with little labeled data, improving model quality and generalization in the small-sample case, optimizing the trained model, and improving Query understanding.
Embodiment three
Fig. 3 is a flowchart of a search query understanding method in embodiment three of the present invention. On the basis of the above embodiments, this embodiment illustrates how the models for domain identification, intent recognition and slot recognition in the search query understanding method are determined. Accordingly, the method of this embodiment specifically includes:
Step 310: determine the word vector of each word contained in the annotated search query.
Step 320: take the hidden-layer, convolution-layer and pooling-layer parameters of the search query CNN model, trained in advance from each URL site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer, convolution-layer and pooling-layer parameters of the initial domain identification model.
Step 321: according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, train the initial domain identification model to determine its fully-connected-layer parameters, so as to obtain the domain identification model.
Step 322: according to the domain label of the annotated search query, the word vectors of the words contained in the annotated search query, and the word vectors of the words contained in unannotated search queries, perform virtual adversarial training on the initial domain identification model to determine its fully-connected-layer parameters, so as to obtain the domain identification model.
Step 330: take the hidden-layer parameters of a bidirectional RNN language model, trained in advance on search queries, as the hidden-layer parameters of an initial intent recognition model and an initial slot recognition model.
Step 331: train the initial intent recognition model according to the intent label of the annotated search query to determine the fully-connected-layer parameters of the initial intent recognition model, so as to obtain the intent recognition model; or train the initial slot recognition model according to the slot label of the annotated search query to determine the fully-connected-layer parameters and conditional-random-field-layer parameters of the initial slot recognition model, so as to obtain the slot recognition model.
Step 332: according to the intent label of the annotated search query, the word vectors of the words contained in the annotated search query, and the word vectors of the words contained in unannotated search queries, perform virtual adversarial training on the initial intent recognition model to determine its fully-connected-layer parameters, so as to obtain the intent recognition model.
Step 333: according to the slot label of the annotated search query, the word vectors of the words contained in the annotated search query, and the word vectors of the words contained in unannotated search queries, perform virtual adversarial training on the initial slot recognition model to determine its fully-connected-layer parameters and conditional-random-field-layer parameters, so as to obtain the slot recognition model.
It should be noted that step 320 and step 330 are parallel; there is no required order between them.
In this embodiment, if unannotated search queries exist in the target domain, the virtual adversarial training technique (Virtual Adversarial Training) can preferably be used to combine the unsupervised data with the labeled data for semi-supervised training; for annotated search queries the virtual adversarial training technique can likewise be used for semi-supervised training. Referring to steps 322, 332 and 333, the domain identification model, intent recognition model and slot recognition model are each obtained with the virtual adversarial training technique. For vertical-domain data without annotation results, the probability distributions over domain, intent and slots are each perturbed in the direction of maximal change while the loss function is minimized, so that the recognition result after the perturbation differs as little as possible from the recognition result of the original sample. The minimized loss function can be expressed as:
L_vat = (1 / N') * Σ_s KL( p(· | s) || p(· | s + r_v-adv) )
where s denotes a sample, d denotes a perturbation direction, p is the probability distribution over intents or slots, KL is its KL divergence, and r_v-adv is the perturbation direction in which the KL divergence changes most, obtained through the derivative of the KL divergence with respect to d, i.e. r_v-adv = argmax_{d, ||d|| ≤ ε} KL( p(· | s) || p(· | s + d) ); N' is the total number of labeled and unlabeled samples.
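A simplified sketch of one virtual-adversarial-training step under the common single-power-iteration approximation of r_v-adv; it assumes the model consumes embedding tensors directly, and the xi and eps magnitudes are illustrative:

```python
import torch
import torch.nn.functional as F

def vat_loss(model, embeddings, xi=1e-6, eps=1.0):
    """KL(p(.|s) || p(.|s + r_adv)), with r_adv approximated by one gradient step.
    Usable on both labeled and unlabeled samples, since no gold label is needed."""
    with torch.no_grad():
        p = F.softmax(model(embeddings), dim=-1)          # original distribution
    d = torch.randn_like(embeddings)                      # random direction
    d = xi * d / d.norm(dim=-1, keepdim=True)             # tiny normalized probe
    d.requires_grad_(True)
    p_hat = F.log_softmax(model(embeddings + d), dim=-1)
    probe_kl = F.kl_div(p_hat, p, reduction="batchmean")
    grad = torch.autograd.grad(probe_kl, d)[0]            # direction of max KL change
    r_adv = eps * grad / grad.norm(dim=-1, keepdim=True)
    p_adv = F.log_softmax(model(embeddings + r_adv.detach()), dim=-1)
    return F.kl_div(p_adv, p, reduction="batchmean")      # keep outputs smooth
```

Minimizing this term alongside the supervised loss penalizes predictions that flip under small input perturbations, which is the smoothness effect described above.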
Fig. 3a is a schematic diagram of the overall flow of a search query understanding method in embodiment three. In the integrated system, from a large amount of search Query data and the corresponding click-behavior results, the word vectors of clicked and non-clicked Queries are obtained, from which a CNN multi-feature classification model can be obtained; a CNN Domain model is then obtained from the labeled data and this CNN multi-feature classification model. Meanwhile, a bidirectional recurrent neural network (Bi-RNN) multi-feature language model can also be trained from the search Query data, and a bidirectional recurrent neural network-conditional random field (Bi-RNN-CRF) Intent/Slot model is then obtained from the labeled data and this Bi-RNN multi-feature language model. If unannotated data is available, the virtual adversarial training technique can be used to combine the unsupervised data with the labeled data for semi-supervised training, yielding the CNN Domain model and the Bi-RNN-CRF Intent/Slot model. The CNN Domain model and the Bi-RNN-CRF Intent/Slot model are then available to users.
In this embodiment of the invention, the bottom-layer parameters of the CNN-based domain identification model and of the intent/slot recognition model based on the bidirectional RNN language model are determined from a large number of search Queries and their corresponding click results, and the upper-layer parameters of the domain identification model and the intent/slot recognition model are then determined from a small amount of labeled data; for both unannotated and annotated search queries, virtual adversarial training can be used for semi-supervised training. Since the bottom-layer parameters are pre-trained by introducing unsupervised data without annotation results, and the upper-layer model parameters are then trained with a small amount of annotated data, model training can be achieved with little labeled data, improving model quality and generalization in the small-sample case; virtual adversarial training reduces the influence of small feature differences on the result and increases smoothness, optimizing the trained model and improving Query understanding.
Example IV
Fig. 4 is a flowchart of a search query understanding method in embodiment four of the present invention. On the basis of the above embodiments, this embodiment further optimizes the determination of the intent recognition and slot recognition models in the search query understanding method. Accordingly, the method of this embodiment specifically includes:
Step 410: determine the word vector of each word contained in the annotated search query.
Step 420: determine the word vector of each word contained in the search query.
In this embodiment, the word vectors of the words contained in a search query can be determined as follows: segment the search query to obtain the words contained in it; then perform word, part-of-speech and named-entity recognition on each of those words to obtain the word vectors of the words contained in the search query. This embodiment determines word vectors by fusing features such as the word itself, its part of speech and named entities.
Step 430: take the word vectors of the words contained in the search query as the input of the bidirectional RNN language model, predict the next word with the forward recurrent neural network in the bidirectional RNN language model and the previous word with the backward recurrent neural network, and adjust the hidden-layer parameters of the forward recurrent neural network and of the backward recurrent neural network in the bidirectional RNN language model according to the prediction results.
Specifically, referring to Fig. 4a, the word vectors of the words contained in the search query serve as the input of the bidirectional RNN language model and are processed by the embedding layer (Embedding Layer); the forward recurrent neural network in the RNN layer (RNN Layer) then predicts the next word, the backward recurrent neural network predicts the previous word, and the hidden-layer parameters of the forward and backward recurrent neural networks in the bidirectional RNN language model are adjusted according to the prediction results. Splicing the hidden-layer parameters of the forward recurrent neural network with those of the backward recurrent neural network gives the hidden-layer parameters of the bidirectional RNN language model. The bidirectional RNN language model can also be optimized with the BP algorithm.
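A sketch of this pretraining objective, in which the forward network is trained to predict the next word and the backward network the previous word; the vocabulary and layer sizes are invented for illustration:

```python
import torch
import torch.nn as nn

class BiRnnLm(nn.Module):
    def __init__(self, vocab_size=100_000, emb_dim=256, hidden=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.fwd = nn.LSTM(emb_dim, hidden, batch_first=True)   # next-word LM
        self.bwd = nn.LSTM(emb_dim, hidden, batch_first=True)   # previous-word LM
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, token_ids):                                # (batch, seq)
        x = self.embed(token_ids)
        h_fwd, _ = self.fwd(x)
        h_bwd, _ = self.bwd(torch.flip(x, dims=[1]))             # run right-to-left
        h_bwd = torch.flip(h_bwd, dims=[1])
        # Splice forward and backward hidden states for the transferred layer.
        return self.out(h_fwd), self.out(h_bwd), torch.cat([h_fwd, h_bwd], dim=-1)

model = BiRnnLm()
ce = nn.CrossEntropyLoss()
ids = torch.randint(0, 100_000, (4, 10))
logits_f, logits_b, hidden = model(ids)
# Forward LM: position t predicts token t+1; backward LM: position t predicts t-1.
loss = ce(logits_f[:, :-1].reshape(-1, 100_000), ids[:, 1:].reshape(-1)) \
     + ce(logits_b[:, 1:].reshape(-1, 100_000), ids[:, :-1].reshape(-1))
```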
Step 440: take the hidden-layer parameters of the bidirectional RNN language model, trained in advance on search queries, as the hidden-layer parameters of the initial intent recognition model and the initial slot recognition model.
The initial intent recognition model comprises an input layer, a hidden layer, a word representation layer, a dropout layer, a sequence representation layer, a fully-connected layer and an output layer, where the sequence representation layer splices the words output by the dropout layer into an overall representation of the sequence. The initial slot recognition model comprises an input layer, a hidden layer, a word representation layer, a dropout layer, a fully-connected layer, a conditional random field layer and an output layer. The parameters of the hidden layer, word representation layer and dropout layer are determined, while the parameters of the fully-connected layer and the conditional random field layer are unknown.
Specifically, by training the bidirectional RNN language model on search queries in step 430, the bottom-layer parameters, namely the hidden-layer parameters, are obtained and used as the bottom-layer parameters of the initial intent recognition model and the initial slot recognition model.
Step 450: train the initial intent recognition model according to the intent label of the annotated search query to determine the fully-connected-layer parameters of the initial intent recognition model, so as to obtain the intent recognition model; or train the initial slot recognition model according to the slot label of the annotated search query to determine the fully-connected-layer parameters and conditional-random-field-layer parameters of the initial slot recognition model, so as to obtain the slot recognition model.
Specifically, referring to Fig. 4b, the intent label of the annotated search query supervises the initial intent recognition model: after the bottom-layer processing of the hidden layer, word representation layer and dropout layer, followed by the sequence representation layer, the fully-connected-layer transformation and the Softmax classification function, the fully-connected-layer parameters can be determined, i.e. the intent recognition model is trained.
Or, referring to Fig. 4b, the slot label of the annotated search query supervises the initial slot recognition model: after the bottom-layer processing of the hidden layer, representation layer and dropout layer, the conditional random field layer (Conditional Random Field layer, CRF Layer) models the start probability (a), transition probability (w) and end probability (b) of the slot labels, and the CRF parameters are obtained for the annotation results; the fully-connected-layer parameters can then be determined through the fully-connected-layer transformation, i.e. the slot recognition model is trained.
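A sketch of the two heads over shared bidirectional-RNN states: a Softmax intent classifier on the sequence representation and a CRF over slot tags. The CRF layer here comes from the third-party pytorch-crf package, used only as a stand-in for the conditional-random-field layer (an assumed dependency), and all sizes are illustrative:

```python
import torch
import torch.nn as nn
from torchcrf import CRF   # pip package "pytorch-crf" (assumed dependency)

class IntentSlotModel(nn.Module):
    def __init__(self, vocab_size=100_000, emb_dim=256, hidden=256,
                 n_intents=20, n_slot_tags=30):
        super().__init__()
        # Lower layers: initialized from the pretrained bidirectional RNN LM.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.birnn = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.5)
        # Upper layers: trained on the small labeled set.
        self.intent_fc = nn.Linear(2 * hidden, n_intents)   # sequence-level head
        self.slot_fc = nn.Linear(2 * hidden, n_slot_tags)   # token-level emissions
        self.crf = CRF(n_slot_tags, batch_first=True)       # start/transition/end

    def forward(self, token_ids, slot_tags=None):
        h, _ = self.birnn(self.embed(token_ids))
        h = self.dropout(h)
        intent_logits = self.intent_fc(h.mean(dim=1))       # sequence representation
        emissions = self.slot_fc(h)
        if slot_tags is not None:                           # training: CRF log-likelihood
            return intent_logits, -self.crf(emissions, slot_tags)
        return intent_logits, self.crf.decode(emissions)    # inference: best tag path
```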
In this embodiment of the invention, the bottom-layer parameters of the intent/slot recognition model based on the bidirectional RNN language model are determined from a large number of search Queries and their corresponding click results, and the upper-layer parameters of the domain identification model and the intent/slot recognition model are then determined from a small amount of labeled data. Because the bottom-layer parameters of the RNN model are large in scale, they are pre-trained by introducing unsupervised data without annotation results, and the upper-layer model parameters are then trained with a small amount of annotated data; model training can thus be achieved with little labeled data, improving model quality and generalization in the small-sample case, optimizing the trained model, and improving Query understanding.
Embodiment five
Fig. 5 is a schematic structural diagram of a search query understanding apparatus in embodiment five of the present invention. The apparatus may include:
a word vector determining module 510, configured to determine the word vector of each word contained in an annotated search query;
a model parameter module 520, configured to take the hidden-layer, convolution-layer and pooling-layer parameters of a search query CNN model, trained in advance from each URL site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer, convolution-layer and pooling-layer parameters of an initial domain identification model;
a domain identification model module 530, configured to train the initial domain identification model according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, so as to determine the fully-connected-layer parameters of the initial domain identification model and obtain a domain identification model.
Exemplarily, the apparatus may further include a CNN model module, specifically configured to:
obtain each URL site name and the clicked and non-clicked search queries of each URL site name;
determine the word vector of each word contained in the clicked search query, the word vector of each word contained in the non-clicked search query, and the word vector of each word contained in the URL site name;
use a first CNN model to determine a clicked-query vector from the word vectors of the words contained in the clicked search query, use the first CNN model to determine a non-clicked-query vector from the word vectors of the words contained in the non-clicked search query, and use a second CNN model to determine a site-name vector from the word vectors of the words contained in the URL site name;
optimize the first CNN model and the second CNN model according to a first similarity between the clicked-query vector and the site-name vector and a second similarity between the non-clicked-query vector and the site-name vector, and take the optimized first CNN model as the search query CNN model.
Exemplarily, the apparatus may further include an intent/slot recognition model module, specifically configured to:
after the word vector of each word contained in the annotated search query is determined, take the hidden-layer parameters of a bidirectional RNN language model, trained in advance on search queries, as the hidden-layer parameters of an initial intent recognition model and an initial slot recognition model;
train the initial intent recognition model according to the intent label of the annotated search query to determine the fully-connected-layer parameters of the initial intent recognition model, so as to obtain the intent recognition model; or train the initial slot recognition model according to the slot label of the annotated search query to determine the fully-connected-layer parameters and conditional-random-field-layer parameters of the initial slot recognition model, so as to obtain the slot recognition model.
Further, the apparatus may also include a bidirectional RNN language model parameter module, specifically configured to:
determine the word vector of each word contained in the search query;
take the word vectors of the words contained in the search query as the input of the bidirectional RNN language model, predict the next word with the forward recurrent neural network in the bidirectional RNN language model and the previous word with the backward recurrent neural network, and adjust the hidden-layer parameters of the forward recurrent neural network and of the backward recurrent neural network according to the prediction results.
Exemplarily, the apparatus may further include a word vector module, specifically configured to:
segment the search query or the URL site name to obtain the words contained in the search query or the URL site name;
perform word, part-of-speech and named-entity recognition on each word contained in the search query or the URL site name to obtain the word vector of each word contained in the search query or URL site name.
Exemplarily, the domain identification model module may specifically be configured to:
perform virtual adversarial training on the initial domain identification model according to the domain label of the annotated search query, the word vectors of the words contained in the annotated search query, and the word vectors of the words contained in unannotated search queries, to determine the fully-connected-layer parameters of the initial domain identification model, so as to obtain the domain identification model.
Exemplarily, the intent recognition model module may specifically be configured to:
perform virtual adversarial training on the initial intent recognition model according to the intent label of the annotated search query, the word vectors of the words contained in the annotated search query, and the word vectors of the words contained in unannotated search queries, to determine the fully-connected-layer parameters of the initial intent recognition model, so as to obtain the intent recognition model.
Exemplarily, the slot recognition model module may specifically be configured to:
perform virtual adversarial training on the initial slot recognition model according to the slot label of the annotated search query, the word vectors of the words contained in the annotated search query, and the word vectors of the words contained in unannotated search queries, to determine the fully-connected-layer parameters and conditional-random-field-layer parameters of the initial slot recognition model, so as to obtain the slot recognition model.
The search query understanding apparatus provided by this embodiment of the invention can perform the search query understanding method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects for performing the method.
Embodiment six
Fig. 6 is a schematic structural diagram of the device in embodiment six of the present invention. Fig. 6 shows a block diagram of an exemplary device 612 suitable for implementing embodiments of the invention. The device 612 shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the invention.
As shown in Fig. 6, the device 612 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors 616, a system memory 628, and a bus 618 connecting the different system components (including the system memory 628 and the processors 616).
The bus 618 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The device 612 typically comprises a variety of computer-system-readable media. These media can be any usable media accessible by the device 612, including volatile and non-volatile media, and removable and non-removable media.
The system memory 628 can include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 630 and/or cache memory 632. The device 612 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 634 can be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly called a "hard disk drive"). Although not shown in Fig. 6, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading and writing a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical media) can be provided. In these cases, each drive can be connected to the bus 618 through one or more data media interfaces. The memory 628 can include at least one program product having a set of (e.g. at least one) program modules configured to perform the functions of the embodiments of the invention.
A program/utility 640 having a set of (at least one) program modules 642 can be stored, for example, in the memory 628. Such program modules 642 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 642 generally perform the functions and/or methods of the embodiments described in the present invention.
The device 612 can also communicate with one or more external devices 614 (such as a keyboard, pointing device or display 624), with one or more devices that enable a user to interact with the device 612, and/or with any device (such as a network card or modem) that enables the device 612 to communicate with one or more other computing devices. Such communication can occur through an input/output (I/O) interface 622. Moreover, the device 612 can communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 620. As shown in the figure, the network adapter 620 communicates with the other modules of the device 612 through the bus 618. It should be understood that, although not shown, other hardware and/or software modules can be used with the device 612, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
The processor 616 runs programs stored in the system memory 628 to perform various functional applications and data processing, for example implementing the search query understanding method provided by the embodiments of the invention, the method including:
determining the word vector of each word contained in an annotated search query;
taking the hidden-layer, convolution-layer and pooling-layer parameters of a search query CNN model, trained in advance from each URL site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer, convolution-layer and pooling-layer parameters of an initial domain identification model;
training the initial domain identification model according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, so as to determine the fully-connected-layer parameters of the initial domain identification model and obtain a domain identification model.
Embodiment seven
Embodiment seven of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the search query understanding method provided by the embodiments of the invention, the method including:
determining the word vector of each word contained in an annotated search query;
taking the hidden-layer, convolution-layer and pooling-layer parameters of a search query CNN model, trained in advance from each URL site name and the clicked and non-clicked search queries of each URL site name, as the hidden-layer, convolution-layer and pooling-layer parameters of an initial domain identification model;
training the initial domain identification model according to the domain label of the annotated search query and the word vectors of the words contained in the annotated search query, so as to determine the fully-connected-layer parameters of the initial domain identification model and obtain a domain identification model.
The computer storage medium of embodiments of the invention can adopt any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by, or in connection with, an instruction execution system, apparatus or device.
A computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium; it can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.
Program code contained on a computer-readable medium can be transmitted with any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out operations of the present invention can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
Pay attention to, above are only presently preferred embodiments of the present invention and institute's application technology principle.It will be appreciated by those skilled in the art that The invention is not restricted to specific embodiment described here, can carry out for a person skilled in the art various obvious changes, Readjust and substitute without departing from protection scope of the present invention.Therefore, although being carried out by above example to the present invention It is described in further detail, but the present invention is not limited only to above example, without departing from the inventive concept, also Other more equivalent embodiments can be included, and the scope of the present invention is determined by scope of the appended claims.

Claims (16)

  1. A method for understanding a search sequence, characterized by comprising:
    determining the term vector of each word contained in an annotated search sequence;
    taking the hidden layer parameters, convolution layer parameters and pooling layer parameters of a search sequence CNN model, trained in advance according to each URL site name and the clicked and un-clicked search sequences of each URL site name, as the hidden layer parameters, convolution layer parameters and pooling layer parameters of an initial domain identification model;
    training the initial domain identification model according to the domain labels of the annotated search sequences and the term vectors of the words contained in the annotated search sequences, to determine the fully connected layer parameters of the initial domain identification model and obtain a domain identification model.
  2. The method according to claim 1, characterized in that training the search sequence CNN model according to each URL site name and the clicked and un-clicked search sequences of each URL site name comprises:
    obtaining each URL site name and the clicked and un-clicked search sequences of each URL site name;
    determining the term vectors of the words contained in the clicked search sequence, the term vectors of the words contained in the un-clicked search sequence, and the term vectors of the words contained in the URL site name;
    determining a clicked search vector from the term vectors of the words contained in the clicked search sequence using a first CNN model, determining an un-clicked search vector from the term vectors of the words contained in the un-clicked search sequence using the first CNN model, and determining a site name vector from the term vectors of the words contained in the URL site name using a second CNN model;
    optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the un-clicked search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
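(Illustrative sketch, not part of the claims.) One way to realise the optimization in claim 2 is a pairwise margin loss that pushes the clicked search vector closer to the site name vector than the un-clicked one. The claim only states that the two models are optimized according to the two similarities; the margin loss, cosine similarity, and all sizes below are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Maps a sequence of token ids to a fixed-length vector."""
    def __init__(self, vocab_size=5000, emb_dim=128, channels=256, kernel=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, channels, kernel_size=kernel, padding=1)

    def forward(self, ids):
        x = self.embedding(ids).transpose(1, 2)       # (batch, emb, seq)
        x = torch.relu(self.conv(x))
        return x.max(dim=-1).values                   # max pooling over time

query_cnn = TextCNN()                                 # the "first CNN model"
site_cnn = TextCNN()                                  # the "second CNN model"
opt = torch.optim.Adam(list(query_cnn.parameters()) + list(site_cnn.parameters()))

def pairwise_step(clicked_ids, unclicked_ids, site_ids, margin=0.3):
    site_vec = site_cnn(site_ids)
    sim_pos = F.cosine_similarity(query_cnn(clicked_ids), site_vec)    # first similarity
    sim_neg = F.cosine_similarity(query_cnn(unclicked_ids), site_vec)  # second similarity
    # Require the clicked query to score above the un-clicked one by `margin`.
    loss = torch.relu(margin - sim_pos + sim_neg).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```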
  3. The method according to claim 1, characterized in that, after determining the term vector of each word contained in the annotated search sequence, the method further comprises:
    taking the hidden layer parameters of a bidirectional RNN language model, trained in advance according to search sequences, as the hidden layer parameters of an initial intent recognition model and of an initial slot identification model;
    training the initial intent recognition model according to the intent labels of the annotated search sequences to determine the fully connected layer parameters of the initial intent recognition model, to obtain an intent recognition model; or training the initial slot identification model according to the slot labels of the annotated search sequences to determine the fully connected layer parameters and conditional random field layer parameters of the initial slot identification model, to obtain a slot identification model.
  4. The method according to claim 3, characterized in that training the bidirectional RNN language model according to search sequences comprises:
    determining the term vector of each word contained in a search sequence;
    taking the term vectors of the words contained in the search sequence as the input of the bidirectional RNN language model, predicting the next word through the forward recurrent neural network in the bidirectional RNN language model and the previous word through the backward recurrent neural network, and adjusting the hidden layer parameters of the forward recurrent neural network and of the backward recurrent neural network in the bidirectional RNN language model according to the prediction results.
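(Illustrative sketch, not part of the claims.) A minimal reading of claim 4: two independent recurrent networks, one reading left-to-right and trained to predict the next word, one reading right-to-left and trained to predict the previous word, with both hidden layers updated from the combined loss. The choice of GRU cells, the layer sizes and the optimizer are assumptions.

```python
import torch
import torch.nn as nn

class BiRNNLM(nn.Module):
    def __init__(self, vocab=5000, emb=128, hidden=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab, emb)
        self.fwd = nn.GRU(emb, hidden, batch_first=True)   # forward recurrent network
        self.bwd = nn.GRU(emb, hidden, batch_first=True)   # backward recurrent network
        self.out = nn.Linear(hidden, vocab)

    def forward(self, ids):                                # ids: (batch, seq_len)
        x = self.embedding(ids)
        h_fwd, _ = self.fwd(x)                             # left-to-right states
        h_bwd, _ = self.bwd(x.flip(1))                     # right-to-left states
        return self.out(h_fwd), self.out(h_bwd.flip(1))

model = BiRNNLM()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters())

def lm_step(ids):
    fwd_logits, bwd_logits = model(ids)
    vocab = fwd_logits.size(-1)
    # Forward RNN at position t predicts token t+1; backward RNN predicts t-1.
    loss = (loss_fn(fwd_logits[:, :-1].reshape(-1, vocab), ids[:, 1:].reshape(-1))
            + loss_fn(bwd_logits[:, 1:].reshape(-1, vocab), ids[:, :-1].reshape(-1)))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```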
  5. The method according to claim 2 or claim 4, characterized in that determining the term vector of each word contained in a search sequence or URL site name comprises:
    segmenting the search sequence or the URL site name to obtain the words contained in the search sequence or the URL site name;
    performing word, part-of-speech and named entity recognition on each word contained in the search sequence or the URL site name to obtain the term vector of each word contained in the search sequence or URL site name.
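(Illustrative sketch, not part of the claims.) One possible realisation of claim 5: after segmentation, each word carries a part-of-speech id and a named-entity tag id, and its term vector is the concatenation of three learned embeddings. The claim does not specify how the three signals are combined; the concatenation scheme, any segmenter/tagger supplying the ids, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class FeatureTermVector(nn.Module):
    """Builds each word's term vector from its word id, part-of-speech id
    and named-entity tag id (ids supplied by any segmenter / tagger)."""
    def __init__(self, vocab=5000, n_pos=40, n_ner=10, d_word=100, d_pos=20, d_ner=8):
        super().__init__()
        self.word = nn.Embedding(vocab, d_word)
        self.pos = nn.Embedding(n_pos, d_pos)
        self.ner = nn.Embedding(n_ner, d_ner)

    def forward(self, word_ids, pos_ids, ner_ids):         # each: (batch, seq_len)
        # Concatenate the three feature embeddings into one 128-d term vector.
        return torch.cat([self.word(word_ids), self.pos(pos_ids), self.ner(ner_ids)], dim=-1)
```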
  6. The method according to claim 1, characterized in that training the initial domain identification model according to the domain labels of the annotated search sequences and the term vectors of the words contained in the annotated search sequences, to determine the fully connected layer parameters of the initial domain identification model and obtain the domain identification model, comprises:
    performing virtual supervised training on the initial domain identification model according to the domain labels of the annotated search sequences, the term vectors of the words contained in the annotated search sequences, and the term vectors of the words contained in un-annotated search sequences, to determine the fully connected layer parameters of the initial domain identification model and obtain the domain identification model.
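(Illustrative sketch, not part of the claims.) The patent does not spell out the "virtual supervised training" procedure beyond its use of both annotated and un-annotated search sequences. The sketch below shows one conventional semi-supervised pattern under that reading: a supervised loss on labelled inputs plus a consistency loss on perturbed unlabelled inputs, in the spirit of virtual adversarial training. It operates on term-vector tensors (floats) so that noise can be added, and every name and hyperparameter is an assumption, not the patent's definition.

```python
import torch
import torch.nn.functional as F

def virtual_supervised_step(model, opt, labelled_x, labels, unlabelled_x, noise_scale=0.01):
    """labelled_x / unlabelled_x: term-vector tensors of shape (batch, seq, dim);
    `model` maps such a tensor to (batch, num_classes) logits."""
    loss = F.cross_entropy(model(labelled_x), labels)            # supervised term
    with torch.no_grad():
        target = F.softmax(model(unlabelled_x), dim=-1)          # current prediction
    noisy = unlabelled_x + noise_scale * torch.randn_like(unlabelled_x)
    log_pred = F.log_softmax(model(noisy), dim=-1)
    # Consistency term: the prediction should be stable under small perturbations.
    loss = loss + F.kl_div(log_pred, target, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```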
  7. The method according to claim 4, characterized in that training the initial intent recognition model according to the intent labels of the annotated search sequences, to determine the fully connected layer parameters of the initial intent recognition model and obtain the intent recognition model, comprises:
    performing virtual supervised training on the initial intent recognition model according to the intent labels of the annotated search sequences, the term vectors of the words contained in the annotated search sequences, and the term vectors of the words contained in un-annotated search sequences, to determine the fully connected layer parameters of the initial intent recognition model and obtain the intent recognition model.
  8. The method according to claim 4, characterized in that training the initial slot identification model according to the slot labels of the annotated search sequences, to determine the fully connected layer parameters and conditional random field layer parameters of the initial slot identification model and obtain the slot identification model, comprises:
    performing virtual supervised training on the initial slot identification model according to the slot labels of the annotated search sequences, the term vectors of the words contained in the annotated search sequences, and the term vectors of the words contained in un-annotated search sequences, to determine the fully connected layer parameters and conditional random field layer parameters of the initial slot identification model and obtain the slot identification model.
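(Illustrative sketch, not part of the claims.) A possible shape for the slot identification model: recurrent hidden layers transferred from the bidirectional RNN language model (the BiRNNLM sketched under claim 4 is assumed here), a newly trained fully connected layer producing per-word tag scores, and a CRF layer over the tag sequence. The sketch assumes the third-party pytorch-crf package (`pip install pytorch-crf`); tag counts and sizes are illustrative.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party: pip install pytorch-crf

class SlotIdentificationModel(nn.Module):
    def __init__(self, pretrained_lm, hidden=256, num_tags=20):
        super().__init__()
        # Hidden layers transferred from the bidirectional RNN language model.
        self.embedding = pretrained_lm.embedding
        self.fwd, self.bwd = pretrained_lm.fwd, pretrained_lm.bwd
        self.fc = nn.Linear(2 * hidden, num_tags)     # fully connected layer (newly trained)
        self.crf = CRF(num_tags, batch_first=True)    # conditional random field layer

    def emissions(self, ids):                         # ids: (batch, seq_len)
        x = self.embedding(ids)
        h_fwd, _ = self.fwd(x)
        h_bwd, _ = self.bwd(x.flip(1))
        return self.fc(torch.cat([h_fwd, h_bwd.flip(1)], dim=-1))

    def loss(self, ids, tags):                        # tags: (batch, seq_len) long
        return -self.crf(self.emissions(ids), tags)   # negative log-likelihood

    def decode(self, ids):
        return self.crf.decode(self.emissions(ids))   # most likely tag sequence
```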
  9. An apparatus for understanding a search sequence, characterized by comprising:
    a term vector determination module, configured to determine the term vector of each word contained in an annotated search sequence;
    a model parameter module, configured to take the hidden layer parameters, convolution layer parameters and pooling layer parameters of a search sequence CNN model, trained in advance according to each URL site name and the clicked and un-clicked search sequences of each URL site name, as the hidden layer parameters, convolution layer parameters and pooling layer parameters of an initial domain identification model;
    a domain identification model module, configured to train the initial domain identification model according to the domain labels of the annotated search sequences and the term vectors of the words contained in the annotated search sequences, to determine the fully connected layer parameters of the initial domain identification model and obtain a domain identification model.
  10. The apparatus according to claim 9, characterized in that it further comprises a CNN model module, specifically configured to:
    obtain each URL site name and the clicked and un-clicked search sequences of each URL site name;
    determine the term vectors of the words contained in the clicked search sequence, the term vectors of the words contained in the un-clicked search sequence, and the term vectors of the words contained in the URL site name;
    determine a clicked search vector from the term vectors of the words contained in the clicked search sequence using a first CNN model, determine an un-clicked search vector from the term vectors of the words contained in the un-clicked search sequence using the first CNN model, and determine a site name vector from the term vectors of the words contained in the URL site name using a second CNN model;
    optimize the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the un-clicked search vector and the site name vector, and take the optimized first CNN model as the search sequence CNN model.
  11. The apparatus according to claim 9, characterized in that it further comprises an intent/slot identification model module, specifically configured to:
    after the term vector of each word contained in the annotated search sequence is determined, take the hidden layer parameters of a bidirectional RNN language model, trained in advance according to search sequences, as the hidden layer parameters of an initial intent recognition model and of an initial slot identification model;
    train the initial intent recognition model according to the intent labels of the annotated search sequences to determine the fully connected layer parameters of the initial intent recognition model, to obtain an intent recognition model; or train the initial slot identification model according to the slot labels of the annotated search sequences to determine the fully connected layer parameters and conditional random field layer parameters of the initial slot identification model, to obtain a slot identification model.
  12. The apparatus according to claim 11, characterized in that it further comprises a bidirectional RNN language model parameter module, specifically configured to:
    determine the term vector of each word contained in a search sequence;
    take the term vectors of the words contained in the search sequence as the input of the bidirectional RNN language model, predict the next word through the forward recurrent neural network in the bidirectional RNN language model and the previous word through the backward recurrent neural network, and adjust the hidden layer parameters of the forward recurrent neural network and of the backward recurrent neural network in the bidirectional RNN language model according to the prediction results.
  13. The apparatus according to claim 10 or claim 12, characterized in that it further comprises a term vector module, specifically configured to:
    segment the search sequence or the URL site name to obtain the words contained in the search sequence or the URL site name;
    perform word, part-of-speech and named entity recognition on each word contained in the search sequence or the URL site name to obtain the term vector of each word contained in the search sequence or URL site name.
  14. The apparatus according to claim 9, characterized in that the domain identification model module is specifically configured to:
    perform virtual supervised training on the initial domain identification model according to the domain labels of the annotated search sequences, the term vectors of the words contained in the annotated search sequences, and the term vectors of the words contained in un-annotated search sequences, to determine the fully connected layer parameters of the initial domain identification model and obtain the domain identification model.
  15. A device, characterized in that the device comprises:
    one or more processors;
    a storage apparatus for storing one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for understanding a search sequence according to any one of claims 1-8.
  16. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the method for understanding a search sequence according to any one of claims 1-8.
CN201711248658.4A 2017-12-01 2017-12-01 Method, device, equipment and storage medium for understanding search sequence Active CN107832476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711248658.4A CN107832476B (en) 2017-12-01 2017-12-01 Method, device, equipment and storage medium for understanding search sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711248658.4A CN107832476B (en) 2017-12-01 2017-12-01 Method, device, equipment and storage medium for understanding search sequence

Publications (2)

Publication Number Publication Date
CN107832476A true CN107832476A (en) 2018-03-23
CN107832476B CN107832476B (en) 2020-06-05

Family

ID=61647472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711248658.4A Active CN107832476B (en) 2017-12-01 2017-12-01 Method, device, equipment and storage medium for understanding search sequence

Country Status (1)

Country Link
CN (1) CN107832476B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615767A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Searching-ranking model training method and device and search processing method
CN106354852A (en) * 2016-09-02 2017-01-25 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN106407333A (en) * 2016-09-05 2017-02-15 北京百度网讯科技有限公司 Artificial intelligence-based spoken language query identification method and apparatus
CN106649786A (en) * 2016-12-28 2017-05-10 北京百度网讯科技有限公司 Deep question answer-based answer retrieval method and device
CN107256267A (en) * 2017-06-19 2017-10-17 北京百度网讯科技有限公司 Querying method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
奚雪峰, 周国栋: "Research on Deep Learning for Natural Language Processing" (面向自然语言处理的深度学习研究), 《自动化学报》 (Acta Automatica Sinica) *
王继民, 李雷明子, 郑玉凤: "A Review of Mobile Search User Behavior Research Based on Log Mining" (基于日志挖掘的移动搜索用户行为研究综述), 《情报理论与实践》 (Information Studies: Theory & Application) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509596A (en) * 2018-04-02 2018-09-07 广州市申迪计算机***有限公司 Text classification method, device, computer equipment and storage medium
CN108509596B (en) * 2018-04-02 2021-06-04 广州市申迪计算机***有限公司 Text classification method and device, computer equipment and storage medium
CN108874941A (en) * 2018-06-04 2018-11-23 成都知道创宇信息技术有限公司 Big data URL deduplication method based on convolution features and multiple hash mapping
CN108874941B (en) * 2018-06-04 2021-09-21 成都知道创宇信息技术有限公司 Big data URL duplication removing method based on convolution characteristics and multiple Hash mapping
CN108846126A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 Generation of associated question aggregation model, question-answer type aggregation method, device and equipment
CN108846126B (en) * 2018-06-29 2021-07-27 北京百度网讯科技有限公司 Generation of associated problem aggregation model, question-answer type aggregation method, device and equipment
CN109165721B (en) * 2018-07-02 2022-03-01 算丰科技(北京)有限公司 Data processing method, data processing device and electronic equipment
CN109165721A (en) * 2018-07-02 2019-01-08 算丰科技(北京)有限公司 Data processing method, data processing equipment and electronic equipment
CN110751285B (en) * 2018-07-23 2024-01-23 第四范式(北京)技术有限公司 Training method and system and prediction method and system for neural network model
CN110751285A (en) * 2018-07-23 2020-02-04 第四范式(北京)技术有限公司 Training method and system and prediction method and system of neural network model
CN112602155A (en) * 2018-08-27 2021-04-02 皇家飞利浦有限公司 Generating metadata for a trained model
CN111046662A (en) * 2018-09-26 2020-04-21 阿里巴巴集团控股有限公司 Training method, device and system of word segmentation model and storage medium
CN111046662B (en) * 2018-09-26 2023-07-18 阿里巴巴集团控股有限公司 Training method, device and system of word segmentation model and storage medium
WO2020107765A1 (en) * 2018-11-30 2020-06-04 深圳前海微众银行股份有限公司 Statement analysis processing method, apparatus and device, and computer-readable storage medium
CN109597993A (en) * 2018-11-30 2019-04-09 深圳前海微众银行股份有限公司 Sentence analysis processing method, device, equipment and computer readable storage medium
CN109710927A (en) * 2018-12-12 2019-05-03 东软集团股份有限公司 Named entity recognition method and device, readable storage medium and electronic equipment
CN109710927B (en) * 2018-12-12 2022-12-20 东软集团股份有限公司 Named entity identification method and device, readable storage medium and electronic equipment
CN110309514A (en) * 2019-07-09 2019-10-08 北京金山数字娱乐科技有限公司 Semantic recognition method and device
CN110647617A (en) * 2019-09-29 2020-01-03 百度在线网络技术(北京)有限公司 Training sample construction method of dialogue guide model and model generation method
CN111523169A (en) * 2020-04-24 2020-08-11 广东博智林机器人有限公司 Decoration scheme generation method and device, electronic equipment and storage medium
CN113378781A (en) * 2021-06-30 2021-09-10 北京百度网讯科技有限公司 Training method and device of video feature extraction model and electronic equipment
CN117574878A (en) * 2024-01-15 2024-02-20 西湖大学 Component syntactic analysis method, device and medium for mixed field
CN117574878B (en) * 2024-01-15 2024-05-17 西湖大学 Component syntactic analysis method, device and medium for mixed field

Also Published As

Publication number Publication date
CN107832476B (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN107832476A (en) A kind of understanding method of search sequence, device, equipment and storage medium
US10861456B2 (en) Generating dialogue responses in end-to-end dialogue systems utilizing a context-dependent additive recurrent neural network
Guo et al. Citytransfer: Transferring inter-and intra-city knowledge for chain store site recommendation based on multi-source urban data
Han et al. Semi-supervised active learning for sound classification in hybrid learning environments
CN112069302A (en) Training method of conversation intention recognition model, conversation intention recognition method and device
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
CN109766557A (en) A kind of sentiment analysis method, apparatus, storage medium and terminal device
Boussaha et al. Deep retrieval-based dialogue systems: A short review
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
CN109635197A (en) Searching method, device, electronic equipment and storage medium
CN111651573A (en) Intelligent customer service dialogue reply generation method and device and electronic equipment
CN111625715A (en) Information extraction method and device, electronic equipment and storage medium
Tao et al. Log2intent: Towards interpretable user modeling via recurrent semantics memory unit
CN112949758A (en) Response model training method, response method, device, equipment and storage medium
CN111767720B (en) Title generation method, computer and readable storage medium
CN114282528A (en) Keyword extraction method, device, equipment and storage medium
US11803710B1 (en) Multi-modal machine learning architectures integrating language models and computer vision systems
US20230351225A1 (en) Machine learning-based user selection prediction based on sequence of prior user selections
US20220383125A1 (en) Machine learning aided automatic taxonomy for marketing automation and customer relationship management systems
CN116976341A (en) Entity identification method, entity identification device, electronic equipment, storage medium and program product
CN112417260B (en) Localized recommendation method, device and storage medium
Wang et al. A stack-propagation framework with slot filling for multi-domain dialogue state tracking
CN112231546A (en) Heterogeneous document ordering method, heterogeneous document ordering model training method and device
CN110599052A (en) OTA hotel evaluation method, system, electronic device and medium
Rauf et al. BCE4ZSR: Bi-encoder empowered by teacher cross-encoder for zero-shot cold-start news recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant