CN106610972A

CN106610972A - Query rewriting method and apparatus

Info

Publication number: CN106610972A
Application number: CN201510689095.7A
Authority: CN
Inventors: 吴小琼; 吴黎霞
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2015-10-21
Filing date: 2015-10-21
Publication date: 2017-05-03

Abstract

The invention provides a query rewriting method and apparatus. The method may comprise: receiving a search keyword input by a user; selecting an expansion word corresponding to the search keyword, wherein the similarity between the semantic vectors that respectively correspond to the expansion word and the search keyword in a semantic vector space of a preset dimension reaches a preset similarity; and rewriting the search keyword as the selected expansion word. With this technical solution, query rewriting can be performed on the basis of semantics, which helps improve suggested-word coverage and rewriting accuracy.

Description

Query rewriting method and apparatus

Technical field

The application relates to the field of search technology, and in particular to a query rewriting method and apparatus.

Background

Users need a search function in many scenarios. When performing a search, a user can enter any search keyword, and a search engine provides the corresponding search results.

However, the search keywords that users enter are often rather casual and may not directly reflect the user's actual intent, so the search results may fail to meet the user's actual needs.

Summary of the invention

In view of this, the application provides a query rewriting method and apparatus that can perform query rewriting on the basis of semantics, which helps improve suggested-word coverage and rewriting accuracy.

To achieve the above object, the application provides the following technical solutions:

According to a first aspect of the application, a query rewriting method is provided, comprising:

receiving a search keyword input by a user;

selecting an expansion word corresponding to the search keyword, wherein the similarity between the semantic vectors that respectively correspond to the expansion word and the search keyword in a semantic vector space of a preset dimension reaches a preset similarity; and

rewriting the search keyword as the selected expansion word.

According to a second aspect of the application, a query rewriting apparatus is provided, comprising:

a receiving unit, which receives a search keyword input by a user;

a selecting unit, which selects an expansion word corresponding to the search keyword, wherein the similarity between the semantic vectors that respectively correspond to the expansion word and the search keyword in a semantic vector space of a preset dimension reaches a preset similarity; and

a rewriting unit, which rewrites the search keyword as the selected expansion word.

As can be seen from the above technical solutions, the application maps the search keyword and the expansion word to semantic vectors in a semantic vector space, so that the similarity between the semantic vectors reflects the degree of semantic relevance between the search keyword and the expansion word. This simplifies the semantic comparison and improves the accuracy of query rewriting. At the same time, because semantic relevance is determined directly, the search keyword and the expansion word no longer need to be textually similar, which helps improve suggested-word coverage.

Brief description of the drawings

Fig. 1 is a flowchart of a query rewriting method according to an exemplary embodiment of the application;

Fig. 2 is a schematic diagram of query rewriting according to an exemplary embodiment of the application;

Fig. 3 is a schematic diagram of another query rewriting according to an exemplary embodiment of the application;

Fig. 4 is a flowchart of another query rewriting method according to an exemplary embodiment of the application;

Fig. 5 is a flowchart of a sample training process used to implement query rewriting according to an exemplary embodiment of the application;

Fig. 6 is a schematic diagram of yet another query rewriting according to an exemplary embodiment of the application;

Fig. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the application;

Fig. 8 is a block diagram of a query rewriting apparatus according to an exemplary embodiment of the application.

Detailed description

As described in the background section, the search keywords that users enter are rather casual and often do not reflect the user's true intent, so the search results do not meet the user's actual needs. To solve this technical problem, the related art proposes QR (query rewrite) processing, which analyzes the search keyword input by the user and automatically replaces it with an expansion word that reflects the user's actual intent.

The related art proposes several technical means of implementing QR, mainly including:

(1) Based on text similarity. Specifically, the text similarity between the search keyword and candidate expansion words is calculated by means such as TF-IDF (term frequency-inverse document frequency) in order to determine the expansion word corresponding to the search keyword. However, this approach cannot calculate the similarity between a search keyword and an expansion word that share no co-occurring word (for example, it cannot determine the similarity between "apple" and "iphone"), and when the same word has several meanings it easily produces bad expansion words that do not meet the user's actual needs (such as "apple fruit basket" and "Apple mobile phone").

(2) Based on semantic rules. Specifically, semantic rules are established and expansion words that satisfy the rules are selected. It should be noted that establishing semantic rules does not actually obtain or compare the semantics of the search keyword and the expansion word; the judgment relies only on the developers' current understanding and is therefore severely limited. Both the accuracy and the coverage of the suggested words are very low, and the rules must be continually maintained and new rules developed later on, so the cost is very high and the actual effect is unsatisfactory.

Therefore, the application improves the query rewriting approach of the related art in order to solve the technical problems present in the related art. The following embodiments are provided to describe the application further:

Fig. 1 is a flowchart of a query rewriting method according to an exemplary embodiment of the application. As shown in Fig. 1, the method may include the following steps:

Step 102: receive a search keyword input by a user.

Step 104: select an expansion word corresponding to the search keyword, wherein the similarity between the semantic vectors that respectively correspond to the expansion word and the search keyword in a semantic vector space of a preset dimension reaches a preset similarity.

In this embodiment, mapping the search keyword and the expansion word into the semantic vector space makes it possible to compare their actual semantics, rather than being limited, as in the related art, to comparing literal text similarity, which helps improve the accuracy of the suggested words. Moreover, because the comparison is based on the actual semantics of each word, it does not depend on how accurately developers understand and set semantic rules, and no later maintenance is needed.

Step 106: rewrite the search keyword as the selected expansion word.

As can be seen from the above embodiment, the application maps the search keyword and the expansion word to semantic vectors in the semantic vector space, so that the similarity between the semantic vectors reflects the degree of semantic relevance between the search keyword and the expansion word. This simplifies the semantic comparison and improves the accuracy of query rewriting. At the same time, because semantic relevance is determined directly, the search keyword and the expansion word no longer need to be textually similar, which helps improve suggested-word coverage.
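By way of non-limiting illustration, steps 102 to 106 may be sketched as follows. The sketch assumes that semantic vectors are already available and that cosine similarity serves as the similarity measure; the function names, the threshold value, and the toy three-dimensional vectors are hypothetical and are not taken from the application.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rewrite_query(keyword, semantic_vectors, candidates, threshold=0.8):
    """Return the most similar candidate that reaches the preset similarity.

    semantic_vectors, candidates and threshold are hypothetical inputs.
    The keyword is returned unchanged when no candidate qualifies.
    """
    query_vec = semantic_vectors[keyword]            # step 102: received keyword
    best_word, best_sim = keyword, threshold
    for word in candidates:                          # step 104: select expansion word
        sim = cosine(query_vec, semantic_vectors[word])
        if sim >= best_sim:
            best_word, best_sim = word, sim
    return best_word                                 # step 106: rewrite

# Toy vectors, invented for illustration only.
vectors = {
    "Apple mobile phone": [0.9, 0.1, 0.3],
    "iphone6": [0.8, 0.2, 0.35],
    "apple fruit basket": [0.1, 0.9, 0.2],
}
rewritten = rewrite_query("Apple mobile phone", vectors,
                          ["iphone6", "apple fruit basket"])
print(rewritten)
```

With these toy values the textually unrelated candidate "iphone6" is selected, while "apple fruit basket", which shares a word with the keyword, falls below the threshold.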

1. QR principle

As can be seen from the embodiment shown in Fig. 1, in the technical solution of the application the QR process relies on mapping the search keyword and the expansion word to semantic vectors in the semantic vector space, so that the semantic relevance between the search keyword and the expansion word is determined by comparing the semantic vectors.

To implement this mapping, as shown in Fig. 2, a neural network algorithm can be used to map the search keyword or the expansion word into the semantic vector space and obtain the corresponding semantic vector. For example, when the search keyword input by the user is "Apple mobile phone", mapping "Apple mobile phone" into the semantic vector space yields the corresponding semantic vector 1, say X. When there is a candidate word "iphone6", suppose that mapping "iphone6" into the semantic vector space yields the corresponding semantic vector 2, say Y. If the similarity between vector X and vector Y reaches the preset similarity, the candidate word "iphone6" is considered to have high semantic relevance to the search keyword "Apple mobile phone". The candidate word "iphone6" can therefore be used as the expansion word corresponding to the search keyword "Apple mobile phone", and the search keyword "Apple mobile phone" is rewritten as "iphone6".

When the search keyword or the expansion word is mapped into the semantic vector space to obtain the corresponding semantic vector, in one exemplary embodiment the search keyword or the expansion word can be mapped directly to the corresponding semantic vector. In another exemplary embodiment, as shown in Fig. 3, the process may include: mapping all the segments (the units produced by word segmentation) that make up the search keyword or the expansion word into the semantic vector space by a neural network algorithm to obtain the corresponding segment vectors; and combining, according to a preset strategy, the segment vectors corresponding to all the segments that make up the search keyword or the expansion word, the resulting whole-word vector being used as the above semantic vector. Mapping each segment to its own segment vector helps reduce the complexity of the processing.

For example, when the search keyword input by the user is "Apple mobile phone", suppose that word segmentation of the search keyword yields segment 11 "apple", segment 12 "mobile phone", and so on. Mapping all the segments into the semantic vector space then yields segment vector 31, namely X1 corresponding to the segment "apple", segment vector 32, namely X2 corresponding to the segment "mobile phone", and so on. Similarly, suppose there is a candidate word "iphone6" and word segmentation of the candidate word yields segment 21 "iphone", segment 22 "6", and so on. Mapping all the segments into the semantic vector space then yields segment vector 41, namely Y1 corresponding to the segment "iphone", segment vector 42, namely Y2 corresponding to the segment "6", and so on.

Then, according to the preset strategy, all the segment vectors corresponding to the search keyword "Apple mobile phone" (segment vector 31 "X1", segment vector 32 "X2", and so on) are combined to obtain the corresponding whole-word vector 1, say X; likewise, all the segment vectors corresponding to the candidate word "iphone6" (segment vector 41 "Y1", segment vector 42 "Y2", and so on) are combined according to the preset strategy to obtain the corresponding whole-word vector 2, say Y. The analysis of the semantic relevance between the search keyword "Apple mobile phone" and the candidate word "iphone6" is thereby converted into an analysis of the similarity between whole-word vector 1 "X" and whole-word vector 2 "Y".

Clearly, the search keyword "Apple mobile phone" and the word "iphone6" have no literal text similarity at all, and setting semantic rules between the two is very difficult, so the technical solutions of the related art can hardly perform this kind of QR processing accurately. In the application, the search keyword and the candidate word are mapped to whole-word vector 1 (which can serve as the semantic vector of the search keyword) and whole-word vector 2 (which can serve as the semantic vector of the candidate word) in the semantic vector space. The semantic relevance between the search keyword and the candidate word, which is difficult to determine directly, is thus converted into a comparatively simple similarity comparison between whole-word vector 1 and whole-word vector 2, so that QR processing can be performed more accurately and more easily to determine the expansion word corresponding to the search keyword.
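The conversion described above may be sketched as follows, purely as an illustration. The sketch assumes that the preset strategy is a per-dimension average and that the similarity is the cosine similarity; the segment vectors are invented three-dimensional values and the helper names are hypothetical.

```python
import math

def average(vectors):
    """Combine segment vectors into one whole-word vector (per-dimension mean)."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical segment vectors; the values are invented for illustration.
segment_vectors = {
    "apple": [0.7, 0.2, 0.1],         # X1 (segment vector 31)
    "mobile phone": [0.5, 0.1, 0.6],  # X2 (segment vector 32)
    "iphone": [0.8, 0.1, 0.4],        # Y1 (segment vector 41)
    "6": [0.4, 0.3, 0.3],             # Y2 (segment vector 42)
}

# Whole-word vector 1 (X) and whole-word vector 2 (Y).
x = average([segment_vectors["apple"], segment_vectors["mobile phone"]])
y = average([segment_vectors["iphone"], segment_vectors["6"]])

similarity = cosine(x, y)
print(similarity)
```

The two words share no segment, yet their whole-word vectors can still be compared directly.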

2. QR implementation based on sample training

So that each word is mapped correctly into the semantic vector space, that is, so that each segment is mapped correctly to a segment vector in the semantic vector space and the segment vectors are then combined into the whole-word vector of the corresponding word (which can serve as the semantic vector of that word), the segment vectors of all possible segments in the semantic vector space can be obtained in advance by sample training. The technical solution of the application is described in detail below in the order in which sample training and QR processing are performed.

Fig. 4 is a flowchart of another query rewriting method according to an exemplary embodiment of the application. As shown in Fig. 4, the method may include the following steps:

Step 402: extract training samples.

In one case, users' historical behavior largely reflects the semantic relevance between search keywords and expansion words, so suitable training samples can be chosen on the basis of that behavior. For example, the training samples may include historical search keywords extracted from historical search click logs and the historical expansion words corresponding to the clicked business objects. For instance, if a user previously entered the search keyword "Apple mobile phone" and clicked a business object in the search results whose corresponding historical expansion word is "iphone6", the historical search keyword "Apple mobile phone" can be used as a sample search keyword and the historical expansion word "iphone6" as a sample expansion word.

In another case, the corresponding training samples can be obtained from data or information related to the clicked business objects. For example, the training samples may come from:

1) A historical search keyword extracted from the historical search click logs and a predicted expansion word extracted from the display content of the clicked business object. For example, when the historical search keyword is "Apple mobile phone" and the display content of the clicked business object also includes "iphone6P", the word "iphone6P" is considered to have high semantic relevance to the historical search keyword "Apple mobile phone" and is therefore used as a predicted expansion word. The historical search keyword "Apple mobile phone" can then be used as a sample search keyword and the predicted expansion word "iphone6P" as a sample expansion word.

2) A historical expansion word extracted from the historical search click logs and a predicted search keyword extracted from the display content of the clicked business object. For example, when the historical expansion word is "iphone6" and the display content of the clicked business object also includes "apple latest version", the word "apple latest version" is considered to have high semantic relevance to the historical expansion word "iphone6" and is therefore used as a predicted search keyword. The predicted search keyword "apple latest version" can then be used as a sample search keyword and the historical expansion word "iphone6" as a sample expansion word.

3) A predicted search keyword and a predicted expansion word both extracted from the display content of the clicked business object. For example, when the historical search keyword is "Apple mobile phone" and the historical expansion word corresponding to the clicked business object is "iphone6", if the display content of the clicked business object also includes "apple latest version" and "iphone6P", the word "apple latest version" is considered to have high semantic relevance to the word "iphone6P". The word "apple latest version" is therefore used as a predicted search keyword and the word "iphone6P" as a predicted expansion word. The predicted search keyword "apple latest version" can then be used as a sample search keyword and the predicted expansion word "iphone6P" as a sample expansion word.

In yet another case, a user can, on the basis of the user's own knowledge and judgment, actively create search keywords and corresponding expansion words that are considered to have high semantic relevance to each other. The search keyword created by the user can then be used as a sample search keyword and the expansion word created by the user as a sample expansion word.

The three cases above, together with the three specific implementations of the second case, can be regarded as five specific sources of training samples. Accordingly, any one or more of these implementations can be chosen as the source of the training samples in the technical solution of the application. Alternatively, some of the implementations can be treated as required and the others as optional supplements; for example, the first case can be treated as the required implementation and the other four implementations as optional supplements.
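As an illustration of the first source only, extracting sample pairs from a click log may be sketched as follows. The log format (a list of keyword and expansion-word tuples), the function name, and the `min_clicks` filter are assumptions made for this sketch and are not specified by the application.

```python
from collections import Counter

# Hypothetical click-log records:
# (historical search keyword, expansion word of the clicked business object).
click_log = [
    ("Apple mobile phone", "iphone6"),
    ("Apple mobile phone", "iphone6"),
    ("Apple mobile phone", "iphone6P"),
    ("apple latest version", "iphone6"),
]

def extract_samples(log, min_clicks=1):
    """Count each (sample search keyword, sample expansion word) pair.

    min_clicks is a hypothetical filter that drops rarely clicked pairs.
    """
    counts = Counter(log)
    return {pair: n for pair, n in counts.items() if n >= min_clicks}

samples = extract_samples(click_log)
for (keyword, expansion), clicks in samples.items():
    print(keyword, "->", expansion, clicks)
```

Keeping the click count with each pair is one possible design choice; the count could later inform the preset association degree of that pair.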

Step 404: train the segment vectors.

The training of the segment vectors in this step is described in detail below with reference to Fig. 5 and Fig. 6. Fig. 5 is a flowchart of a sample training process used to implement query rewriting according to an exemplary embodiment of the application; Fig. 6 is a schematic diagram of yet another query rewriting according to an exemplary embodiment of the application. As shown in Fig. 5, the sample training process may include the following steps:

Step 502: obtain sample feature-word groups.

In this embodiment, because each sample search keyword extracted as a training sample in step 402 corresponds to one sample expansion word, each mutually corresponding sample search keyword and sample expansion word are treated as one sample feature-word group, and the sample search keyword and the sample expansion word are each treated as one sample feature word in that group.

Step 504A: perform word segmentation on the sample search keyword in the sample feature-word group to obtain all the segments of the sample search keyword.

As shown in Fig. 6, word segmentation of the sample search keyword yields sample segment 11', sample segment 12', and so on. Assuming that the sample search keyword is "Apple mobile phone", sample segment 11' may be "apple" and sample segment 12' may be "mobile phone".

Step 506A: generate the sample segment vectors.

In this embodiment, the corresponding sample segment vector 31', sample segment vector 32', and so on are generated for sample segment 11', sample segment 12', and so on. For example, suppose that sample segment vector 31' is X1 and sample segment vector 32' is X2. When the semantic vector space has n dimensions, the vectors X1, X2, and so on are all n-dimensional, for example $X_1 = \{x_{11}, x_{12}, x_{13}, \dots, x_{1n}\}$ and $X_2 = \{x_{21}, x_{22}, x_{23}, \dots, x_{2n}\}$.

Because each sample segment vector still has to be trained afterwards, no particular value is required in any dimension of a sample segment vector at this point, provided that every sample segment vector has n dimensions. For example, each sample segment vector can be generated by random initialization with a random number in every dimension, that is, the values $x_{i1}, x_{i2}, \dots, x_{in}$ of any sample segment vector $X_i$ in the individual dimensions are all random.
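The random initialization described above may be sketched as follows. The value range [-1, 1], the fixed seed, and the function name are assumptions chosen for this illustration; the application only requires that every vector has n dimensions.

```python
import random

def init_segment_vectors(segments, n, seed=0):
    """Assign each distinct sample segment a random n-dimensional vector.

    The range [-1, 1] and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    # dict.fromkeys removes duplicate segments while keeping their order.
    return {seg: [rng.uniform(-1.0, 1.0) for _ in range(n)]
            for seg in dict.fromkeys(segments)}

vectors = init_segment_vectors(
    ["apple", "mobile phone", "iphone", "6", "apple"], n=9)
print(len(vectors), len(vectors["apple"]))
```

A segment that occurs in several sample feature words receives a single shared vector, so that training on one sample also affects every other word containing that segment.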

Step 508A: generate the sample whole-word vector.

In this embodiment, the segments of the sample search keyword correspond to sample segment vector 31', sample segment vector 32', and so on, and combining all of these sample segment vectors according to a preset strategy yields sample whole-word vector 1' corresponding to the sample search keyword. The application does not limit the preset strategy: any preset strategy that is repeatable and feasible, and that produces a sample whole-word vector 1' with the same number of dimensions as the sample segment vectors (for example, the n dimensions above), can be used in the technical solution of the application.

For example, the values of all the sample segment vectors corresponding to the sample search keyword can be calculated dimension by dimension according to a preset algorithm corresponding to the preset strategy, to obtain the value of sample whole-word vector 1' in each dimension. The preset algorithm may be an averaging algorithm, a weighted averaging algorithm, or the like; the application does not limit this.

For instance, when the sample search keyword corresponds to sample segment vector 31' and sample segment vector 32', that is, vector X1 and vector X2, and the preset algorithm is assumed to be the averaging algorithm, the values of sample segment vector 31' and sample segment vector 32' are averaged in each dimension to obtain the corresponding sample whole-word vector 1', $X' = \{x_1', x_2', \dots, x_n'\}$, where $x_1' = (x_{11} + x_{21})/2$, $x_2' = (x_{12} + x_{22})/2$, ..., $x_n' = (x_{1n} + x_{2n})/2$.

The generation of sample whole-word vector 1' can also be made easier to carry out in the following way. When the semantic vector space has n dimensions, the n-dimensional segment vectors in the semantic vector space that correspond to all m segments making up any feature word are arranged into an m × n feature matrix; the m elements of each column of the feature matrix are calculated according to the preset algorithm to obtain the value of the whole-word vector of that feature word in the corresponding dimension; and the results for the individual columns are combined into the n-dimensional whole-word vector of that feature word.

For instance, when the sample search keyword corresponds to sample segment vector 31' and sample segment vector 32' and each sample segment vector has 9 dimensions, that is, m = 2 and n = 9, the feature matrix formed by sample segment vector 31' and sample segment vector 32' is:

$$W_x = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{19} \\ x_{21} & x_{22} & \cdots & x_{29} \end{pmatrix}$$

The 2 (m = 2) elements in each column of the feature matrix $W_x$ are then calculated according to the preset algorithm to obtain sample whole-word vector 1', that is, $X' = \{x_1', x_2', \dots, x_9'\}$.

If the preset algorithm is the averaging algorithm, $x_1' = (x_{11} + x_{21})/2$, $x_2' = (x_{12} + x_{22})/2$, ..., $x_9' = (x_{19} + x_{29})/2$. If the preset algorithm is the weighted averaging algorithm, the values of sample whole-word vector 1' in the individual dimensions can be calculated as $x_1' = x_{11} a_1 + x_{21} a_2$, $x_2' = x_{12} b_1 + x_{22} b_2$, ..., $x_9' = x_{19} i_1 + x_{29} i_2$, where $a_1$, $a_2$, and so on are the weights of the respective elements. In the weighted averaging algorithm, the weight of each element in a column can be positively correlated with the frequency with which the segment corresponding to that element occurs; for example, the weights can be obtained with the TF-IDF algorithm. The application does not limit this.
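The column-wise calculation on the m × n feature matrix may be sketched as follows. As a simplifying assumption, the sketch uses one weight per segment (per row) rather than a separate weight for every element, and it normalizes the weights so that they sum to 1; the function name and the toy values are hypothetical.

```python
def combine(feature_matrix, weights=None):
    """Collapse an m x n feature matrix into an n-dimensional whole-word vector.

    weights holds one hypothetical weight per row (per segment). When it is
    omitted, every row gets the same weight, which is the plain average.
    """
    m = len(feature_matrix)
    if weights is None:
        weights = [1.0] * m
    total = sum(weights)
    return [
        sum(w * value for w, value in zip(weights, column)) / total
        for column in zip(*feature_matrix)   # iterate over the n columns
    ]

# m = 2 sample segment vectors with n = 9 dimensions (toy values).
W_x = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [3, 4, 5, 6, 7, 8, 9, 10, 11],
]
plain = combine(W_x)                      # averaging algorithm
weighted = combine(W_x, weights=[3, 1])   # first segment counts three times as much
print(plain)
print(weighted)
```

Both results keep the 9 dimensions of the segment vectors, which is the only constraint that the application places on the preset strategy.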

Similarly to steps 504A to 508A, in steps 504B, 506B, and 508B the corresponding sample segment vectors (such as sample segment vector 41', sample segment vector 42', and so on shown in Fig. 6) can be generated for all the segments of the sample expansion word (such as sample segment 21', sample segment 22', and so on shown in Fig. 6), and all of these sample segment vectors can be combined according to the above preset strategy into the corresponding sample whole-word vector 2', say Y'.

Step 510: train on the samples.

In this embodiment, the similarity between sample whole-word vector 1' and sample whole-word vector 2' is calculated; suppose that at this point the similarity is the initial similarity Z1. When the sample feature-word groups are obtained in step 502, the sample search keyword and the sample expansion word of each sample feature-word group have a preset association degree Z, which reflects the actual semantic relevance between the sample search keyword and the sample expansion word. Because the value of every sample segment vector in every dimension is arbitrary when the sample segment vectors are generated in step 506A and step 506B, the initial similarity Z1 between sample whole-word vector 1' and sample whole-word vector 2' usually does not match the preset association degree Z.

Therefore, with the preset association degree Z as the target, a neural network algorithm can be used to train the sample segment vectors underlying the sample whole-word vectors of the sample search keyword and the sample expansion word in the sample feature-word group, that is, to train sample segment vector 31', sample segment vector 32', sample segment vector 41', sample segment vector 42', and so on shown in Fig. 6. As the values of the sample segment vectors change in each dimension, the values of sample whole-word vector 1' and sample whole-word vector 2' in each dimension, and the similarity between the two, change accordingly. The similarity between sample whole-word vector 1' and sample whole-word vector 2' thus moves step by step from the initial similarity Z1 toward the preset association degree Z; once it matches the preset association degree Z (is equal to it or differs from it by less than a preset value), training is considered complete.

Based on the above principle, the following loss function can be established when the training operation is performed:

Here, the loss function is the training objective, target is the above preset association degree Z, output is the similarity between sample whole-word vector 1' and sample whole-word vector 2', and the initial value of output is the above initial similarity Z1.

The hidden-layer variables and activation-layer parameters of the neural network, together with the word vectors, are then updated continually by backpropagation until the loss function is minimized, at which point the similarity between sample whole-word vector 1' and sample whole-word vector 2' matches the preset association degree Z.
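The training principle may be illustrated by the following much simplified sketch. It makes several assumptions that go beyond the text: the loss is taken to be the squared error (output - target)^2, the similarity is the cosine similarity, the preset strategy is the per-dimension average, and there are no hidden layers, so plain gradient descent acts directly on the sample segment vectors. All names, the learning rate, and the tolerance are hypothetical.

```python
import math
import random

def mean(vectors):
    """Per-dimension average of a list of vectors (the assumed preset strategy)."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_grad(u, v):
    """Gradient of cosine(u, v) with respect to u."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    c = cosine(u, v)
    return [b / (nu * nv) - c * a / (nu * nu) for a, b in zip(u, v)]

def train_pair(query_segs, expansion_segs, target, lr=0.2, steps=5000, tol=1e-3):
    """Update the segment vectors in place until |output - target| < tol."""
    for _ in range(steps):
        x, y = mean(query_segs), mean(expansion_segs)
        error = cosine(x, y) - target            # output - target
        if abs(error) < tol:
            break
        # Both gradients are computed from the old x and y before any update.
        updates = ((query_segs, cosine_grad(x, y)),
                   (expansion_segs, cosine_grad(y, x)))
        for segs, grad in updates:
            # Chain rule for error ** 2; each segment contributes 1/m to the mean.
            scale = lr * 2.0 * error / len(segs)
            for vec in segs:
                for i, g in enumerate(grad):
                    vec[i] -= scale * g
    return cosine(mean(query_segs), mean(expansion_segs))

rng = random.Random(0)
n = 9
query_segs = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(2)]      # 31', 32'
expansion_segs = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(2)]  # 41', 42'
final = train_pair(query_segs, expansion_segs, target=0.9)
print(final)
```

In this sketch the only trainable parameters are the segment vectors themselves; a network with hidden layers, as described above, would update its layer parameters in the same backward pass.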

The preset association degree Z can be obtained from the number of clicks, the number of views, the click ratio, the view ratio, and the like of the corresponding sample feature-word group. For example, the higher the number or ratio of clicks and the number or ratio of views, the larger the corresponding preset association degree Z, indicating higher semantic relevance between the corresponding sample search keyword and sample expansion word. The preset association degree Z can of course also be determined from other parameters; the application does not limit this.
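One possible way of deriving Z from click statistics is sketched below. The application does not give a formula, so taking Z directly as the click ratio, clipped to the range [0, 1], is purely an assumption of this sketch, as are the function and parameter names.

```python
def preset_association(clicks, impressions):
    """Map a click ratio to a preset association degree Z in [0, 1].

    Using the raw click ratio as Z is an illustrative assumption.
    """
    if impressions <= 0:
        return 0.0
    return max(0.0, min(1.0, clicks / impressions))

z_high = preset_association(clicks=30, impressions=100)
z_low = preset_association(clicks=5, impressions=100)
print(z_high, z_low)
```

Any other mapping that grows with the click or view statistics would satisfy the description equally well.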

Step 512A: obtain the segment vectors.

In this embodiment, as shown in Fig. 6, once the similarity training between sample whole-word vector 1' and sample whole-word vector 2' is complete, the sample segment vectors corresponding to the sample search keyword are determined to have been trained into the corresponding segment vectors; for example, sample segment vector 31' is trained into segment vector 31 (not shown in the figure) and sample segment vector 32' is trained into segment vector 32 (not shown in the figure). Correspondingly, after training, sample whole-word vector 1' and sample whole-word vector 2' become whole-word vector 1 and whole-word vector 2 shown in Fig. 6, respectively.

Step 512B: obtain the segment vectors.

In this embodiment, as shown in Fig. 6, once the similarity training between sample whole-word vector 1' and sample whole-word vector 2' is complete, the sample segment vectors corresponding to the sample expansion word are determined to have been trained into the corresponding segment vectors; for example, sample segment vector 41' is trained into segment vector 41 (not shown in the figure) and sample segment vector 42' is trained into segment vector 42 (not shown in the figure). Correspondingly, after training, sample whole-word vector 1' and sample whole-word vector 2' become whole-word vector 1 and whole-word vector 2 shown in Fig. 6, respectively.

Step 406: combine the segment vectors into whole-word vectors, which serve as the semantic vectors of the corresponding words.

In this embodiment, the training samples extracted in step 402 include many sample feature-word groups. Processing each sample feature-word group according to the embodiment shown in Fig. 5 yields a segmentation result set made up of the sample segments of all the sample feature words, and the sample segment vector of each sample segment in the segmentation result set is trained into the corresponding segment vector.

When whole-word vectors are combined in step 406, not only are the whole-word vectors of the sample feature words obtained by combination. Whenever sample segments in the segmentation result set can be combined into a non-sample feature word, the whole-word vector of that non-sample feature word is likewise obtained by combining the corresponding segment vectors. A non-sample feature word may be a candidate word, such as a bid word (bidword) purchased by a merchant, or a search keyword that a user might enter.
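Composing the whole-word vector of a non-sample feature word from already trained segment vectors may be sketched as follows. The sketch assumes the averaging strategy; the two-dimensional values, the helper name, and the assignment of the labels P1, P2, Q1, and Q2 to particular segments are hypothetical.

```python
def whole_word_vector(segments, trained):
    """Average the trained vectors of the given segments.

    Raises KeyError when a segment was never seen during training, since no
    segment vector exists for it.
    """
    missing = [s for s in segments if s not in trained]
    if missing:
        raise KeyError(f"no trained vector for segments: {missing}")
    vectors = [trained[s] for s in segments]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hypothetical trained segment vectors (values invented for illustration).
trained = {
    "apple": [0.2, 0.4],          # assumed to be P1
    "mobile phone": [0.6, 0.0],   # assumed to be P2
    "iphone": [0.4, 0.2],         # assumed to be Q1
    "6": [0.0, 0.6],              # assumed to be Q2
}

# "apple iphone" never occurred as a sample feature word, but both of its
# segments were trained, so its semantic vector can still be composed.
new_vector = whole_word_vector(["apple", "iphone"], trained)
print(new_vector)
```

A word containing a segment outside the segmentation result set cannot be composed this way, which is why the sketch fails loudly in that case instead of guessing a vector.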

For example, it is assumed that sample searches keyword " i Phone " and sample expansion word " iphone6 " Sample participle, the sample participle vector sum participle vector that obtains of training it is as shown in table 1 below, then except by Participle vector P1 and participle vector P2 combinations obtain the corresponding language of sample searches keyword " i Phone " Adopted vector, and combined and obtained sample expansion word " iphone6 " by participle vector Q1 and participle vector Q2 Corresponding semantic vector, can also obtain such as " apple by any combination to each sample participle The corresponding semantic vector such as iphone ".

Table 1

It should be noted that:

When word-segment vectors are combined into a whole-word vector, the "preset strategy" used must be the same as the one used during training in step 404, that is, the "preset strategy" used in steps 508A and 508B to combine sample word-segment vectors into sample whole-word vectors; for example, taking the average or the weighted average of the values of all word-segment vectors in each dimension.

There are in fact several ways to calculate the similarity between two vectors. For example, the similarity of the two vectors themselves can be calculated directly, such as the cosine distance or the Pearson correlation coefficient; alternatively, the vectors can be mapped to a neural network layer to compare the degree of association between the corresponding search keyword and expansion word; other approaches can also be used, and the application places no limit on this.
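The two direct similarity measures named above can be sketched as follows. This is a minimal illustration in plain Python; the function names `cosine_similarity` and `pearson_correlation` are assumptions introduced here and do not come from the application.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    return cosine_similarity(centered_u, centered_v)
```

Both functions return a value in the range [-1, 1], so either can be compared against a single preset similarity threshold.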

Step 408: generate the QR list.

Step 410: perform QR processing.

In this embodiment, the QR list records the correspondence between predefined search keywords and expansion words; for each pair of search keyword and expansion word recorded in the correspondence, the similarity between their respective semantic vectors in the semantic vector space reaches the preset similarity.

Therefore, given the search keyword actually entered by the user, it is only necessary to look up and extract the corresponding word from the QR list and use it as the expansion word for that search keyword. Such an expansion word necessarily has a high semantic relevance to the search keyword, so QR processing is accurate and meets the user's search needs.
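As a rough sketch of how the QR list of step 408 might be generated and then consulted in step 410, each predefined search keyword can be paired with every candidate word whose semantic vector reaches the preset similarity. The function name `build_qr_list`, the choice of cosine similarity, the threshold value, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_qr_list(keyword_vectors, candidate_vectors, preset_similarity):
    """Map each predefined keyword to the candidates that reach the preset similarity."""
    qr_list = {}
    for keyword, kv in keyword_vectors.items():
        scored = [
            (cosine_similarity(kv, cv), word)
            for word, cv in candidate_vectors.items()
            if word != keyword
        ]
        # Keep only candidates at or above the threshold, most similar first.
        kept = sorted((s for s in scored if s[0] >= preset_similarity), reverse=True)
        if kept:
            qr_list[keyword] = [word for _, word in kept]
    return qr_list


# Hypothetical semantic vectors (illustrative values only).
keyword_vectors = {"i Phone": [0.9, 0.1, 0.0]}
candidate_vectors = {
    "iphone6": [0.8, 0.2, 0.1],
    "fruit basket": [0.0, 0.1, 0.9],
}
qr_list = build_qr_list(keyword_vectors, candidate_vectors, preset_similarity=0.8)
expansion_words = qr_list.get("i Phone", [])  # the lookup done at query time
```

Precomputing the list moves the vector comparisons offline, so the query-time work reduces to a dictionary lookup.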

Of course, the search keyword entered by the user may not be present in the QR list, or the QR list may not have been built in advance. In that case, the search keyword can be segmented, the sample word segments in the above word-segmentation result set that correspond to the resulting segments can be identified, and the word-segment vectors of those sample word segments can be combined into the semantic vector of the search keyword. That semantic vector is then compared with the semantic vectors of the candidate words, and the candidate words whose similarity reaches the preset similarity are selected as the expansion words for the search keyword.
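The fallback path described above can be sketched as follows. The sketch assumes whitespace splitting as a stand-in for a real word segmenter, per-dimension averaging as the preset strategy, and cosine similarity as the comparison; `rewrite_query`, `segment`, `combine`, and the toy vectors are hypothetical names and values.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def segment(text):
    """Placeholder segmenter: split on whitespace (a real system would use a word segmenter)."""
    return text.lower().split()


def combine(vectors):
    """Assumed preset strategy: per-dimension arithmetic mean."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rewrite_query(keyword, qr_list, segment_vectors, candidate_vectors, preset_similarity):
    # 1. Prefer the precomputed QR list when it has an entry for the keyword.
    if keyword in qr_list:
        return qr_list[keyword]
    # 2. Otherwise segment the keyword and keep the segments seen during training.
    known = [segment_vectors[s] for s in segment(keyword) if s in segment_vectors]
    if not known:
        return []  # no trained segment, so no semantic vector can be built
    query_vector = combine(known)
    # 3. Select the candidate words whose similarity reaches the preset similarity.
    return [
        word
        for word, vector in candidate_vectors.items()
        if cosine_similarity(query_vector, vector) >= preset_similarity
    ]


# Hypothetical trained word-segment vectors and candidate-word semantic vectors.
segment_vectors = {"apple": [1.0, 0.0], "phone": [0.6, 0.8]}
candidate_vectors = {"iphone6": [0.9, 0.4], "orchard": [-0.5, 0.9]}
result = rewrite_query("apple phone", {}, segment_vectors, candidate_vectors, 0.9)
```

Segments that were never trained are simply skipped here; how an actual system should treat them is not stated in the application.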

Further, in step 410 it can be ensured that the expansion word obtained by QR processing belongs to the same business object category as the search keyword. For example, when the user enters "i Phone", the business object category of the search keyword is actively identified as "electronic products", and QR processing yields expansion words under the "electronic products" category such as "iphone6", rather than expansion words of the "handicrafts" category such as "i Phone model". The business object category of the search keyword can be determined by obtaining the user's historical behavior data; for example, the historical behavior data may include the user's historical searches, views, clicks, favorites, purchases, and other data.
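One possible reading of this category restriction is sketched below: the category is taken to be the one that appears most often in the user's historical behavior among the categories the keyword could belong to, and expansion words outside it are dropped. The majority-vote rule, the record layout, and the names `infer_category` and `filter_by_category` are assumptions; the application does not specify how the category is derived from the behavior data.

```python
from collections import Counter


def infer_category(history, keyword_categories):
    """Pick the keyword's possible category that occurs most often in the user's history."""
    counts = Counter(
        record["category"]
        for record in history
        if record["category"] in keyword_categories
    )
    if not counts:
        return None  # the history gives no hint about the category
    return counts.most_common(1)[0][0]


def filter_by_category(expansion_words, word_category, category):
    """Keep only expansion words in the inferred category (keep all if none was inferred)."""
    if category is None:
        return list(expansion_words)
    return [w for w in expansion_words if word_category.get(w) == category]


# Hypothetical behavior records and category assignments.
history = [
    {"action": "click", "category": "electronic products"},
    {"action": "purchase", "category": "electronic products"},
    {"action": "view", "category": "handicrafts"},
]
category = infer_category(history, {"electronic products", "handicrafts"})
filtered = filter_by_category(
    ["iphone6", "i Phone model"],
    {"iphone6": "electronic products", "i Phone model": "handicrafts"},
    category,
)
```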

Fig. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the application. Referring to Fig. 7, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and runs it, forming the query rewriting apparatus at the logical level. Of course, besides a software implementation, the application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the executing entity of the following processing flow is not limited to logical units and may also be hardware or a logic device.

Referring to Fig. 8, in a software implementation, the query rewriting apparatus may include a receiving unit, a selecting unit, and a rewriting unit, wherein:

the receiving unit receives a search keyword entered by a user;

Optionally, the selecting unit is specifically configured to:

retrieve the correspondence between predefined search keywords and expansion words, wherein, for each pair of search keyword and expansion word recorded in the correspondence, the similarity between their respective semantic vectors in the semantic vector space reaches the preset similarity;

obtain the expansion word that corresponds to the search keyword as recorded in the correspondence.

Optionally, the semantic vector is obtained by mapping the corresponding search keyword or expansion word to the semantic vector space with a neural network algorithm.

Optionally, a search keyword or expansion word is mapped to the semantic vector space to obtain the corresponding semantic vector as follows:

All word segments that make up the search keyword or expansion word are each mapped to the semantic vector space by a neural network algorithm to obtain the corresponding word-segment vectors; the word-segment vectors of all those word segments are then combined according to a preset strategy, and the resulting whole-word vector is used as the semantic vector.

Optionally, the word segments corresponding to the word-segment vectors belong to the word-segmentation result set of all sample feature words used as training samples, wherein a sample feature word is a sample search keyword or a sample expansion word, and each sample search keyword, together with each associated sample expansion word, constitutes a sample feature phrase having a preset degree of association;

and, when each word segment in the word-segmentation result set corresponds to a sample word-segment vector whose value in each dimension of the semantic vector space is an arbitrary initial value, the sample word-segment vectors of all word segments making up any sample feature word are combined according to the preset strategy into the sample whole-word vector of that sample feature word, and there is a corresponding initial similarity between the sample whole-word vectors of the sample search keyword and the sample expansion word in any sample feature phrase;

wherein, when the sample word-segment vectors underlying the sample whole-word vectors of the sample search keyword and the sample expansion word in any sample feature phrase are trained by the neural network algorithm with the preset degree of association of that sample feature phrase as the target, if the training result changes the initial similarity so that it matches the preset degree of association, it is determined that all word segments of that sample feature phrase have been mapped to the semantic vector space, and that each of those sample word-segment vectors has been trained into the word-segment vector of the corresponding word segment.
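The training idea above can be illustrated with a deliberately simplified sketch. It replaces the neural network algorithm with plain gradient descent on a squared error between the cosine similarity of the two sample whole-word vectors and the preset degree of association, and it assumes per-dimension averaging as the preset strategy. The function names, the learning rate, the step count, the segment labels, and the initial values are all assumptions, not details from the application.

```python
import math


def combine(vectors):
    """Assumed preset strategy: per-dimension arithmetic mean."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def cosine_grad(u, v):
    """Gradient of cosine(u, v) with respect to u."""
    c, nu, nv = cosine(u, v), norm(u), norm(v)
    return [b / (nu * nv) - c * a / (nu * nu) for a, b in zip(u, v)]


def train_pair(vectors, keyword_segs, expansion_segs, target, lr=0.05, steps=500):
    """Adjust word-segment vectors until the whole-word similarity approaches the target."""
    for _ in range(steps):
        u = combine([vectors[s] for s in keyword_segs])
        v = combine([vectors[s] for s in expansion_segs])
        err = cosine(u, v) - target  # loss is err ** 2
        grad_u, grad_v = cosine_grad(u, v), cosine_grad(v, u)
        for segs, grad in ((keyword_segs, grad_u), (expansion_segs, grad_v)):
            # Chain rule: each of the len(segs) segments contributes 1/len(segs) to the mean.
            scale = lr * 2.0 * err / len(segs)
            for s in segs:
                vectors[s] = [x - scale * g for x, g in zip(vectors[s], grad)]
    return vectors


# Arbitrary initial sample word-segment vectors (hypothetical segments and values).
vectors = {
    "apple": [0.5, 0.1, -0.2],
    "phone": [0.1, 0.4, 0.3],
    "iphone": [-0.3, 0.2, 0.4],
    "6": [0.2, -0.1, 0.1],
}
kw, ex = ["apple", "phone"], ["iphone", "6"]
initial = cosine(combine([vectors[s] for s in kw]), combine([vectors[s] for s in ex]))
train_pair(vectors, kw, ex, target=0.9)
trained = cosine(combine([vectors[s] for s in kw]), combine([vectors[s] for s in ex]))
```

Because the updates are applied to the word-segment vectors rather than to the whole-word vectors, the trained segments can later be recombined into words that never appeared as training samples.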

Optionally, the training samples come from at least one of the following:

historical search keywords extracted from historical search click logs and the historical expansion words corresponding to the clicked business objects;

the historical search keywords and predicted expansion words extracted from the displayed content of the clicked business objects;

the historical expansion words and predicted search keywords extracted from the displayed content of the clicked business objects;

predicted search keywords and predicted expansion words extracted from the displayed content of the clicked business objects;

search keywords created by users and expansion words created by users;

wherein the historical search keywords, the predicted search keywords, and the user-created search keywords are used as sample search keywords, and the historical expansion words, the predicted expansion words, and the user-created expansion words are used as sample expansion words.

Optionally, the preset strategy includes:

when the semantic vector space has n dimensions, forming an m × n feature matrix from the n-dimensional word-segment vectors, in the semantic vector space, of all m word segments that make up any word;

calculating, according to a preset algorithm, the m elements of each column of the feature matrix to obtain the value of the word's whole-word vector in the corresponding dimension;

combining the calculation results of the columns into an n-dimensional whole-word vector, which serves as the semantic vector of the word in the semantic vector space.

Optionally, the preset algorithm includes any one of the following:

an averaging algorithm;

a weighted averaging algorithm, in which the weight of each element in a column is positively correlated with the occurrence frequency of the word segment corresponding to that element.
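The column-wise combination described above can be sketched in a few lines. In this sketch the weights are taken to be directly proportional to the segment frequencies, which is only one way of making them positively correlated; the function name `combine_segments` and the example values are assumptions.

```python
def combine_segments(segment_vectors, frequencies=None):
    """Combine m n-dimensional word-segment vectors into one n-dimensional whole-word vector.

    With no frequencies, every segment has equal weight (plain average).
    Otherwise each segment is weighted in proportion to its occurrence frequency.
    """
    m = len(segment_vectors)
    weights = [1.0] * m if frequencies is None else [float(f) for f in frequencies]
    total = sum(weights)
    # zip(*matrix) walks the m x n feature matrix column by column.
    return [
        sum(w * x for w, x in zip(weights, column)) / total
        for column in zip(*segment_vectors)
    ]


feature_matrix = [[1.0, 2.0], [3.0, 4.0]]                   # m = 2 segments, n = 2 dimensions
averaged = combine_segments(feature_matrix)                  # averaging algorithm
weighted = combine_segments(feature_matrix, frequencies=[1, 3])  # weighted averaging
```

Whichever variant is chosen, the same one has to be used both during training and when semantic vectors are later built for new words, as the description requires.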

Optionally, the expansion word and the search keyword belong to the same business object category.

Optionally, the apparatus further includes:

an acquiring unit, which obtains the user's historical behavior data;

a determining unit, which determines, according to the historical behavior data, the business object category to which the search keyword belongs.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

Memory may include volatile memory in computer-readable media, in forms such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.

The above are only preferred embodiments of the application and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the application shall fall within the scope of protection of the application.

Claims

1. A query rewriting method, characterized by comprising:

receiving a search keyword entered by a user;

rewriting the search keyword as the selected expansion word.

2. The method according to claim 1, characterized in that selecting the expansion word corresponding to the search keyword comprises:

3. The method according to claim 1, characterized in that the semantic vector is obtained by mapping the corresponding search keyword or expansion word to the semantic vector space with a neural network algorithm.

4. The method according to claim 3, characterized in that a search keyword or expansion word is mapped to the semantic vector space to obtain the corresponding semantic vector as follows:

5. The method according to claim 4, characterized in that the word segments corresponding to the word-segment vectors belong to the word-segmentation result set of all sample feature words used as training samples, wherein a sample feature word is a sample search keyword or a sample expansion word, and each sample search keyword, together with each associated sample expansion word, constitutes a sample feature phrase having a preset degree of association;

6. The method according to claim 5, characterized in that the training samples come from at least one of the following:

7. The method according to claim 4, characterized in that the preset strategy comprises:

8. The method according to claim 7, characterized in that the preset algorithm comprises any one of the following:

an averaging algorithm;

9. The method according to claim 1, characterized in that the expansion word and the search keyword belong to the same business object category.

10. The method according to claim 9, characterized by further comprising:

obtaining the user's historical behavior data;

determining, according to the historical behavior data, the business object category to which the search keyword belongs.

11. A query rewriting apparatus, characterized by comprising:

a receiving unit, which receives a search keyword entered by a user;

12. The apparatus according to claim 11, characterized in that the selecting unit is specifically configured to:

13. The apparatus according to claim 11, characterized in that the semantic vector is obtained by mapping the corresponding search keyword or expansion word to the semantic vector space with a neural network algorithm.

14. The apparatus according to claim 13, characterized in that a search keyword or expansion word is mapped to the semantic vector space to obtain the corresponding semantic vector as follows:

15. The apparatus according to claim 14, characterized in that the word segments corresponding to the word-segment vectors belong to the word-segmentation result set of all sample feature words used as training samples, wherein a sample feature word is a sample search keyword or a sample expansion word, and each sample search keyword, together with each associated sample expansion word, constitutes a sample feature phrase having a preset degree of association;

16. The apparatus according to claim 15, characterized in that the training samples come from at least one of the following:

17. The apparatus according to claim 14, characterized in that the preset strategy comprises:

18. The apparatus according to claim 17, characterized in that the preset algorithm comprises any one of the following:

an averaging algorithm;

19. The apparatus according to claim 11, characterized in that the expansion word and the search keyword belong to the same business object category.

20. The apparatus according to claim 19, characterized by further comprising:

an acquiring unit, which obtains the user's historical behavior data;