CN109857845A - Model training and data retrieval method, device, terminal and computer readable storage medium - Google Patents
- Publication number: CN109857845A (application CN201910005290.1A)
- Authority: CN (China)
- Prior art keywords: training, model, query, rewriting, rewritten
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Classification: Information Retrieval; DB Structures and FS Structures Therefor
Abstract
The present invention provides a model training and data retrieval method, device, terminal and computer-readable storage medium. The model training method includes: obtaining a first training set, the first training set including a first original query and a first rewritten query that match the same query result; pre-training a rewrite generation model on the first training set; obtaining a second training set, the second training set including multiple first positive samples and multiple first negative samples, where a first positive sample includes a second original query and a second rewritten query that match the same query result, and a first negative sample includes a third original query and the second rewritten query that match different query results; pre-training a rewrite discrimination model on the second training set; and, following an adversarial training method, adversarially training the two pre-trained models. After adversarial training, the rewrite generation model of the present invention can generate the best rewritten query text for an input user query text, which can improve the accuracy of data search.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a model training and data retrieval method, device, terminal and computer-readable storage medium.
Background art
At present, when a user queries content on a search engine, the query sentences the user inputs are diverse, ambiguous and arbitrary. For example, a user inputs the query "draw who is the director of Crayon Shin-chan", in which the user has typed one extra word, "draw"; for another example, a user inputs the query "broadcast time of Agent Princess", in which the input text contains "Agent Princess", an alternative title of "Chu Qiao Zhuan". User query sentences like these cannot be converted into a structured query language, making it difficult to accurately hit the content the user intends to query.
Therefore, it is necessary to rewrite the natural-language query input by the user, i.e. to rewrite the original query sentence input by the user into a semantically accurate query sentence.
Summary of the invention
The present invention provides a model training and data retrieval method, device, terminal and computer-readable storage medium, so as to solve the problem in the related art that an inaccurate original query sentence input by a user makes accurate data search difficult.
To solve the above problems, according to one aspect of the present invention, the invention discloses a model training method, comprising:
obtaining a first training set, the first training set including a first original query and a first rewritten query that match the same query result;
pre-training a rewrite generation model on the first training set, where the pre-trained rewrite generation model is used to generate a rewritten query text for an input user query text;
obtaining a second training set, the second training set including multiple first positive samples and multiple first negative samples, where a first positive sample includes a second original query and a second rewritten query that match the same query result, and a first negative sample includes a third original query and the second rewritten query that match different query results;
pre-training a rewrite discrimination model on the second training set, where the pre-trained rewrite discrimination model is used to judge, for an input user query text and rewritten query text, whether the rewritten query text is the best rewritten query of the user query text, and to output the judgment result;
following an adversarial training method, adversarially training the pre-trained rewrite generation model and the pre-trained rewrite discrimination model, where after adversarial training the rewrite generation model is used to generate the best rewritten query text for any input user query text.
According to another aspect of the present invention, the invention also discloses a model training apparatus, comprising:
a first obtaining module, configured to obtain a first training set, the first training set including a first original query and a first rewritten query that match the same query result;
a first pre-training module, configured to pre-train a rewrite generation model on the first training set, where the pre-trained rewrite generation model is used to generate a rewritten query text for an input user query text;
a second obtaining module, configured to obtain a second training set, the second training set including multiple first positive samples and multiple first negative samples, where a first positive sample includes a second original query and a second rewritten query that match the same query result, and a first negative sample includes a third original query and the second rewritten query that match different query results;
a second pre-training module, configured to pre-train a rewrite discrimination model on the second training set, where the pre-trained rewrite discrimination model is used to judge, for an input user query text and rewritten query text, whether the rewritten query text is the best rewritten query of the user query text, and to output the judgment result;
an adversarial training module, configured to adversarially train, following an adversarial training method, the pre-trained rewrite generation model and the pre-trained rewrite discrimination model, where after adversarial training the rewrite generation model is used to generate the best rewritten query text for any input user query text.
According to another aspect of the present invention, the invention also discloses a data retrieval method, comprising:
receiving a user query text;
inputting the user query text into a pre-trained rewrite generation model to obtain the best rewritten query text;
searching a preset knowledge graph according to the best rewritten query text to obtain a search result;
where the rewrite generation model is used to rewrite any input user query text and generate the best rewritten query text.
According to another aspect of the present invention, the invention also discloses a data retrieval apparatus, comprising:
a receiving module, configured to receive a user query text;
an input module, configured to input the user query text into a pre-trained rewrite generation model to obtain the best rewritten query text;
a retrieval module, configured to search a preset knowledge graph according to the best rewritten query text to obtain a search result;
where the rewrite generation model is used to rewrite any input user query text and generate the best rewritten query text.
According to another aspect of the present invention, the invention also discloses a terminal, comprising: a memory, a processor, and a model training program stored on the memory and runnable on the processor, where the model training program, when executed by the processor, implements the steps of the model training method described in any one of the above.
According to a further aspect of the present invention, the invention also discloses a computer-readable storage medium on which a model training program is stored, where the model training program, when executed by a processor, implements the steps of the model training method described in any one of the above.
According to a further aspect of the present invention, the invention also discloses a terminal, comprising: a memory, a processor, and a data retrieval program stored on the memory and runnable on the processor, where the data retrieval program, when executed by the processor, implements the steps of the above data retrieval method.
According to a further aspect of the present invention, the invention also discloses a computer-readable storage medium on which a data retrieval program is stored, where the data retrieval program, when executed by a processor, implements the steps of the above data retrieval method.
Compared with the prior art, the present invention has the following advantages:
In this way, the embodiments of the present invention pre-train the rewrite generation model and the rewrite discrimination model separately, and then adversarially train the two pre-trained models, so that during adversarial training the rewrite generation model can iteratively update itself according to the judgment results from the rewrite discrimination model. As a result, the adversarially trained rewrite generation model can generate the best rewritten query text for any input user query text, and performing data search with the rewritten query text produced by the rewrite generation model can improve the accuracy of data search.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of an embodiment of a model training method of the present invention;
Fig. 2 is a flow chart of the steps of another embodiment of a model training method of the present invention;
Fig. 3 is a structural schematic diagram of an embodiment of a syntax tree of the present invention;
Fig. 4 is a partial schematic diagram of a knowledge graph of the present invention;
Fig. 5 is a flow chart of the steps of a further embodiment of a model training method of the present invention;
Fig. 6 is a flow chart of the steps of an embodiment of a data retrieval method of the present invention;
Fig. 7 is a structural block diagram of an embodiment of a model training apparatus of the present invention;
Fig. 8 is a structural block diagram of an embodiment of a data retrieval apparatus of the present invention.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The embodiments of the present invention provide a model training method. The rewrite generation model trained by this method can rewrite any input user query text into a semantically accurate best rewritten query text, so that data search can be performed with the best rewritten query text, improving the hit accuracy of query results for user query sentences.
When training the above rewrite generation model, adversarial training with the help of a rewrite discrimination model is needed, so that the adversarially trained rewrite generation model can generate the best rewritten query for an input user query. Before the adversarial training, the method of the embodiments of the present invention needs to pre-train the rewrite generation model and the rewrite discrimination model separately based on reinforcement learning.
When pre-training the above two models based on reinforcement learning, the following technical principle can be referred to:
The search engine is regarded as the agent, and the user is regarded as the environment. Denote the original query input by the user as m, and the new terms generated at the first k-1 time steps as {y1, ..., yk-1}; the generated terms constitute the rewritten query, or part of it (a time step here can be understood as one step). The state at the k-th time step can then be expressed as (m, {y1, ..., yk-1}). The query rewriting problem can therefore be converted into a sequential decision problem: the action the agent executes in the state at the k-th time step is to generate a new term yk for the original query m, until the best rewritten query (composed of the multiple generated terms) is produced.
The search engine's choice of rewrite strategy θ for each query can be seen as one trial: for the same original query, under a rewrite strategy θ, the search engine outputs one rewritten query, and the rewrite discrimination model can judge, based on user feedback and the quality of knowledge-base query results, whether the rewritten query is the best rewrite of the original query. The search engine takes the judgment result given by the rewrite discrimination model as the reward obtained from the environment. Through trial-and-error interaction, the search engine gradually learns the optimal query rewrite strategy θ, i.e. the one that maximizes the cumulative reward, so that the search engine can output the best rewritten query for the original query input by the user.
When training the models, the method of the embodiments of the present invention can pre-train the two models, i.e. model 1 and model 2, separately. The present invention places no restriction on the execution order of the pre-training steps for the two models.
Model 1 is the rewrite generation model, which generates a corresponding rewritten query for the original query (e.g. m) input by the user. After model 1 is trained, when model 1 is used for prediction, the input parameter is m and the output result is a rewritten query.
Model 2 is the rewrite discrimination model, which predicts, for an input original query and rewritten query, whether the rewritten query is the best rewritten query of the original query. The best rewritten query is defined as a rewritten query that is semantically close to the original query, expresses the user's intention more accurately, and improves the precision and recall of search results. During adversarial training, the output value of model 2 serves as an auxiliary input signal for the training of model 1, helping model 1 tune its parameters and generate reasonable rewrite terms word by word.
After the training of model 2 is completed (meaning both pre-training and adversarial training are completed), when model 2 is used for prediction, the input parameters are an original query and a rewritten query, and the output result is 0 or 1. For example: for the original query "who directed Langya Bang" and the rewritten query "Langya Bang director", model 2 outputs 1; for the original query "who is Langya Bang" and the rewritten query "Langya Bang director", model 2 outputs 0, indicating that the rewritten query is not the best rewritten query of the original query.
Referring to Fig. 1, which shows a flow chart of the steps of a model training method according to one embodiment of the present invention, the method includes the following steps:
Step 101: obtain a first training set, the first training set including multiple second positive samples.
The first training set here is the training sample set for pre-training the rewrite generation model; the training data here includes only positive samples, and no negative samples need to be constructed. To distinguish the positive samples used when pre-training the rewrite generation model from those used when pre-training the rewrite discrimination model, the positive samples used in pre-training model 1 are named second positive samples, and the positive samples used in pre-training model 2 are named first positive samples.
A second positive sample includes a first original query and a first rewritten query that match the same query result.
In one example, when obtaining the first training set, user input sentences corresponding to the same query result can be extracted from log data to form a group of positive samples, i.e. a group of candidate rewrite pairs.
For example: the user inputs the query texts "who is the child of Wang X?" and "what is the name of Wang X's child", and the query result returned by the system is "Dou XX, Li X" in both cases. These two user query texts can then form a group of candidate rewrite pairs, i.e. a pair of positive samples. As to which of the two query texts is labeled as the original query and which as the rewritten query, the present invention places no restriction.
Furthermore, data-augmentation methods can be applied to the user query texts extracted from the log data (for example, adding redundant terms, removing stop words, shuffling the word order, or performing synonym replacement based on a synonym dictionary) to generate more candidate rewrite pairs.
For example, the first training set may include positive sample 1 (original query 1, rewritten query 1), positive sample 2 (original query 2, rewritten query 2), positive sample 3 (original query 1, rewritten query 3), ..., positive sample n (original query n, rewritten query m), where original query 2, rewritten query 2 and rewritten query 3 are all query sentences obtained by transforming original query 1 and rewritten query 1.
That is, in the first training set, the query results corresponding to different positive samples may be identical or different.
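The data-augmentation step described above (removing stop words, shuffling the word order, dictionary-based synonym replacement) might be sketched as follows. The stop-word list, synonym dictionary and function names are illustrative assumptions, not from the patent.

```python
import random

STOP_WORDS = {"the", "is", "of", "a"}                # illustrative stop-word list
SYNONYMS = {"child": "kid", "name": "appellation"}   # illustrative synonym dictionary

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def shuffle_order(tokens, rng):
    tokens = list(tokens)
    rng.shuffle(tokens)                              # upset the word order
    return tokens

def replace_synonyms(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]

def augment(query, rng=None):
    """Return transformed token-list variants of `query`; pairing each variant
    with the original yields an extra candidate rewrite pair, on the assumption
    that the transformations preserve the query result."""
    rng = rng or random.Random(0)
    tokens = query.split()
    return [remove_stop_words(tokens),
            shuffle_order(tokens, rng),
            replace_synonyms(tokens)]
```

A real system would also guard against transformations that change the query's meaning; that filtering is omitted here.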
Step 102: pre-train the rewrite generation model on the first training set.
The pre-trained rewrite generation model is used to generate a rewritten query text for an input user query text.
The rewrite generation model is a neural network model whose specific network structure can be configured flexibly. In one example, the rewrite generation model is a sequence-to-sequence (seq2seq) model whose basic framework uses an encoder-decoder architecture.
In one embodiment, a bidirectional recurrent neural network with LSTM (Long Short-Term Memory) units can be used as the encoder, and an attention-based LSTM as the decoder. In other embodiments, sequence-to-sequence models with non-recurrent architectures can also be used, including the convolutional sequence-to-sequence model, or a sequence-to-sequence model based on multi-head attention (the Transformer architecture).
During pre-training, the second positive samples in the first training set obtained in step 101 can be input to the seq2seq model to pre-train it; the Adam optimization algorithm can be used for training, and the training objective is to maximize the likelihood.
It should be noted that, since model 1 is a seq2seq model, each second positive sample obtained for the first training set includes two query sentences, i.e. each positive sample is a candidate rewrite pair corresponding to the same query result (including an original query and a rewritten query, e.g. "who is the child of Wang X?" and "what is the name of Wang X's child"), so that the two query sentences in one sample can be input separately into the two branch networks of the seq2seq model.
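The pre-training objective for model 1 is maximum-likelihood training of an LSTM seq2seq model with Adam. As a dependency-free stand-in for that network, the toy model below learns P(rewrite | original) by frequency counting — the same maximum-likelihood objective with the neural network removed. `ToyRewriter` and its methods are invented for illustration and are not the patent's model.

```python
from collections import Counter, defaultdict

class ToyRewriter:
    """Counting-based stand-in for the seq2seq rewrite generation model."""
    def __init__(self):
        self.table = defaultdict(Counter)   # original query -> rewrite frequencies

    def pretrain(self, pairs):
        """pairs: iterable of (original query, rewritten query) positive samples.
        Maximum likelihood for a categorical model is just empirical counting."""
        for orig, rewrite in pairs:
            self.table[orig][rewrite] += 1

    def generate(self, query):
        """Return the most likely rewrite seen for `query` (argmax of the MLE);
        fall back to the query itself when it was never seen in training."""
        if not self.table[query]:
            return query
        return self.table[query].most_common(1)[0][0]
```

With a real seq2seq model, `pretrain` would instead run Adam steps on the teacher-forced cross-entropy of the rewritten query given the original, and `generate` would decode token by token.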
Step 103: obtain a second training set, the second training set including multiple first positive samples and multiple first negative samples.
The second training set obtained here is used to train the rewrite discrimination model. Since the rewrite discrimination model is used to judge, for two input query sentences, whether they are semantically close and whether the rewrite expresses the user's intention more accurately, the second training set here includes both positive samples and negative samples.
A first positive sample includes a second original query and a second rewritten query that match the same query result, and a first negative sample includes a third original query and the second rewritten query that match different query results.
That is, a first positive sample includes two query sentences whose corresponding query results are identical, and a first negative sample also includes two query sentences, but their corresponding query results are different. Between the positive and negative samples of the same group, one query sentence is shared, namely the above second rewritten query.
When obtaining the first positive samples of the second training set, any user query sentence (e.g. query1), i.e. the second rewritten query, is obtained from the user log data; then, in the user log data, another query2 with the same search result as that user query sentence is mined; error correction is performed on query1 to obtain query3; and a query4 whose edit distance from query1 is less than a predetermined distance threshold is obtained. In this way query2, query3 and query4 can be mined as positive-example samples of query1 (each being a second original query), constructing three groups of positive samples of the form (second original query, second rewritten query), specifically (query2, query1), (query3, query1) and (query4, query1).
Furthermore, other queries whose search results differ from query1's can be randomly selected from the above user log data as negative-example samples of query1 (each being a third original query), forming multiple groups of first negative samples together with query1.
To distinguish positive samples from negative samples, and the original query from the rewritten query within each sample, each sample in the second training set may carry annotation data of the form <original query, rewritten query, probability that the rewritten query is the best rewritten query of the original query>. The probability annotated in a negative sample is 0, and the probability annotated in a positive sample is 1.
For example: for the rewritten query "Langya Bang director", the corresponding original queries that can be mined include "Langya Bang direct" (an incomplete input), "Lang Ya Bang director" (a misspelling) and "who directed Langya Bang", forming three positive samples: positive sample 1 <Langya Bang direct (annotated as the second original query), Langya Bang director (annotated as the second rewritten query), 1>; positive sample 2 <Lang Ya Bang director, Langya Bang director, 1>; positive sample 3 <who directed Langya Bang, Langya Bang director, 1>. The negative samples corresponding to the rewritten query "Langya Bang director" can include but are not limited to: negative sample 1 <Story of Yanxi Palace protagonist, Langya Bang director, 0>; negative sample 2 <who plays the hero in Ruyi's Royal Love in the Palace, Langya Bang director, 0>.
In the above example, the annotation of the second original query and the second rewritten query is described only for positive sample 1; the annotation data of the other samples is similar and is not described here.
Since the original queries input by users are diverse, diversified original queries are constructed in the training samples when pre-training model 2.
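The assembly of the second training set described above can be sketched as follows, under the simplifying assumption that the query log is a plain mapping from query to result; `edit_distance` implements the ordinary Levenshtein distance used for the query4-style positives, and `build_samples` emits <original query, rewritten query, probability> triples. The names and the log representation are assumptions for illustration.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming over prefixes."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def build_samples(rewrite, log, max_dist=2):
    """log: {query: search result}. Pair every other logged query with
    `rewrite`: label 1 when the results match or the query is within the
    edit-distance threshold, label 0 otherwise."""
    target = log[rewrite]
    samples = []
    for q, res in log.items():
        if q == rewrite:
            continue
        if res == target or edit_distance(q, rewrite) <= max_dist:
            samples.append((q, rewrite, 1))    # positive sample
        else:
            samples.append((q, rewrite, 0))    # negative sample
    return samples
```

The error-corrected query3 positives would come from a separate spelling-correction component, which is omitted here.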
Step 104: pre-train the rewrite discrimination model on the second training set.
The pre-trained rewrite discrimination model is used to judge, for an input user query text and rewritten query text, whether the rewritten query text is the best rewritten query of the user query text, and to output the judgment result. The rewritten query text here is the rewritten query text generated and output by the rewrite generation model for the input user query text.
The rewrite discrimination model is a trainable model whose specific structure can be configured flexibly. In one example, the rewrite discrimination model can use a GBDT (Gradient Boosting Decision Tree) model; in other embodiments it can of course also be another model, which is not described again here.
During pre-training, any positive or negative sample in the second training set obtained in step 103 can be used to pre-train the GBDT model, so that the pre-trained GBDT model can judge, for the input user query text and rewritten query text, whether the rewritten query text is the best rewritten query of the user query text, and output the judgment result.
The pre-trained seq2seq model can output a rewritten query text for an input user query text. The method of the embodiments of the present invention can then use the pre-trained GBDT model to discriminate between the user query text and the rewritten query text, judging whether the rewritten query text is the best rewritten query of the user query text, and output a judgment result of 0 or 1: an output of 0 indicates that the rewritten query text is not the best rewritten query of the user query text, and an output of 1 indicates that it is.
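A minimal sketch of pre-training the discriminator: extract a feature from each (original query, rewritten query) pair and fit a boosted ensemble of depth-1 regression stumps on the labelled samples. `TinyGBDT` is a deliberately minimal stand-in for a real GBDT library (one feature, stumps only), and `overlap_feature` is an invented token-overlap feature, not one of the patent's feature classes.

```python
def overlap_feature(orig, rewrite):
    """Invented illustrative feature: Jaccard overlap of the token sets."""
    a, b = set(orig.split()), set(rewrite.split())
    return len(a & b) / max(len(a | b), 1)

def fit_stump(xs, residuals):
    """Depth-1 regression tree: a threshold plus two leaf means."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = sum((r - (lm if x <= t else rm)) ** 2 for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

class TinyGBDT:
    """Gradient boosting on one feature with stump weak learners."""
    def __init__(self, n_trees=20, lr=0.5):
        self.n_trees, self.lr, self.trees, self.base = n_trees, lr, [], 0.0

    def fit(self, xs, ys):
        self.base = sum(ys) / len(ys)
        preds = [self.base] * len(xs)
        for _ in range(self.n_trees):
            resid = [y - p for y, p in zip(ys, preds)]   # fit the residuals
            t, lm, rm = fit_stump(xs, resid)
            self.trees.append((t, lm, rm))
            preds = [p + self.lr * (lm if x <= t else rm) for x, p in zip(xs, preds)]

    def predict(self, x):
        s = self.base + sum(self.lr * (lm if x <= t else rm) for t, lm, rm in self.trees)
        return 1 if s >= 0.5 else 0   # 1 = best rewrite, 0 = not
```

In practice the three feature classes extracted in S201-S203 (e.g. the pattern-match degree) would replace `overlap_feature`, and a full GBDT implementation would replace `TinyGBDT`.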
Step 105: following an adversarial training method, adversarially train the pre-trained rewrite generation model and the pre-trained rewrite discrimination model.
After adversarial training, the rewrite generation model is used to generate the best rewritten query text for any input user query text.
During the above adversarial training, the judgment results output by the rewrite discrimination model can be used to guide the training of the rewrite generation model, so that the adversarially trained rewrite generation model can generate the best rewritten query text for any input user query text.
After pre-training alone, the results output by the rewrite generation model and the rewrite discrimination model each have relatively low accuracy. Therefore, in the embodiments of the present invention, after the rewrite generation model and the rewrite discrimination model have each been pre-trained, the adversarial training method is further used to train the two pre-trained models simultaneously. Since the adversarial training method is more robust, it is suitable for cases where the training data is unevenly distributed, e.g. too few positive samples and too many negative samples. In ordinary application scenarios, the sample data collected when training a model does not necessarily reflect the distribution of real data, due to limitations of time and scale; through adversarial training, the models can better simulate the distribution of real data and learn from it, and the two trained models can handle real data more accurately during online prediction.
During adversarial training, after the rewrite generation model outputs a rewritten query text for a user query text, the rewrite discrimination model judges whether the user query text and the rewritten query text form a best rewrite, and the judgment result serves as the reward of the rewrite generation model, guiding its next round of training and thereby guiding the rewrite generation model to generate the best rewritten query.
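The reward-guided loop just described might be sketched as follows. To stay dependency-free, the generator here is a bandit-style categorical policy over a fixed candidate set whose weights are reinforced or suppressed by the discriminator's 0/1 verdict — a stand-in for the policy-gradient update of the seq2seq model. All names, and the candidate-set simplification, are assumptions.

```python
import random

class BanditRewriter:
    """Toy generator: a categorical policy over candidate rewrites. The
    discriminator's verdict multiplicatively reinforces or suppresses the
    sampled candidate (a stand-in for the policy-gradient update)."""
    def __init__(self, candidates):
        self.weights = {c: 1.0 for c in candidates}

    def sample(self, rng):
        r = rng.uniform(0, sum(self.weights.values()))
        for cand, w in self.weights.items():
            r -= w
            if r <= 0:
                return cand
        return cand                       # float round-off fallback

    def update(self, rewrite, reward, lr=0.5):
        self.weights[rewrite] *= (1 + lr) if reward else (1 - lr)

    def best(self):
        return max(self.weights, key=self.weights.get)

def adversarial_train(gen, discriminator, original, steps=200, seed=0):
    """Alternate generation and discrimination: the 0/1 verdict on each
    sampled rewrite is the reward that guides the generator's next update."""
    rng = random.Random(seed)
    for _ in range(steps):
        rewrite = gen.sample(rng)                  # generator acts
        reward = discriminator(original, rewrite)  # discriminator judges
        gen.update(rewrite, reward)                # reward-guided update
    return gen.best()
```

In the patent's setting the discriminator is itself trained in the same loop; here it is frozen, which is enough to show the reward flow from model 2 to model 1.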
In this way, the embodiments of the present invention pre-train the rewrite generation model and the rewrite discrimination model separately, and then adversarially train the two pre-trained models, so that during adversarial training the rewrite generation model can iteratively update itself according to the judgment results from the rewrite discrimination model. As a result, the adversarially trained rewrite generation model can generate the best rewritten query text for any input user query text, and performing data search with the rewritten query text produced by the rewrite generation model can improve the accuracy of data search.
Optionally, in one embodiment, when step 104 is executed, i.e. when the rewrite discrimination model is pre-trained using each sample in the second training set and its annotation data, the method shown in Fig. 2 can be used:
Three classes of feature data can be extracted from each sample in the second training set and its annotation data, i.e. the data obtained in S201-S203 respectively; then, in S204, the above three classes of data are input to the rewrite discrimination model for pre-training. The above three classes of data can be annotated with the probability of the sample they belong to (e.g. the probability annotated for a positive sample is 1, and for a negative sample is 0).
S201: obtain the pattern match degree between the second rewritten query in the second training set and a preset knowledge-graph query pattern set.
Optionally, for any sample in the second training set to be used for training (a positive or negative sample), when S201 is executed, the semantics of the second rewritten query can first be obtained; then, semantic reduction is performed on the semantics based on a context-free grammar to generate a syntax tree; finally, the syntax tree is matched against the pattern trees in the preset knowledge-graph query pattern set, and the ratio of matched nodes is taken as the pattern match degree between the second rewritten query and the preset pattern set.
By computing the pattern match degree as the matched-node ratio over the syntax tree, the embodiments of the present invention exploit the fact that the syntax tree after semantic reduction can better characterize the semantic expression of a query. Using the knowledge-graph-based pattern match degree as a feature makes it possible to determine more effectively whether a rewritten query is the best rewrite of the original query, and is particularly applicable in knowledge-graph-oriented search scenarios.
Here, the semantics may be obtained by performing semantic parsing on the second rewritten query in the sample.
The preset graph query pattern set may be a user-defined set of grammars; a syntax tree is a graphical representation of sentence structure, representing the derivation of a sentence under given grammar rules; semantic reduction is the step-by-step conversion of a sentence into a pattern structure based on a predefined set of context-free grammars.
For example, suppose the second rewritten query is "daughter of Wang X". Semantic parsing is performed on it first, yielding the semantics "daughter of Wang X"; then semantic reduction is performed on these semantics based on the context-free grammar, generating the directed syntax tree shown in Fig. 3, where NR denotes a person name, REL a relationship, and NRP a person relationship; finally, the directed syntax tree of Fig. 3 is matched against the pattern trees in the preset graph query pattern set. If the set contains a relation pattern tree with the same structure as Fig. 3, the ratio of nodes matched in the preset graph query pattern set is 1, so the pattern match degree between the second rewritten query "daughter of Wang X" and the preset graph query pattern set is 1.
The elements of a syntax tree include, but are not limited to, person name (NR), TV series (CHANNEL), acted-by (V_ACTOR), role (ACTOR), relationship (REL), series title (ALBUM), property (PROPERTY), role playing (VRP), person relationship (NRP), and so on.
When performing semantic reduction on the semantics of the second rewritten query based on the context-free grammar to generate the syntax tree, the second rewritten query may first be segmented into terms, and entity recognition performed on each term (for "Wang X", "of", "daughter": the entity of "Wang X" is NR, the entity of "daughter" is REL, and "of" has no entity). Next, entity disambiguation is performed on the recognized entities (for example, if "apple" is recognized as a "fruit" entity, the ambiguity of the "fruit" entity is resolved from context and it is corrected to a "company name" entity). Then, pattern-node labeling is performed based on a pattern-node mapping dictionary, which configures the mapping between entity types and pattern-node types (e.g., entity type person maps to pattern-node type NR); for instance, since the entity of the term "Wang X" is NR, the mapping dictionary labels the pattern node of "Wang X" as NR. Finally, the pattern nodes of the terms are reduced (for "daughter of Wang X", the reduction result is NR->NRP, REL->NRP), producing the directed syntax tree of "daughter of Wang X" shown in Fig. 3.
In addition, when computing the ratio of nodes matched between the syntax tree and a pattern tree in the preset graph query pattern set, the following scheme may be used. Suppose the second rewritten query is "Honor of Kings XX"; its syntax tree is the single node GAME. Matching this syntax tree against the pattern trees in the preset graph query pattern set hits a pattern tree with 2 nodes (GAME->GAME_COMMENTATOR), while the directed syntax tree GAME corresponds to only one node of that pattern. The node ratio is computed as the number of hit nodes divided by the total number of nodes of the matched pattern tree: the matched pattern tree here has the two nodes "GAME" and "GAME_COMMENTATOR", and the syntax tree of the second rewritten query is "GAME", hitting only one node, so the ratio of nodes matched in the preset graph query pattern set is 1/2 = 0.5, i.e., the pattern match degree is 0.5.
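The node-ratio computation described above can be sketched in a few lines. This is a minimal stand-in, assuming pattern trees and the query syntax tree are flattened to lists of node labels; the names `match_degree` and `patterns` are illustrative, not from the patent.

```python
# Minimal sketch of the S201 pattern-match degree: the best hit-node ratio of the
# query syntax tree against each pattern tree in a preset graph query pattern set.

def match_degree(query_nodes, pattern_set):
    """Return hit-node count / total nodes of the best-matching pattern tree."""
    best = 0.0
    for pattern_nodes in pattern_set:
        hits = sum(1 for n in query_nodes if n in pattern_nodes)
        if pattern_nodes and hits:
            best = max(best, hits / len(pattern_nodes))
    return best

patterns = [["NR", "REL", "NRP"], ["GAME", "GAME_COMMENTATOR"]]
# "daughter of Wang X" fully matches the same-structured relation pattern -> 1.0
print(match_degree(["NR", "REL", "NRP"], patterns))  # 1.0
# "Honor of Kings XX" hits one of the two GAME-pattern nodes -> 1/2 = 0.5
print(match_degree(["GAME"], patterns))  # 0.5
```

The two printed values reproduce the two worked examples in the text: a full structural match yields degree 1, and a one-of-two-node hit yields 0.5.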
S202: retrieve a preset knowledge graph according to the second rewritten query to obtain retrieval results, and obtain the quantity of the retrieval results;
The preset knowledge graph may include entities of multiple types and relationships between entities of different types, each entity having a name and properties.
The second rewritten query may be parsed into a structured first subgraph, and the preset knowledge graph searched for second subgraphs that match the first subgraph; the number of second subgraphs is the quantity of retrieval results, and if no subgraph matches, the quantity of retrieval results is 0.
Specifically, if the second rewritten query is an entity, the preset knowledge graph is searched for entities of the same name, and the number of same-name entities retrieved is taken as the quantity of retrieval results (for example, if the second rewritten query is "actor Tao XX" and two same-name entities "actor Tao XX" are retrieved in the knowledge graph, the quantity of retrieval results is 2). If the second rewritten query is a relationship of an entity, the subgraph nodes corresponding to that entity relationship are retrieved in the preset knowledge graph, and the number of such subgraph nodes is the quantity of retrieval results. Likewise, if the second rewritten query is a property of an entity, the subgraph nodes corresponding to that entity property are retrieved in the preset knowledge graph, and their number is the quantity of retrieval results.
In one example, Fig. 4 shows a partial schematic diagram of a preset knowledge graph, where Property denotes a property and Relation denotes a relationship. For example, if the second rewritten query is "spouse of Deng XX", it expresses a relationship of the actor Deng XX, so the subgraph node corresponding to that relationship is retrieved in the preset knowledge graph of Fig. 4. As can be seen from Fig. 4, the returned subgraph node is a single person-name entity "Sun XX", so the quantity of retrieval results is 1.
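The S202 counting rules for the three query forms (entity, relationship, property) can be sketched against a toy in-memory graph. The tiny graph, the tuple-based query encoding, and all names here are illustrative assumptions, not the patent's schema.

```python
# Toy sketch of S202: count matching subgraph nodes in a preset knowledge graph.
# Entities are (name, record) pairs so that same-name entities can coexist.

entities = [
    ("actor Tao XX", {"relations": {}, "properties": {}}),
    ("actor Tao XX", {"relations": {}, "properties": {}}),
    ("Deng XX", {"relations": {"spouse": ["Sun XX"]}, "properties": {}}),
]

def result_count(query):
    """query is ('entity', name), ('relation', name, rel), or ('property', name, prop)."""
    if query[0] == "entity":
        return sum(1 for name, _ in entities if name == query[1])
    kind, name, key = query
    total = 0
    for n, rec in entities:
        if n != name:
            continue
        if kind == "relation":
            total += len(rec["relations"].get(key, []))   # one count per target node
        else:
            total += 1 if key in rec["properties"] else 0
    return total

print(result_count(("entity", "actor Tao XX")))         # 2 (two same-name entities)
print(result_count(("relation", "Deng XX", "spouse")))  # 1 (single spouse node)
```

The two printed counts mirror the "actor Tao XX" and "spouse of Deng XX" examples above.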
S203: obtain the semantic match degree between the second rewritten query and the second original query or the third original query;
That is, this step obtains the semantic match degree between the two queries of any sample in the second training set. When a second positive sample of the second training set is used to pre-train the rewriting discrimination model, what is obtained is the semantic match degree between the second original query and the second rewritten query of that positive sample; when a second negative sample is used, what is obtained is the semantic match degree between the third original query and the second rewritten query of that negative sample.
Any method for computing the semantic match degree between two texts may be used here, and is not described in detail.
In the embodiment of the present invention, a semantic match degree model may be used to compute the semantic match degree between the original query and the rewritten query.
Specifically, the semantic match degree model may be built on a neural network, for example an attention-based semantic matching model consisting of an attention layer, a comparison layer, and an aggregation layer, trained with cross entropy as the loss function. The training data of the semantic match degree model consists of query pairs with identical query results (positive samples) and query pairs with different query results (negative samples).
After the semantic match degree model has been trained on this data, its input at prediction time is the second original query and the second rewritten query of S203, and its output is the semantic match degree between them; alternatively, its input is the third original query and the second rewritten query of S203, and its output is the semantic match degree between the third original query and the second rewritten query.
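The patent's semantic match degree model is an attention-based neural network. Purely as a stdlib stand-in for illustration, the sketch below scores two queries with a bag-of-character cosine similarity, which likewise returns a degree in [0, 1]; the function name and scoring rule are assumptions, not the patent's model.

```python
# Stand-in for the semantic match degree model: cosine similarity over
# character counts. Identical texts score 1.0; texts with no shared
# characters score 0.0; partial overlap falls strictly in between.
import math
from collections import Counter

def semantic_match_degree(q1, q2):
    a, b = Counter(q1), Counter(q2)
    dot = sum(a[c] * b[c] for c in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(round(semantic_match_degree("spouse of Deng XX", "wife of Deng XX"), 2))
print(round(semantic_match_degree("spouse of Deng XX", "spouse of Deng XX"), 2))
```

A trained neural model would replace this scorer without changing how S203 consumes its output.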
S204: input the pattern match degree, the quantity of retrieval results, and the semantic match degree corresponding to each sample in the second training set to the rewriting discrimination model for pre-training.
The rewriting discrimination model pre-trained by the process shown in Fig. 2 can then, given an input user query text and a rewritten query text, judge whether the rewritten query text is the best rewritten query of the user query text and output the judging result.
That is, the three classes of feature data extracted from each sample of the second training set via S201~S203, together with the labeled probability value, are input to a model such as a GBDT model for pre-training, so that the pre-trained GBDT model can, for an input user query text and rewritten query text, judge whether the rewritten query text is the best rewritten query of the user query text and output the judging result: 1 if it is, 0 if it is not.
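The flow from the three feature classes to a 0/1 judgment can be sketched as below. In practice a trained GBDT would do the scoring; purely to keep this sketch stdlib-only, a hand-tuned linear scorer stands in for it, and the feature order, weights, and threshold are illustrative assumptions.

```python
# S201~S203 yield three feature classes per sample; S204 feeds them to the
# discriminator. A fixed weighted vote stands in for the trained GBDT here.

def features(pattern_match, result_count, semantic_match):
    # the three feature classes; the result count is clipped to a 0/1 indicator
    return [pattern_match, min(result_count, 1), semantic_match]

def judge(feats, threshold=0.5):
    # stand-in for the pre-trained discriminator: 1 = best rewritten query, 0 = not
    score = 0.4 * feats[0] + 0.2 * feats[1] + 0.4 * feats[2]
    return 1 if score >= threshold else 0

print(judge(features(1.0, 1, 0.9)))  # well-matched rewrite -> 1
print(judge(features(0.0, 0, 0.1)))  # poor rewrite -> 0
```

Swapping the linear scorer for a fitted gradient-boosted tree model would not change the surrounding interface: three features in, one 0/1 judging result out.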
When training the rewriting discrimination model, the embodiment of the present invention makes use of the pattern match degree between the rewritten query and the preset graph query pattern set, and of the quantity of results obtained by retrieving the preset knowledge graph with the rewritten query, thereby comprehensively considering how well the rewritten query matches the preset knowledge graph; in addition, by means of the semantic match degree between the original query and the rewritten query, the match between the two is also considered from the user's perspective, so that the pre-trained rewriting discrimination model can accurately judge, for an input original query and rewritten query, whether the rewritten query is the best rewriting of the original query. Moreover, because these three classes of feature data capture the dynamics of the text users input, training on them reduces the risk that the rewriting discrimination model degrades over time.
Optionally, step 105 can be implemented by the method shown in Fig. 5:
As shown in Fig. 5, the rewriting generation model is denoted model G, and the rewriting discrimination model is denoted model D.
During adversarial training, the training objective for model G is to maximize the reward, i.e., to maximize the result OBJECT_G(θ) of the following formula 1:
When generating a rewritten query for an original query, the rewriting generation model may select terms from a preset vocabulary; for example, the terms of the rewritten query may be generated from a vocabulary of the entertainment domain.
During adversarial training, the training objective for model D is to minimize the result of the following formula 2:
Accordingly, during adversarial training, the objective function of the embodiment of the present invention is as shown in formula 3:
In formula 3, the training objective is to minimize the negative of OBJECT_G(θ) while maximizing the negative of the result of formula 2, which allows the objective function of formula 3 to converge.
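Formulas 1 through 3 are not reproduced in this text. Using only the symbols defined in this section (G, D, Q_θ, m, y_k, and the term sequences t_{1:k} and p_{1:k} of the positive and negative samples), one reconstruction consistent with the surrounding description — an assumption following the usual policy-gradient sequence-GAN form, not the patent's verbatim equations — would be:

```latex
% Formula 1 (reward for G, maximized) -- reconstructed, not verbatim:
\mathrm{OBJECT}_G(\theta)
  \;=\; \mathbb{E}\Big[\textstyle\sum_{k} Q_{\theta}(s_{k-1}, y_k)\,
        \log G\!\left(y_k \mid y_{1:k-1}\right)\Big]

% Formula 2 (loss of D, minimized): D's loss summed over positive pairs
% (m, t_{1:k}) and negative pairs (m, p_{1:k}) of the third training set:
\mathrm{OBJECT}_D(\phi)
  \;=\; L_{\phi}\!\left(m,\, t_{1:k}\right) \;+\; L_{\phi}\!\left(m,\, p_{1:k}\right)

% Formula 3 (combined objective): minimize -OBJECT_G, maximize -OBJECT_D:
\min_{\theta}\; -\,\mathrm{OBJECT}_G(\theta)
  \qquad\text{and}\qquad
\max_{\phi}\; -\,\mathrm{OBJECT}_D(\phi)
```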
During adversarial training, the training set for the rewriting generation model may be the multiple first positive samples of the second training set above, each first positive sample including a second original query and a second rewritten query, because in a first positive sample the second rewritten query is an accurate rewriting of the second original query.
The positive samples in the third training set used for the rewriting discrimination model are likewise the multiple first positive samples of the second training set; the negative samples, however, are constructed from third target rewritten queries — poorly rewritten outputs of the rewriting generation model — paired with their second original queries, and serve as the negative samples of the rewriting discrimination model in adversarial training. The third training set therefore includes the multiple first positive samples and second negative samples composed of a second original query of a first positive sample and a third target rewritten query.
By training the rewriting discrimination model on the poor rewriting results output by the rewriting generation model, the trained discrimination model improves its discrimination accuracy; when the rewriting generation model is then trained again, its parameters can be updated based on the more accurate judging results given by the discrimination model, so as to reach the above training objective.
Before adversarial training, the parameter θ of the pre-trained rewriting generation model and the parameter of the pre-trained rewriting discrimination model may be randomly initialized.
The process of adversarial training is described in detail with reference to the steps shown in Fig. 5:
S301: based on formula 4, use the policy gradient method to perform, according to the multiple first positive samples, a first iteration update on the parameter θ of the pre-trained rewriting generation model G;
Here, G(y_k|y_{1:k-1}) is the quantity the rewriting generation model G estimates during adversarial training: y_{1:k-1} denotes the first k-1 terms generated by G for the input second original query m, y_k denotes the k-th generated term (the k terms together constituting the third rewritten query), and G(y_k|y_{1:k-1}) denotes the probability that G generates term y_k given that it has already generated the terms y_{1:k-1};
Q_θ(s_{k-1}, y_k) denotes, when the parameter of the rewriting generation model G is θ, the judging result output by the rewriting discrimination model D after judging, for the input second original query m and third rewritten query y_k, whether it is the best rewritten query, the judging result being 0 or 1; s_{k-1} denotes the state of model G at time k-1.
For example, if the second original query m is input to model G and the new terms generated at the first k-2 times are {y1, ..., yk-2}, then the state s_{k-1} of model G at time k-1 can be expressed as (m, {y1, ..., yk-2});
After the parameter θ of model G has been randomly initialized, the first positive samples can be input to the pre-trained model G for training using the policy gradient method; during training, the parameter θ is updated with reference to the judging result Q_θ(s_{k-1}, y_k) output by model D, thereby iteratively updating model G. The number of update iterations can be determined empirically.
S302: input the second original queries of the multiple first positive samples to the rewriting generation model updated by the first iteration, and obtain the third rewritten queries that the rewriting generation model generates for the second original queries;
That is, after model G has been iteratively updated, the second original query of any first positive sample can be input to the updated model G, which then outputs a rewriting result for that second original query, named here the third rewritten query.
S303: input the second original query and the third rewritten query to the pre-trained rewriting discrimination model, and obtain the judging result of whether the third rewritten query is the best rewritten query of the second original query;
That is, the second original query that was input to the updated model G, together with the third rewritten query output by model G, is input to the pre-trained model D, which judges whether the third rewritten query is the best rewritten query of the second original query and outputs the judging result Q_θ(s_{k-1}, y_k): 1 if it is, 0 otherwise.
S304: obtain, among the third rewritten queries, the third target rewritten queries whose judging result indicates that they are not the best rewritten query of the second original query;
In this step, the judging results given by model D identify which third rewritten queries output by the iteratively updated model G are inaccurate, i.e., those with judging result 0; these are recognized as the third target rewritten queries.
S305: generate a third training set including the multiple first positive samples and multiple second negative samples, where a second negative sample includes the second original query and the third target rewritten query;
That is, a third training set for training model D can be generated. Its positive samples are still the multiple first positive samples of the second training set, while each second negative sample is selected from the insufficiently accurate outputs of the iteratively updated model G — a third target rewritten query with judging result 0 — paired with the second original query that was input to model G when that third target rewritten query was generated. In this way, the poor rewriting results output by model G are used as negative samples for training model D in adversarial training, further improving the discrimination accuracy of model D.
S306: perform, according to the third training set, a second iteration update on the parameter of the pre-trained rewriting discrimination model;
The method of training the pre-trained model D here with the third training set is similar to the method of training the original model D with the second training set; the detailed process is as described above and is not repeated. When model D is iteratively updated in this step, the number of update iterations is likewise determined empirically.
After the above first iteration update of model G and second iteration update of model D, it can be determined whether the above training objective — the training objective shown in formula 3 — has been reached.
If the above training objective has not been reached, then in S307 the first iteration update step and the second iteration update step, i.e., S301~S306, are executed in a loop until the objective function converges.
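The S301~S307 loop can be compressed into a toy sketch. Here G is a preference over a fixed candidate list per original query, D is a simple membership rule, and the policy-gradient update of S301 is reduced to "move to the next candidate when D rejects the current one"; all data, names, and update rules are illustrative assumptions, not the patent's actual models.

```python
# Toy adversarial loop: G and D take turns, with G's poor outputs becoming
# D's negative samples (S305) and D's judgments steering G's update (S301).

positives = [("wife of Deng XX", "spouse of Deng XX")]        # first positive samples
candidates = {"wife of Deng XX": ["Deng XX wife who", "spouse of Deng XX"]}
preference = {q: 0 for q in candidates}                       # stands in for parameter theta
accepted = {"spouse of Deng XX"}                              # D's current "best rewrite" set

def generate(original):                # S302: G outputs a third rewritten query
    return candidates[original][preference[original]]

def discriminate(original, rewrite):   # S303: D outputs 1 for a best rewrite, else 0
    return 1 if rewrite in accepted else 0

for _ in range(3):                     # S307: loop the two iteration updates
    third_training_set = list(positives)                      # S305: positives kept
    for original, _gold in positives:
        rewrite = generate(original)
        if discriminate(original, rewrite) == 0:              # S304: target rewrite found
            third_training_set.append((original, rewrite))    # S305: second negative sample
            preference[original] = (preference[original] + 1) % len(candidates[original])  # S301
    # S306: D would be re-trained here on third_training_set (omitted in this toy)

print(generate("wife of Deng XX"))     # settles on the rewrite D accepts
```

After the first pass rejects the poor candidate, the loop settles on the accepted rewrite and remains stable, mirroring the convergence condition of S307.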
The objective function is as follows:
where:
G(y_k|y_{1:k-1}) is the quantity the rewriting generation model G estimates during adversarial training: y_{1:k-1} denotes the first k-1 terms generated by G for the input second original query m, y_k denotes the k-th generated term (the k terms together constituting the third rewritten query), and G(y_k|y_{1:k-1}) denotes the probability that G generates term y_k given the already-generated terms y_{1:k-1};
Q_θ(s_{k-1}, y_k) denotes, when the parameter of the rewriting generation model G is θ, the judging result (0 or 1) output by the rewriting discrimination model D after judging, for the input second original query m and third rewritten query y_k, whether it is the best rewritten query; s_{k-1} denotes the state of the rewriting generation model G at time k-1.
For example, if the second original query m is input to the rewriting generation model G and the new terms generated at the first k-2 times are {y1, ..., yk-2}, then the state s_{k-1} of model G at time k-1 can be expressed as (m, {y1, ..., yk-2});
The corresponding term denotes the value of the loss function of the rewriting discrimination model D given the parameter of the rewriting discrimination model D.
Here, p_{1:k} denotes the first k terms of the third target rewritten query in the second negative sample (essentially the third target rewritten query itself);
The value of this loss function is the loss obtained when the rewriting discrimination model D judges, for the second original query m and the third target rewritten query p_{1:k}, whether p_{1:k} is the best rewritten query of the second original query m.
In the embodiment of the present invention, the rewriting discrimination model D is a GBDT model, so a loss function commonly used with GBDT may be adopted, such as the Huber loss, the mean squared error, or the absolute loss; the calculation formula of the loss function of the rewriting discrimination model is not listed here.
The corresponding term likewise denotes the value of the loss function of the rewriting discrimination model D given the parameter of the rewriting discrimination model D.
Here, t_{1:k} denotes the first k terms of the second rewritten query in the first positive sample (essentially the second rewritten query itself);
The value of this loss function is the loss obtained when the rewriting discrimination model D judges, for the second original query m and the second rewritten query t_{1:k}, whether t_{1:k} is the best rewritten query of the second original query m.
As above, since the rewriting discrimination model D is a GBDT model, a loss function commonly used with GBDT may be adopted, such as the Huber loss, the mean squared error, or the absolute loss; its calculation formula is likewise not listed here.
In this way, by adversarially training the pre-trained rewriting generation model and rewriting discrimination model, the embodiment of the present invention continually and iteratively updates the rewriting discrimination model based on the outputs of the rewriting generation model, and in turn continually and iteratively updates the rewriting generation model using the outputs of the continually updated rewriting discrimination model, until the training objective is reached. The rewriting generation model that has reached the training objective can accurately generate the best rewritten query for an input original query text; data search is then performed using that best rewritten query, which improves the precision and recall of semantic search. In training the two models, the embodiment of the present invention uses reinforcement learning: compared with supervised learning, the data needed to update the two models comes mainly from interaction with and sampling of the environment (i.e., the user), reducing the cost of hand-labeled data. Furthermore, when the rewriting generation model is trained by the reinforcement learning method, its parameters are updated by autonomous iteration under the reward mechanism rather than being specified manually, so parameter adjustment of the rewriting generation model is more flexible than in rule-based rewriting methods.
After the training objective has been reached through adversarial training, the rewriting generation model can be used to rewrite user input text, and the rewriting result used to query the knowledge graph and obtain query results.
As shown in Fig. 6, an embodiment of the present invention further provides a data retrieval method, which specifically includes the following steps:
Step 601: receive a user query text;
Step 602: input the user query text to the pre-trained rewriting generation model to obtain the best rewritten query text;
Step 603: retrieve a preset knowledge graph according to the best rewritten query text to obtain retrieval results;
Here, the rewriting generation model is the one that has undergone adversarial training in the above model training method until the objective function converges; it is used to rewrite any input user query text and generate the best rewritten query text.
For example, if the query text input by the user is "Who is the wife of actor Deng XX?", the adversarially trained rewriting generation model rewrites this input original query and outputs the best rewritten query "spouse of Deng XX". The method of the embodiment of the present invention then uses the best rewritten query "spouse of Deng XX" to access the preset knowledge graph shown in Fig. 4 (the knowledge graph includes many entity types; see the description of the knowledge graph above). In the preset knowledge graph, the relationship (here, "spouse") of the star entity (here, "Deng XX") can be hit, and the retrieval result is a star entity (here, "Sun XX"). Finally, the retrieval result "Sun XX" is returned to the user.
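The three steps above can be sketched end to end. The trained rewriting generation model is stood in by a lookup table and the preset knowledge graph by a small dict; all names, data, and the crude "relation of entity" parse are illustrative assumptions.

```python
# End-to-end sketch of steps 601~603: rewrite the user query, parse the best
# rewritten query, and hit the knowledge graph.

rewrite_model = {"Who is the wife of actor Deng XX?": "spouse of Deng XX"}  # step 602 stand-in
knowledge_graph = {("Deng XX", "spouse"): ["Sun XX"]}                       # Fig. 4 fragment

def retrieve(user_query):
    best_rewrite = rewrite_model.get(user_query, user_query)  # step 602: best rewritten query
    relation, _, entity = best_rewrite.partition(" of ")      # crude parse: "<rel> of <entity>"
    return knowledge_graph.get((entity, relation), [])        # step 603: hit the graph

print(retrieve("Who is the wife of actor Deng XX?"))  # ['Sun XX'] returned to the user
```

A real deployment would replace the lookup table with the adversarially trained model G and the dict with a graph store, but the three-step flow is the same.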
By means of the method for the embodiment of the present invention, when the former query statement of user's input is not accurate enough, the present invention is implemented
The method of example is by being input to the rewriting after dual training to model is generated, so as to obtain the original for former query statement
The best rewritten query of query statement, and the semanteme of best rewritten query and former query statement is semantic very close to and can be more smart
User's intention is expressed quasi-ly, then being retrieved in knowledge mapping using the best rewritten query sentence after rewriting, then can be mentioned
Rise the hit accuracy rate to the query result of user query sentence.
It should be noted that the method embodiments are described as a series of action combinations for simplicity of description, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, since according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Corresponding to the model training method provided by the embodiments of the present invention, referring to Fig. 7, a structural block diagram of an embodiment of a model training apparatus of the present invention is shown, which may specifically include the following modules:
a first obtaining module 701, configured to obtain a first training set, the first training set including a first original query and a first rewritten query that match the same query result;
a first pre-training module 702, configured to pre-train a rewriting generation model according to the first training set, the pre-trained rewriting generation model being used to generate a rewritten query text for an input user query text;
a second obtaining module 703, configured to obtain a second training set, the second training set including multiple first positive samples and multiple first negative samples, a first positive sample including a second original query and a second rewritten query that match the same query result, and a first negative sample including a third original query and the second rewritten query that match different query results;
a second pre-training module 704, configured to pre-train a rewriting discrimination model according to the second training set, the pre-trained rewriting discrimination model being used to judge, for an input user query text and rewritten query text, whether the rewritten query text is the best rewritten query of the user query text and to output the judging result;
an adversarial training module 705, configured to perform adversarial training on the pre-trained rewriting generation model and the pre-trained rewriting discrimination model according to the adversarial training method, the adversarially trained rewriting generation model being used to generate the best rewritten query text for any input user query text.
Optionally, the second pre-training module 704 includes:
a first acquisition submodule, configured to obtain the pattern match degree between the second rewritten query in the second training set and a preset graph query pattern set;
a retrieval submodule, configured to retrieve a preset knowledge graph according to the second rewritten query to obtain retrieval results, and to obtain the quantity of the retrieval results;
a second acquisition submodule, configured to obtain the semantic match degree between the second rewritten query and the second original query or the third original query;
a pre-training submodule, configured to input the pattern match degree, the quantity of retrieval results, and the semantic match degree corresponding to each sample in the second training set to the rewriting discrimination model for pre-training.
Optionally, the first acquisition submodule includes:
a first obtaining unit, configured to obtain the semantics of the second rewritten query;
a generation unit, configured to perform semantic reduction on the semantics based on a context-free grammar to generate a syntax tree;
a matching unit, configured to match the syntax tree against the pattern trees in the preset graph query pattern set, and to take the ratio of matched nodes as the pattern match degree between the second rewritten query and the preset graph query pattern set.
Optionally, the adversarial training module 705 includes:
a first iterative training submodule, configured to use the policy gradient method, according to the multiple first positive samples, to perform a first iteration update on the parameter θ of the pre-trained rewriting generation model G;
where G(y_k|y_{1:k-1}) is the quantity estimated when training the rewriting generation model G: y_{1:k-1} denotes the first k-1 terms generated by G for the input second original query m, y_k denotes the k-th generated term (the k terms together constituting the third rewritten query), and G(y_k|y_{1:k-1}) denotes the probability that G generates term y_k given the already-generated terms y_{1:k-1};
Qθ(sk-1,yk) in the case where parameter of the rewriting to generation model G is θ, the rewriting is to differentiation mould for expression
The second former inquiry m and third rewritten query y of the type D to inputkThe judgement for carrying out the judgement of best rewritten query, and exporting
As a result, wherein sk-1Indicate to rewrite to generating state of the model G at -1 moment of kth, the state at -1 moment of kth be expressed as (m,
{ y1 ... ..., yk-2 });
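The first-iteration policy-gradient (REINFORCE-style) update described above can be sketched as follows; the flat parameter vector, the per-position log-probability gradients, and the learning rate are illustrative assumptions standing in for the real generator and discriminator:

```python
def policy_gradient_update(theta, grads_log_G, q_values, lr=0.1):
    """One first-iteration update of the generator parameters:

        theta <- theta + lr * sum_k Q(s_{k-1}, y_k) * grad_theta log G(y_k | y_{1:k-1})

    grads_log_G[k] is the gradient of log G(y_k | y_{1:k-1}) w.r.t. theta, and
    q_values[k] is the discriminator-derived score Q(s_{k-1}, y_k); both are
    assumed to be supplied by the surrounding training loop.
    """
    updated = list(theta)
    for grad, q in zip(grads_log_G, q_values):
        # Accumulate the reward-weighted gradient for position k.
        updated = [t + lr * q * g for t, g in zip(updated, grad)]
    return updated
```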
A first input submodule, configured to input the second original query in the multiple first positive samples into the rewrite-generation model updated by the first iteration, to obtain the third rewritten query generated by the rewrite-generation model for the second original query;
A second input submodule, configured to input the second original query and the third rewritten query into the pre-trained rewrite-discrimination model, to obtain a judgment result of whether the third rewritten query is the best rewritten query for the second original query;
A third acquisition submodule, configured to obtain, among the third rewritten queries, those third target rewritten queries whose corresponding judgment result is that the third rewritten query is not the best rewritten query for the second original query;
A generation module, configured to generate a third training set, the third training set including the multiple first positive samples and multiple second negative samples, wherein each second negative sample includes the second original query and the third target rewritten query;
A second iterative training submodule, configured to perform, according to the third training set, a second iterative update on the parameter φ of the pre-trained rewrite-discrimination model;
A loop training submodule, configured to repeatedly execute the first iterative update step and the second iterative update step until an objective function converges;
wherein the objective function is:

min over φ of Et[ −log Dφ(m, t1:k) ] + Ep[ −log(1 − Dφ(m, p1:k)) ]

wherein t1:k denotes the first k lexical items of the second rewritten query in the first positive sample, p1:k denotes the first k lexical items of the third target rewritten query in the second negative sample, and m denotes the second original query;
−log(1 − Dφ(m, p1:k)) denotes, when the parameter of the rewrite-discrimination model D is φ, the value of the loss function obtained when the rewrite-discrimination model D performs the best-rewritten-query judgment on the second original query m and p1:k;
−log Dφ(m, t1:k) denotes, when the parameter of the rewrite-discrimination model D is φ, the value of the loss function obtained when the rewrite-discrimination model D performs the best-rewritten-query judgment on the second original query m and t1:k.
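Under a standard cross-entropy reading of this objective (an assumption; the patent's figures defining the exact loss terms are not reproduced in this text), the discriminator loss on one positive/negative pair could be sketched as:

```python
import math

def discriminator_loss(d_positive, d_negative):
    """Cross-entropy loss on one (m, t_{1:k}) positive pair and one
    (m, p_{1:k}) generated negative pair, where d_positive and d_negative
    are the discriminator scores D_phi(m, .) assumed to lie in (0, 1).

    Training drives d_positive toward 1 and d_negative toward 0,
    which drives this loss toward 0.
    """
    return -math.log(d_positive) - math.log(1.0 - d_negative)
```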
As for the device embodiments, since they are substantially similar to the model training method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
Corresponding to the data retrieval method provided by the embodiments of the present invention, and referring to Fig. 8, a structural block diagram of an embodiment of a data retrieval device of the present invention is shown, which may specifically include the following modules:
A receiving module 801, configured to receive a user query text;
An input module 802, configured to input the user query text into a pre-trained rewrite-generation model to obtain a best rewritten query text;
A retrieval module 803, configured to retrieve a preset knowledge graph according to the best rewritten query text to obtain search results;
wherein the rewrite-generation model is configured to rewrite any user query text that is input and to generate a best rewritten query text.
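The three modules above amount to a rewrite-then-retrieve pipeline, which might be sketched as follows; the callable rewrite model and dict-backed knowledge graph are stand-ins for the trained rewrite-generation model and the preset knowledge graph:

```python
def retrieve(user_query_text, rewrite_model, knowledge_graph):
    """Rewrite the user query with the trained rewrite-generation model,
    then retrieve the preset knowledge graph with the best rewritten query.

    rewrite_model: callable mapping a query string to its best rewrite.
    knowledge_graph: mapping from rewritten queries to search results.
    """
    best_rewritten_query = rewrite_model(user_query_text)
    return knowledge_graph.get(best_rewritten_query, [])
```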
As for the device embodiment, since it is substantially similar to the data retrieval method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
According to still another embodiment of the present invention, there is also provided a terminal, including a memory, a processor, and a model training program stored on the memory and executable on the processor, wherein the model training program, when executed by the processor, implements the steps of the model training method according to any one of the above embodiments.
According to still another embodiment of the present invention, there is also provided a computer-readable storage medium having a model training program stored thereon, wherein the model training program, when executed by a processor, implements the steps of the model training method according to any one of the above embodiments.
According to still another embodiment of the present invention, there is also provided a terminal, including a memory, a processor, and a data retrieval program stored on the memory and executable on the processor, wherein the data retrieval program, when executed by the processor, implements the steps of the data retrieval method according to the above embodiments.
According to still another embodiment of the present invention, there is also provided a computer-readable storage medium having a data retrieval program stored thereon, wherein the data retrieval program, when executed by a processor, implements the steps of the data retrieval method according to the above embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical memory) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, such that a series of operational steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or terminal device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device that includes the element.
The model training method, model training apparatus, data retrieval method, data retrieval device, terminal, and computer-readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (12)
1. A model training method, characterized by comprising:
obtaining a first training set, the first training set including a first original query and a first rewritten query that match the same query result;
pre-training a rewrite-generation model according to the first training set, the pre-trained rewrite-generation model being used to generate a rewritten query text for an input user query text;
obtaining a second training set, the second training set including multiple first positive samples and multiple first negative samples, each first positive sample including a second original query and a second rewritten query that match the same query result, and each first negative sample including a third original query and the second rewritten query that match different query results;
pre-training a rewrite-discrimination model according to the second training set, the pre-trained rewrite-discrimination model being used to judge, for an input user query text and an input rewritten query text, whether the rewritten query text is the best rewritten query for the user query text, and to output a judgment result;
performing adversarial training on the pre-trained rewrite-generation model and the pre-trained rewrite-discrimination model according to an adversarial training method, the rewrite-generation model after the adversarial training being used to generate a best rewritten query text for any input user query text.
2. The method according to claim 1, characterized in that the pre-training of the rewrite-discrimination model according to the second training set comprises:
obtaining a pattern match degree between the second rewritten query in the second training set and a preset graph query pattern set;
retrieving a preset knowledge graph according to the second rewritten query to obtain search results, and obtaining the number of the search results;
obtaining a semantic match degree between the second rewritten query and the second original query or the third original query;
inputting the pattern match degree, the number of search results, and the semantic match degree corresponding to any sample in the second training set into the rewrite-discrimination model for pre-training.
3. The method according to claim 2, characterized in that the obtaining of the pattern match degree between the second rewritten query in the second training set and the preset graph query pattern set comprises:
obtaining the semantics of the second rewritten query;
performing semantic reduction on the semantics based on a context-free grammar to generate a syntax tree;
matching the syntax tree against the pattern trees in the preset graph query pattern set, and taking the ratio of matched nodes as the pattern match degree between the second rewritten query and the preset graph query pattern set.
4. The method according to claim 1, characterized in that the performing of adversarial training on the pre-trained rewrite-generation model and the pre-trained rewrite-discrimination model according to the adversarial training method comprises:
performing, according to the multiple first positive samples and using the policy gradient method, a first iterative update on the parameter θ of the pre-trained rewrite-generation model G, based on:

∇θJ(θ) = Σk Qθ(sk-1, yk) · ∇θ log G(yk | y1:k-1)

wherein G(yk | y1:k-1) is the quantity to be estimated when training the rewrite-generation model G; y1:k-1 denotes the k-1 lexical items generated by the rewrite-generation model G for the input second original query m; yk denotes the third rewritten query formed by the k lexical items generated by the rewrite-generation model G for the input second original query m; and G(yk | y1:k-1) denotes the probability that the rewrite-generation model G, having generated the lexical items y1:k-1, generates the third rewritten query yk;
Qθ(sk-1, yk) denotes, when the parameter of the rewrite-generation model G is θ, the judgment result output by the rewrite-discrimination model D after judging whether the third rewritten query yk is the best rewritten query for the input second original query m, wherein sk-1 denotes the state of the rewrite-generation model G at time k-1, expressed as (m, {y1, …, yk-2});
inputting the second original query in the multiple first positive samples into the rewrite-generation model updated by the first iteration, to obtain the third rewritten query generated by the rewrite-generation model for the second original query;
inputting the second original query and the third rewritten query into the pre-trained rewrite-discrimination model, to obtain a judgment result of whether the third rewritten query is the best rewritten query for the second original query;
obtaining, among the third rewritten queries, those third target rewritten queries whose corresponding judgment result is that the third rewritten query is not the best rewritten query for the second original query;
generating a third training set, the third training set including the multiple first positive samples and multiple second negative samples, wherein each second negative sample includes the second original query and the third target rewritten query;
performing, according to the third training set, a second iterative update on the parameter φ of the pre-trained rewrite-discrimination model;
repeatedly executing the first iterative update step and the second iterative update step until an objective function converges;
wherein the objective function is:

min over φ of Et[ −log Dφ(m, t1:k) ] + Ep[ −log(1 − Dφ(m, p1:k)) ]

wherein t1:k denotes the first k lexical items of the second rewritten query in the first positive sample, p1:k denotes the first k lexical items of the third target rewritten query in the second negative sample, and m denotes the second original query;
−log(1 − Dφ(m, p1:k)) denotes, when the parameter of the rewrite-discrimination model D is φ, the value of the loss function obtained when the rewrite-discrimination model D performs the best-rewritten-query judgment on the second original query m and p1:k;
−log Dφ(m, t1:k) denotes, when the parameter of the rewrite-discrimination model D is φ, the value of the loss function obtained when the rewrite-discrimination model D performs the best-rewritten-query judgment on the second original query m and t1:k.
5. A model training apparatus, characterized by comprising:
a first obtaining module, configured to obtain a first training set, the first training set including a first original query and a first rewritten query that match the same query result;
a first pre-training module, configured to pre-train a rewrite-generation model according to the first training set, the pre-trained rewrite-generation model being used to generate a rewritten query text for an input user query text;
a second obtaining module, configured to obtain a second training set, the second training set including multiple first positive samples and multiple first negative samples, each first positive sample including a second original query and a second rewritten query that match the same query result, and each first negative sample including a third original query and the second rewritten query that match different query results;
a second pre-training module, configured to pre-train a rewrite-discrimination model according to the second training set, the pre-trained rewrite-discrimination model being used to judge, for an input user query text and an input rewritten query text, whether the rewritten query text is the best rewritten query for the user query text, and to output a judgment result;
an adversarial training module, configured to perform adversarial training on the pre-trained rewrite-generation model and the pre-trained rewrite-discrimination model according to an adversarial training method, the rewrite-generation model after the adversarial training being used to generate a best rewritten query text for any input user query text.
6. The apparatus according to claim 5, characterized in that the second pre-training module comprises:
a first acquisition submodule, configured to obtain a pattern match degree between the second rewritten query in the second training set and a preset graph query pattern set;
a retrieval submodule, configured to retrieve a preset knowledge graph according to the second rewritten query to obtain search results, and to obtain the number of the search results;
a second acquisition submodule, configured to obtain a semantic match degree between the second rewritten query and the second original query or the third original query;
a pre-training submodule, configured to input the pattern match degree, the number of search results, and the semantic match degree corresponding to any sample in the second training set into the rewrite-discrimination model for pre-training.
7. The apparatus according to claim 6, characterized in that the first acquisition submodule comprises:
a first acquisition unit, configured to obtain the semantics of the second rewritten query;
a generation unit, configured to perform semantic reduction on the semantics based on a context-free grammar to generate a syntax tree;
a matching unit, configured to match the syntax tree against the pattern trees in the preset graph query pattern set, and to take the ratio of matched nodes as the pattern match degree between the second rewritten query and the preset graph query pattern set.
8. The apparatus according to claim 5, characterized in that the adversarial training module comprises:
a first iterative training submodule, configured to perform, according to the multiple first positive samples and using the policy gradient method, a first iterative update on the parameter θ of the pre-trained rewrite-generation model G, based on:

∇θJ(θ) = Σk Qθ(sk-1, yk) · ∇θ log G(yk | y1:k-1)

wherein G(yk | y1:k-1) is the quantity to be estimated when training the rewrite-generation model G; y1:k-1 denotes the k-1 lexical items generated by the rewrite-generation model G for the input second original query m; yk denotes the third rewritten query formed by the k lexical items generated by the rewrite-generation model G for the input second original query m; and G(yk | y1:k-1) denotes the probability that the rewrite-generation model G, having generated the lexical items y1:k-1, generates the third rewritten query yk;
Qθ(sk-1, yk) denotes, when the parameter of the rewrite-generation model G is θ, the judgment result output by the rewrite-discrimination model D after judging whether the third rewritten query yk is the best rewritten query for the input second original query m, wherein sk-1 denotes the state of the rewrite-generation model G at time k-1, expressed as (m, {y1, …, yk-2});
a first input submodule, configured to input the second original query in the multiple first positive samples into the rewrite-generation model updated by the first iteration, to obtain the third rewritten query generated by the rewrite-generation model for the second original query;
a second input submodule, configured to input the second original query and the third rewritten query into the pre-trained rewrite-discrimination model, to obtain a judgment result of whether the third rewritten query is the best rewritten query for the second original query;
a third acquisition submodule, configured to obtain, among the third rewritten queries, those third target rewritten queries whose corresponding judgment result is that the third rewritten query is not the best rewritten query for the second original query;
a generation module, configured to generate a third training set, the third training set including the multiple first positive samples and multiple second negative samples, wherein each second negative sample includes the second original query and the third target rewritten query;
a second iterative training submodule, configured to perform, according to the third training set, a second iterative update on the parameter φ of the pre-trained rewrite-discrimination model;
a loop training submodule, configured to repeatedly execute the first iterative update step and the second iterative update step until an objective function converges;
wherein the objective function is:

min over φ of Et[ −log Dφ(m, t1:k) ] + Ep[ −log(1 − Dφ(m, p1:k)) ]

wherein t1:k denotes the first k lexical items of the second rewritten query in the first positive sample, p1:k denotes the first k lexical items of the third target rewritten query in the second negative sample, and m denotes the second original query;
−log(1 − Dφ(m, p1:k)) denotes, when the parameter of the rewrite-discrimination model D is φ, the value of the loss function obtained when the rewrite-discrimination model D performs the best-rewritten-query judgment on the second original query m and p1:k;
−log Dφ(m, t1:k) denotes, when the parameter of the rewrite-discrimination model D is φ, the value of the loss function obtained when the rewrite-discrimination model D performs the best-rewritten-query judgment on the second original query m and t1:k.
9. A data retrieval method, characterized by comprising:
receiving a user query text;
inputting the user query text into a pre-trained rewrite-generation model to obtain a best rewritten query text;
retrieving a preset knowledge graph according to the best rewritten query text to obtain search results;
wherein the rewrite-generation model is configured to rewrite any user query text that is input and to generate a best rewritten query text.
10. A data retrieval device, characterized by comprising:
a receiving module, configured to receive a user query text;
an input module, configured to input the user query text into a pre-trained rewrite-generation model to obtain a best rewritten query text;
a retrieval module, configured to retrieve a preset knowledge graph according to the best rewritten query text to obtain search results;
wherein the rewrite-generation model is configured to rewrite any user query text that is input and to generate a best rewritten query text.
11. A terminal, characterized by comprising a memory, a processor, and a model training program or a data retrieval program stored on the memory and executable on the processor, wherein the model training program, when executed by the processor, implements the steps of the model training method according to any one of claims 1 to 4, and the data retrieval program, when executed by the processor, implements the steps of the data retrieval method according to claim 9.
12. A computer-readable storage medium, characterized in that a model training program or a data retrieval program is stored on the computer-readable storage medium, wherein the model training program, when executed by a processor, implements the steps of the model training method according to any one of claims 1 to 4, and the data retrieval program, when executed by a processor, implements the steps of the data retrieval method according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910005290.1A CN109857845B (en) | 2019-01-03 | 2019-01-03 | Model training and data retrieval method, device, terminal and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910005290.1A CN109857845B (en) | 2019-01-03 | 2019-01-03 | Model training and data retrieval method, device, terminal and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109857845A true CN109857845A (en) | 2019-06-07 |
CN109857845B CN109857845B (en) | 2021-06-22 |
Family
ID=66893928
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910005290.1A Active CN109857845B (en) | 2019-01-03 | 2019-01-03 | Model training and data retrieval method, device, terminal and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109857845B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110390340A (en) * | 2019-07-18 | 2019-10-29 | 暗物智能科技(广州)有限公司 | The training method and detection method of feature coding model, vision relationship detection model |
CN110457567A (en) * | 2019-07-08 | 2019-11-15 | 阿里巴巴集团控股有限公司 | The error correction method and device of query term |
CN110490080A (en) * | 2019-07-22 | 2019-11-22 | 西安理工大学 | A kind of human body tumble method of discrimination based on image |
CN110765235A (en) * | 2019-09-09 | 2020-02-07 | 深圳市人马互动科技有限公司 | Training data generation method and device, terminal and readable medium |
CN111428119A (en) * | 2020-02-18 | 2020-07-17 | 北京三快在线科技有限公司 | Query rewriting method and device and electronic equipment |
CN112037181A (en) * | 2020-08-12 | 2020-12-04 | 深圳大学 | 2D SAXS atlas analysis model training method and device |
CN112215629A (en) * | 2019-07-09 | 2021-01-12 | 百度在线网络技术(北京)有限公司 | Multi-target advertisement generation system and method based on construction countermeasure sample |
CN112328891A (en) * | 2020-11-24 | 2021-02-05 | 北京百度网讯科技有限公司 | Method for training search model, method for searching target object and device thereof |
CN112348162A (en) * | 2019-08-12 | 2021-02-09 | 北京沃东天骏信息技术有限公司 | Method and apparatus for generating recognition models |
CN112465043A (en) * | 2020-12-02 | 2021-03-09 | 平安科技(深圳)有限公司 | Model training method, device and equipment |
CN112528680A (en) * | 2019-08-29 | 2021-03-19 | 上海卓繁信息技术股份有限公司 | Corpus expansion method and system |
CN112579767A (en) * | 2019-09-29 | 2021-03-30 | 北京搜狗科技发展有限公司 | Search processing method and device for search processing |
CN112860884A (en) * | 2019-11-12 | 2021-05-28 | 马上消费金融股份有限公司 | Method, device, equipment and storage medium for training classification model and information recognition |
CN113569011A (en) * | 2021-07-27 | 2021-10-29 | 马上消费金融股份有限公司 | Training method, device and equipment of text matching model and storage medium |
CN113673245A (en) * | 2021-07-15 | 2021-11-19 | 北京三快在线科技有限公司 | Entity identification method and device, electronic equipment and readable storage medium |
CN114238648A (en) * | 2021-11-17 | 2022-03-25 | 中国人民解放军军事科学院国防科技创新研究院 | Game countermeasure behavior decision method and device based on knowledge graph |
CN115438193A (en) * | 2022-09-23 | 2022-12-06 | 苏州爱语认知智能科技有限公司 | Training method of path inference model and path inference method |
US11816159B2 (en) | 2020-06-01 | 2023-11-14 | Yandex Europe Ag | Method of and system for generating a training set for a machine learning algorithm (MLA) |
- 2019-01-03: application CN201910005290.1A filed in China; granted as CN109857845B (legal status: Active)
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130262351A1 (en) * | 2012-03-29 | 2013-10-03 | International Business Machines Corporation | Learning rewrite rules for search database systems using query logs |
CN105335391A (en) * | 2014-07-09 | 2016-02-17 | 阿里巴巴集团控股有限公司 | Search-engine-based method and device for processing search requests |
CN106294341A (en) * | 2015-05-12 | 2017-01-04 | 阿里巴巴集团控股有限公司 | Intelligent question-answering system and topic discrimination method and device therefor |
CN106557480A (en) * | 2015-09-25 | 2017-04-05 | 阿里巴巴集团控股有限公司 | Method and device for implementing query rewriting |
CN105808688A (en) * | 2016-03-02 | 2016-07-27 | 百度在线网络技术(北京)有限公司 | Supplementary retrieval method and device based on artificial intelligence |
CN107491447A (en) * | 2016-06-12 | 2017-12-19 | 百度在线网络技术(北京)有限公司 | Method for building a query-rewriting discrimination model, query-rewriting discrimination method, and corresponding devices |
US20180341862A1 (en) * | 2016-07-17 | 2018-11-29 | Gsi Technology Inc. | Integrating a memory layer in a neural network for one-shot learning |
CN107748757A (en) * | 2017-09-21 | 2018-03-02 | 北京航空航天大学 | Question answering method based on a knowledge graph |
CN107861954A (en) * | 2017-11-06 | 2018-03-30 | 北京百度网讯科技有限公司 | Information output method and device based on artificial intelligence |
CN107958067A (en) * | 2017-12-05 | 2018-04-24 | 焦点科技股份有限公司 | Large-scale e-commerce image retrieval system based on unsupervised automatic feature extraction |
CN108460085A (en) * | 2018-01-19 | 2018-08-28 | 北京奇艺世纪科技有限公司 | Method and device for constructing a video search ranking training set from user logs |
CN108447049A (en) * | 2018-02-27 | 2018-08-24 | 中国海洋大学 | Digital physiological organism segmentation method based on generative adversarial networks |
CN109033390A (en) * | 2018-07-27 | 2018-12-18 | 深圳追科技有限公司 | Method and apparatus for automatically generating similar questions |
Non-Patent Citations (3)
Title |
---|
XINYUE LIU et al.: "TreeGAN: Syntax-Aware Sequence Generation with Generative Adversarial Networks", 2018 IEEE International Conference on Data Mining (ICDM) * |
FENG Chong et al.: "Causality extraction incorporating adversarial learning", Acta Automatica Sinica * |
LIN Yilun et al.: "The new frontier of AI research: generative adversarial networks", Acta Automatica Sinica * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110457567A (en) * | 2019-07-08 | 2019-11-15 | 阿里巴巴集团控股有限公司 | Error correction method and device for query terms |
CN112215629B (en) * | 2019-07-09 | 2023-09-01 | 百度在线网络技术(北京)有限公司 | Multi-objective advertisement generation system and method based on constructed adversarial examples |
CN112215629A (en) * | 2019-07-09 | 2021-01-12 | 百度在线网络技术(北京)有限公司 | Multi-objective advertisement generation system and method based on constructed adversarial examples |
CN110390340A (en) * | 2019-07-18 | 2019-10-29 | 暗物智能科技(广州)有限公司 | Training and detection methods for a feature encoding model and a visual relationship detection model |
CN110490080A (en) * | 2019-07-22 | 2019-11-22 | 西安理工大学 | Image-based human fall discrimination method |
CN112348162A (en) * | 2019-08-12 | 2021-02-09 | 北京沃东天骏信息技术有限公司 | Method and apparatus for generating recognition models |
CN112348162B (en) * | 2019-08-12 | 2024-03-08 | 北京沃东天骏信息技术有限公司 | Method and device for generating a recognition model |
CN112528680B (en) * | 2019-08-29 | 2024-04-05 | 上海卓繁信息技术股份有限公司 | Corpus expansion method and system |
CN112528680A (en) * | 2019-08-29 | 2021-03-19 | 上海卓繁信息技术股份有限公司 | Corpus expansion method and system |
CN110765235B (en) * | 2019-09-09 | 2023-09-05 | 深圳市人马互动科技有限公司 | Training data generation method, device, terminal and readable medium |
CN110765235A (en) * | 2019-09-09 | 2020-02-07 | 深圳市人马互动科技有限公司 | Training data generation method and device, terminal and readable medium |
CN112579767A (en) * | 2019-09-29 | 2021-03-30 | 北京搜狗科技发展有限公司 | Search processing method and device for search processing |
CN112579767B (en) * | 2019-09-29 | 2024-05-03 | 北京搜狗科技发展有限公司 | Search processing method and device for search processing |
CN112860884A (en) * | 2019-11-12 | 2021-05-28 | 马上消费金融股份有限公司 | Method, device, equipment and storage medium for training classification model and information recognition |
CN111428119A (en) * | 2020-02-18 | 2020-07-17 | 北京三快在线科技有限公司 | Query rewriting method and device and electronic equipment |
US11816159B2 (en) | 2020-06-01 | 2023-11-14 | Yandex Europe Ag | Method of and system for generating a training set for a machine learning algorithm (MLA) |
CN112037181B (en) * | 2020-08-12 | 2023-09-08 | 深圳大学 | 2D SAXS pattern analysis model training method and device |
CN112037181A (en) * | 2020-08-12 | 2020-12-04 | 深圳大学 | 2D SAXS pattern analysis model training method and device |
CN112328891A (en) * | 2020-11-24 | 2021-02-05 | 北京百度网讯科技有限公司 | Method for training search model, method for searching target object and device thereof |
WO2022116440A1 (en) * | 2020-12-02 | 2022-06-09 | 平安科技(深圳)有限公司 | Model training method, apparatus and device |
CN112465043A (en) * | 2020-12-02 | 2021-03-09 | 平安科技(深圳)有限公司 | Model training method, device and equipment |
CN112465043B (en) * | 2020-12-02 | 2024-05-14 | 平安科技(深圳)有限公司 | Model training method, device and equipment |
CN113673245A (en) * | 2021-07-15 | 2021-11-19 | 北京三快在线科技有限公司 | Entity identification method and device, electronic equipment and readable storage medium |
CN113569011A (en) * | 2021-07-27 | 2021-10-29 | 马上消费金融股份有限公司 | Training method, device, equipment and storage medium for a text matching model |
CN114238648A (en) * | 2021-11-17 | 2022-03-25 | 中国人民解放军军事科学院国防科技创新研究院 | Knowledge-graph-based game adversarial behavior decision method and device |
CN115438193A (en) * | 2022-09-23 | 2022-12-06 | 苏州爱语认知智能科技有限公司 | Training method of path inference model and path inference method |
Also Published As
Publication number | Publication date |
---|---|
CN109857845B (en) | 2021-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109857845A (en) | Model training and data retrieval method, device, terminal and computer readable storage medium | |
CN109145153A (en) | Intent classification recognition method and device | |
CN103823857B (en) | Space information searching method based on natural language processing | |
CN104615589A (en) | Named-entity recognition model training method and named-entity recognition method and device | |
CN107015969A (en) | Self-updating semantic understanding system and method | |
CN107608960A (en) | Method and apparatus for named entity linking | |
CN102768681A (en) | Recommendation system and method for search input | |
CN103324700A (en) | Ontology concept attribute learning method based on Web information | |
CN112101040B (en) | Ancient poetry semantic retrieval method based on knowledge graph | |
CN103198149A (en) | Method and system for query error correction | |
CN112699216A (en) | End-to-end language model pre-training method, system, device and storage medium | |
CN113064586A (en) | Code completion method based on abstract syntax tree augmented graph model | |
CN109918653A (en) | Method, device and equipment for determining associated topics of text data and training the model | |
CN110084323A (en) | End-to-end semantic parsing system and training method | |
CN113344098A (en) | Model training method and device | |
CN110781687A (en) | Method and device for acquiring sentences with the same intent | |
JP2009146252A (en) | Information processing device, information processing method, and program | |
CN113284499A (en) | Voice instruction recognition method and electronic equipment | |
CN114444462B (en) | Model training method and man-machine interaction method and device | |
CN103914569A (en) | Input prompting method and device, and trie (dictionary tree) model construction method and device | |
CN113779190B (en) | Event causal relationship identification method, device, electronic equipment and storage medium | |
Chai et al. | Cross-domain deep code search with few-shot meta learning | |
Xue et al. | A method of Chinese tourism named entity recognition based on BBLC model | |
CN110795547A (en) | Text recognition method and related product | |
CN116661852A (en) | Code searching method based on program dependency graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||