CN112579870A - Training method, device and equipment for searching matching model and storage medium - Google Patents

Training method, device and equipment for searching matching model and storage medium

Info

Publication number
CN112579870A
Authority
CN
China
Prior art keywords
model
training
text
retrieved
relevance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011529224.3A
Other languages
Chinese (zh)
Inventor
张辰
胡燊
刘怀军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202011529224.3A priority Critical patent/CN112579870A/en
Publication of CN112579870A publication Critical patent/CN112579870A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G06Q 30/0601 Electronic shopping [e-shopping]
    • G06Q 30/0623 Item investigation
    • G06Q 30/0625 Directed, with specific intent or strategy

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a training method, apparatus, device and storage medium for a retrieval matching model, belonging to the field of machine learning. The method comprises: obtaining a generative model, the generative model being trained on first relevance training corpus pairs from an existing domain; inputting text to be retrieved from a target domain into the generative model to obtain second relevance training corpus pairs for the target domain, the second relevance training corpus pairs recording correspondences between the text to be retrieved and query terms; and inputting the second relevance training corpus pairs into an initialization model for training to obtain a retrieval matching model adapted to the target domain.

Description

Training method, device and equipment for searching matching model and storage medium
Technical Field
The embodiments of this application relate to the field of machine learning, and in particular to a training method, apparatus, device and storage medium for a retrieval matching model.
Background
The essence of search is to satisfy the supply-demand matching relationship between users and information (e.g., merchants and merchandise). The retrieval matching model plays the most fundamental role in the search process.
Many retrieval matching models are built and learned with neural networks, but neural networks rely heavily on large amounts of manually labeled corpora for training. For example, a manually labeled corpus records the relevance level between a document to be retrieved (doc) and a query term (query). In general, the more samples the manually labeled corpus contains, the better the performance of the trained retrieval matching model.
When a new search domain appears, a retrieval matching model for the new domain cannot be trained in time because the manually labeled corpus for that domain is lacking.
Disclosure of Invention
The application provides a training method, apparatus, device and storage medium for a retrieval matching model. The technical solution is as follows:
According to one aspect of the present application, a training method for a retrieval matching model is provided, the method including:
obtaining a generative model, wherein the generative model is trained on first relevance training corpus pairs from an existing domain;
inputting text to be retrieved from a target domain into the generative model to obtain second relevance training corpus pairs for the target domain, wherein the second relevance training corpus pairs record correspondences between the text to be retrieved and query terms; and
inputting the second relevance training corpus pairs into an initialization model for training to obtain a retrieval matching model adapted to the target domain.
According to another aspect of the present application, a training apparatus for a retrieval matching model is provided, the apparatus including:
an obtaining module, configured to obtain a generative model, the generative model being trained on first relevance training corpus pairs from an existing domain;
an input module, configured to input text to be retrieved from a target domain into the generative model to obtain second relevance training corpus pairs for the target domain, the second relevance training corpus pairs recording correspondences between the text to be retrieved and query terms; and
a training module, configured to input the second relevance training corpus pairs into an initialization model for training to obtain a retrieval matching model adapted to the target domain.
According to another aspect of the present application, a computer device is provided, comprising a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the training method for the retrieval matching model described above.
According to another aspect of the present application, a computer-readable storage medium is provided, storing a computer program that is loaded and executed by a processor to implement the training method for the retrieval matching model described above.
According to another aspect of the present application, a computer program product is provided, storing a computer program that is loaded and executed by a processor to implement the training method for the retrieval matching model described above.
The beneficial effects of the technical solution provided by the embodiments of this application include at least the following:
A generative model is trained on first relevance corpus pairs from an existing domain and is then called to process text to be retrieved from a new domain, yielding second relevance corpus pairs for the new domain. This solves the problem that a retrieval matching model cannot be trained for a new-domain search scenario because no relevance corpus pairs exist for it, and allows a retrieval matching model to be deployed quickly without manually labeling corpora or accumulating user behavior data for the new-domain search scenario.
Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram illustrating a search system provided in an exemplary embodiment of the present application;
FIG. 2 illustrates a flow chart of a training method for retrieving matching models provided by another exemplary embodiment of the present application;
FIG. 3 illustrates a training diagram for retrieving a matching model provided by another exemplary embodiment of the present application;
FIG. 4 illustrates a flow chart of a training method for retrieving matching models provided by an exemplary embodiment of the present application;
FIG. 5 illustrates a training diagram of a generative model provided by another exemplary embodiment of the present application;
FIG. 6 illustrates a flow chart of a training method for retrieving matching models provided by an exemplary embodiment of the present application;
FIG. 7 illustrates a model architecture diagram for retrieving a matching model provided by another exemplary embodiment of the present application;
FIG. 8 illustrates a model architecture diagram for retrieving a matching model provided by another exemplary embodiment of the present application;
FIG. 9 illustrates a block diagram of a training apparatus for retrieving matching models provided by an exemplary embodiment of the present application;
FIG. 10 illustrates a block diagram of a computer device provided by an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, embodiments of the present application are described in further detail below with reference to the accompanying drawings.
FIG. 1 illustrates a block diagram of a retrieval system 100 provided by an exemplary embodiment of the present application. The retrieval system 100 includes: a user terminal 120, a search server 140, and a development terminal 160.
The user terminal 120 is a terminal used by a user, and may be at least one of a desktop computer, a notebook computer, a tablet computer, an e-book reader, an MP3 player, or an MP4 player. An application or web client runs on the user terminal 120 and provides a search service.
The search server 140 is a background server that provides the search service. The search server 140 stores a retrieval matching model, which is a neural-network-based model. The input of the retrieval matching model is the search term sent by the user terminal 120, and its output is the retrieved text (the search results). There may be multiple retrieval matching models for different domains.
The development terminal 160 is a terminal used by developers and is used to train the retrieval matching model.
The retrieval matching model may be trained by a computer device, which may be the search server 140, the development terminal 160 (a device separate from the search server 140), or another computer device.
Fig. 2 shows a flowchart of a training method for a retrieval matching model according to an exemplary embodiment of the present application. The present embodiment is illustrated with the method applied to the computer device shown in fig. 1. The method includes the following steps:
Step 202: obtain a generative model.
The generative model is a neural network model capable of predicting query terms for an input text. It is trained on first relevance training corpus pairs from an existing domain.
In this application, a domain refers to a type of search scenario. Take a food-delivery scenario as an example: historically, most user queries were for catering categories such as food, desserts and drinks, but as user habits change, more and more people search for non-catering goods such as fresh produce, books and mobile phones. The catering domain is the existing (original) domain, while newly added flash-sale categories such as books and mobile phones are the new domains.
Illustratively, the existing domain is a general domain, or one or more search domains already in use. The first relevance corpus pairs include multiple first corpus pairs from the existing domain, and each first corpus pair includes a query term (query), a text to be retrieved (doc), and a relevance level. The relevance level is also called the relevance score or relevance tier. For example, the relevance levels may be: strongly relevant, weakly relevant, and irrelevant.
Step 204: input text to be retrieved from the target domain into the generative model to obtain second relevance training corpus pairs for the target domain, where the second relevance training corpus pairs record correspondences between the text to be retrieved and query terms.
The target domain is a new domain relative to the existing domain, or a subdivided domain of the existing domain. Taking the target domain as a new domain as an example, the new domain is one in which texts to be retrieved exist but query terms are absent or scarce.
As shown in fig. 3, the texts to be retrieved from the new domain are input into the generative model 10 for prediction, and the generative model outputs the second relevance training corpus pairs for the target domain. The second relevance corpus pairs include multiple second corpus pairs from the target domain, and each second corpus pair includes a query term (query), a text to be retrieved (doc), and a relevance level (also called the relevance score or relevance tier).
Step 206: input the second relevance training corpus pairs into the initialization model for training to obtain a retrieval matching model adapted to the target domain.
The initialization model can be a base model that has not yet been trained, or a model whose parameters are initialized with a pre-trained language model.
As shown in fig. 3, each second corpus pair in the second relevance corpus pairs is used as a training sample and input into the initialization model 20 for training, yielding the retrieval matching model 30 adapted to the target domain.
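The three steps above can be summarized in the following minimal sketch (Python-style pseudocode; the helper functions train_generative_model, generate_corpus_pairs, initialize_model and train_matching_model are hypothetical placeholders for illustration, not part of the disclosure):

    # Minimal sketch of the three-stage procedure of steps 202-206.
    # The four helper functions are hypothetical placeholders.
    def build_retrieval_matching_model(existing_corpus_pairs, new_domain_docs):
        # Step 202: train (or load) the generative model on the existing domain.
        # Each item in existing_corpus_pairs is (query, doc, relevance_label).
        generative_model = train_generative_model(existing_corpus_pairs)

        # Step 204: synthesize (query, doc, relevance_label) pairs for the
        # target domain from its texts to be retrieved.
        synthetic_pairs = generate_corpus_pairs(generative_model, new_domain_docs)

        # Step 206: train an initialization model on the synthetic pairs.
        return train_matching_model(initialize_model(), synthetic_pairs)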
In summary, in the method provided by this embodiment, a generative model is trained on the first relevance corpus pairs from the existing domain and is then called to process texts to be retrieved from the new domain, producing second relevance corpus pairs for the new domain. This solves the problem that a retrieval matching model cannot be trained for a new-domain search scenario because no relevance corpus pairs exist for it, and makes it possible to deploy a retrieval matching model quickly without manually labeling corpora or accumulating user behavior data for the new-domain search scenario.
Fig. 4 shows a flowchart of a training method for a retrieval matching model according to another exemplary embodiment of the present application. The present embodiment is illustrated with the method applied to the computer device shown in fig. 1. The method includes the following steps:
Stage one: training of the generative model.
Step 402: obtain first relevance corpus pairs from the existing domain, where each corpus pair in the first relevance corpus pairs includes a sample text, a sample query term, and a sample relevance level, the relevance level indicating the degree of relevance between the sample text and the sample query term.
The first relevance corpus pairs are constructed from the existing domain and can be denoted corpus 1.
The first relevance corpus pairs include sample query terms (query), sample texts (doc), and sample relevance levels (label). The form of the sample text depends on the specific retrieval system. In a traditional web search scenario, the sample texts are mainly medium-length and long articles; in e-commerce and food delivery, they are mainly stores and products. A store may be represented by a Point of Interest (POI), and a product by a Standard Product Unit (SPU). Illustratively, the text to be retrieved may also carry information other than stores and products, which is not limited in this embodiment.
The relevance levels include two or more levels. In one example, the relevance levels are relevant and irrelevant. In another example, the relevance levels are: strongly relevant, weakly relevant, and irrelevant. The sources of the relevance level include, but are not limited to, at least one of the following:
Manual annotation:
the query terms and the texts to be retrieved are labeled manually by outsourced annotators, developers, product managers, and the like.
Unsupervised generation:
labels are generated automatically from users' historical click and order behavior. For example, if a user searches for "word A" and then places an order for "item B" among the search results, word A and item B are established as a strongly relevant corpus pair.
For the existing domain, such labels are relatively cheap to obtain, and most of the data has already been accumulated.
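As a hedged illustration of the unsupervised mode above, the sketch below derives corpus pairs from hypothetical click and order logs; the log schema (fields such as query, clicked_items and ordered_item) and the label assignment are assumptions made for illustration, not part of the disclosure:

    # Sketch: derive (query, doc, label) corpus pairs from behavior logs.
    # The log schema below is an assumption made for illustration.
    def pairs_from_behavior_logs(logs):
        pairs = []
        for entry in logs:
            query = entry["query"]
            for doc in entry.get("clicked_items", []):
                # A click without an order is treated here as weak relevance.
                pairs.append((query, doc, "weak"))
            ordered = entry.get("ordered_item")
            if ordered is not None:
                # Searching "word A" and then ordering "item B" yields a
                # strongly relevant pair, as in the example above.
                pairs.append((query, ordered, "strong"))
        return pairs

    logs = [{"query": "word A", "clicked_items": ["item C"], "ordered_item": "item B"}]
    print(pairs_from_behavior_logs(logs))
    # [('word A', 'item C', 'weak'), ('word A', 'item B', 'strong')]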
Step 404: segment the sample text into text tokens, and segment the sample query term into query tokens.
The generative model is a neural network model capable of predicting query terms for an input text. Illustratively, the generative model uses the encoder-decoder structure, common in industry, based on the Bidirectional Encoder Representations from Transformers (BERT) model.
The sample text is segmented in advance to obtain text tokens, each of which is vectorized and denoted doc_tokens; the sample query term is segmented to obtain query tokens, each of which is vectorized and denoted query_tokens. That is, the smallest unit after segmentation is denoted a token.
Segmentation can use the word-segmentation (tokenization) module of the BERT model. If the generative model is based on ALBERT (A Lite BERT for Self-supervised Learning of Language Representations), the tokenization module of the ALBERT model is used.
Optionally, when the generative model is built from the encoder and decoder of the ALBERT model, the model parameters of its encoder and decoder are initialized with the parameters of a pre-trained language model trained on a general corpus. The encoder and decoder share model parameters during training.
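A minimal tokenization sketch, assuming the Hugging Face transformers tokenizer and the bert-base-chinese checkpoint as stand-ins (the patent only requires that the generative model and its segmentation module come from the same BERT/ALBERT family), might look like this:

    # Sketch: segment a doc and a query with a BERT-style tokenizer.
    # Hugging Face transformers and the checkpoint name are assumptions;
    # any BERT/ALBERT-compatible segmentation module plays the same role.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

    doc = "西红柿炖牛肉"      # text to be retrieved (doc)
    query = "番茄 牛肉"        # query term (query)

    doc_tokens = tokenizer.tokenize(doc)        # smallest units: tokens
    query_tokens = tokenizer.tokenize(query)

    # Vectorized representations (token ids) of doc_tokens and query_tokens.
    doc_ids = tokenizer.convert_tokens_to_ids(doc_tokens)
    query_ids = tokenizer.convert_tokens_to_ids(query_tokens)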
Step 406: input the text tokens into the encoder to obtain the encoded output.
For example, fig. 5 shows the structure of the generative model 10, which includes an encoder 12 and a decoder 14. During training, the computer device inputs each text token X of the text to be retrieved into the encoder 12 in sequence to obtain the encoded output.
Optionally, at the i-th encoding step, the encoder 12 encodes the i-th text token of the text to be retrieved to obtain the encoded output for that step. After all text tokens of the text to be retrieved have been encoded, the encoded output of the text to be retrieved is obtained.
Step 408: input the encoded output and the relevance level into the decoder to obtain the predicted query tokens.
To reflect the influence of the relevance level on the text to be retrieved, the computer device inputs the encoded output and the relevance level into the decoder to obtain the predicted query tokens.
Illustratively, an attention matrix is placed between the encoder and the decoder. The computer device embeds the sample relevance level to obtain a relevance-level vector, applies attention weighting to the relevance-level vector and the encoded output through the attention matrix to obtain a weighted vector, and inputs the weighted vector into the decoder, which decodes it into the predicted query tokens.
In one example, the weighted vector is input into the decoder only at the first decoding step. At each subsequent decoding step, the predicted query tokens output by the decoder at previous steps are fed back into the decoder to obtain the predicted query token for the next step.
In another example, after the weighted vector is input into the decoder at the first decoding step, both the weighted vector and the predicted query tokens from previous steps are input into the decoder at each subsequent step to obtain the predicted query token for the next step.
As shown in fig. 5, at decoding step 1, the weighted vector is input into the decoder 14 to obtain the predicted query token Y1 for step 1;
at decoding step 2, the weighted vector and the previously predicted query token Y1 are input into the decoder to obtain the predicted query token for step 2;
at decoding step 3, the weighted vector and the previously predicted query tokens Y1-Y2 are input into the decoder to obtain the predicted query token Y3 for step 3, and so on.
Step 410: update the model parameters of the encoder and the decoder according to the error between the predicted query tokens and the query tokens.
Optionally, the error between the predicted query tokens and the query tokens is expressed using standard cross entropy.
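A minimal PyTorch sketch of such a relevance-conditioned encoder-decoder is given below. The layer sizes, the use of nn.Transformer blocks, and the exact way the relevance level is attended over are illustrative assumptions; the patent only requires that an attention matrix weights the relevance-level vector against the encoded output and that the result conditions the decoder, with standard cross entropy as the loss.

    # Sketch (assumptions: PyTorch, Transformer layers, toy sizes).
    import torch
    import torch.nn as nn

    class RelevanceConditionedGenerator(nn.Module):
        def __init__(self, vocab_size, num_levels=3, d_model=256):
            super().__init__()
            self.token_emb = nn.Embedding(vocab_size, d_model)
            self.level_emb = nn.Embedding(num_levels, d_model)   # relevance-level vector
            enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            dec = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc, num_layers=2)
            self.decoder = nn.TransformerDecoder(dec, num_layers=2)
            # "Attention matrix" between encoder and decoder: the level vector
            # attends over the encoded output to produce the weighted vector.
            self.level_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, doc_ids, level_id, query_in_ids):
            enc_out = self.encoder(self.token_emb(doc_ids))             # encoded output
            level_vec = self.level_emb(level_id).unsqueeze(1)           # (B, 1, d)
            weighted, _ = self.level_attn(level_vec, enc_out, enc_out)  # weighted vector
            memory = torch.cat([weighted, enc_out], dim=1)              # conditions decoder
            tgt = self.token_emb(query_in_ids)
            tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
            dec_out = self.decoder(tgt, memory, tgt_mask=tgt_mask)
            return self.out(dec_out)                                    # token logits

    # One training step: standard cross entropy between predicted and true tokens.
    vocab = 21128                                   # e.g. bert-base-chinese vocab size
    model = RelevanceConditionedGenerator(vocab)
    doc_ids = torch.randint(0, vocab, (2, 16))      # doc_tokens (ids)
    level_id = torch.tensor([0, 2])                 # e.g. strongly relevant / irrelevant
    query_ids = torch.randint(0, vocab, (2, 8))     # query_tokens (ids)
    logits = model(doc_ids, level_id, query_ids[:, :-1])
    loss = nn.CrossEntropyLoss()(logits.reshape(-1, vocab), query_ids[:, 1:].reshape(-1))
    loss.backward()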
Stage two: generating training corpora for the new domain.
Step 412: obtain the generative model.
The computer device obtains the trained generative model.
Step 414: input texts to be retrieved from the target domain into the generative model to obtain second relevance training corpus pairs for the target domain, where the second relevance training corpus pairs record correspondences between the texts to be retrieved and query terms.
The target domain is a new domain relative to the existing domain, or a subdivided domain of the existing domain. Taking the target domain as a new domain as an example, the new domain is one in which texts to be retrieved exist but query terms are absent or scarce.
The computer device inputs the texts to be retrieved from the target domain into the generative model, which predicts query terms and relevance levels, yielding the second relevance training corpus pairs for the target domain, each recording a correspondence between a text to be retrieved and a query term.
In one example, because the target domain has few texts to be retrieved, the number of texts to be retrieved also needs to be enhanced. As shown in fig. 6, this optionally includes the following steps:
Step 61: train a second pre-trained language model on the texts to be retrieved from the target domain.
The goal of a language model is to describe the probability of a word or phrase in a sentence. A language model is trained on a body of corpus information and "learns" the probability of a given word occurring in the corpus.
The domain knowledge of the new domain can be learned by the second pre-trained language model. Schematically, a language model of the BERT family is again chosen for pre-training to obtain the pre-trained language model. The "pre-trained language model" and the "generative model" are two different models, although the model architectures they use may be the same or different.
When training the second pre-trained language model, the tokenization module is first used to segment the texts to be retrieved into text tokens, each of which is vectorized and denoted doc_tokens. Note that the second pre-trained language model and the generative model use the same tokenization module.
Schematically, an open-source model checkpoint trained on a general corpus is used as the base model, and pre-training continues on the text tokens of the texts to be retrieved from the new domain so that the model learns the domain knowledge of the new domain, finally yielding the second pre-trained language model for the new domain.
The pre-trained language model may be trained with the Masked Language Model (MLM) task. In MLM training, a word in the text to be retrieved is randomly masked, and the second pre-trained language model predicts the masked word from its context (similar to a cloze test); the order and structure of the original text are unchanged after prediction.
Step 62: input the texts to be retrieved from the target domain into the second pre-trained language model to obtain enhanced texts to be retrieved.
Because the number of texts to be retrieved in the new domain is small, the texts to be retrieved from the target domain are input into the pre-trained language model to obtain enhanced texts to be retrieved.
Schematically, during enhancement the computer device randomly masks word positions in a text to be retrieved from the target domain, predicts those positions with the pre-trained language model to obtain predicted words, and substitutes the predicted words into the masked positions to obtain an enhanced text to be retrieved.
Optionally, at least one word in the text to be retrieved from the target domain is masked at a time; that is, one word or n words are masked at a time, where n is a preset value.
For example, if the text to be retrieved is "beef stewed with tomatoes": when "tomatoes" is masked, the pre-trained model may predict a synonymous variant such as "beef stewed with tomato"; when "beef" is masked, the predicted text may be "sirloin stewed with tomatoes"; when "tomatoes" and "stewed" are both masked, the predicted text may be "potato beef".
The enhanced texts to be retrieved are the union of the original texts to be retrieved from the new domain and the texts predicted by the pre-trained language model, so the enhanced texts contain more text content than the original texts to be retrieved.
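A hedged sketch of this enhancement step follows, assuming the Hugging Face fill-mask pipeline and a character-level Chinese BERT checkpoint as a stand-in for the second pre-trained language model (the checkpoint name, the one-token-at-a-time masking, and the top_k value are illustrative choices):

    # Sketch: mask positions one at a time and refill them with an MLM.
    # The checkpoint name stands in for the domain-adapted (second) model.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-chinese")

    def augment(doc, top_k=3):
        """Return the original doc plus MLM-predicted variants of it."""
        augmented = {doc}                       # keep the original text as well
        chars = list(doc)
        for pos in range(len(chars)):
            masked = chars.copy()
            masked[pos] = fill_mask.tokenizer.mask_token
            for candidate in fill_mask(" ".join(masked), top_k=top_k):
                augmented.add(candidate["sequence"].replace(" ", ""))
        return augmented

    print(augment("西红柿炖牛肉"))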
Step 63: input the enhanced texts to be retrieved into the generative model to obtain the second relevance training corpus pairs for the target domain.
The second relevance training corpus pairs record correspondences between the texts to be retrieved and query terms.
The texts to be retrieved from the new domain are input into the generative model for prediction, and the generative model outputs the second relevance training corpus pairs for the target domain. The second relevance corpus pairs include multiple second corpus pairs from the target domain, each comprising a query term (query), a text to be retrieved (doc), and a relevance level (also called the relevance score or relevance tier).
There are n relevance levels; illustratively, n is three and the levels are: strongly relevant, weakly relevant, and irrelevant, represented by label 0, label 1 and label 2, respectively. Each new-domain text to be retrieved is input into the generative model together with each of the three labels, and the generative model generates query terms (query) for the different relevance levels. In the end, three query-doc pairs, corresponding to the three labels, are generated for each text to be retrieved.
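A sketch of this label-conditioned generation is shown below; the generate method is a hypothetical wrapper around the trained encoder-decoder sketched earlier (e.g. greedy decoding over its token logits), not an API defined by the disclosure:

    # Sketch: generate one query per relevance level for every new-domain doc,
    # yielding three query-doc pairs per doc.
    RELEVANCE_LABELS = {0: "strongly relevant", 1: "weakly relevant", 2: "irrelevant"}

    def build_second_corpus(generative_model, new_domain_docs):
        corpus = []
        for doc in new_domain_docs:
            for label_id, label_name in RELEVANCE_LABELS.items():
                query = generative_model.generate(doc, label_id)   # hypothetical wrapper
                corpus.append({"query": query, "doc": doc, "label": label_name})
        return corpus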
Stage three: training the retrieval matching model.
Step 416: input the second relevance training corpus pairs into the initialization model for training to obtain a retrieval matching model adapted to the target domain.
The initialization model can be a base model that has not yet been trained, or a model whose parameters are initialized with a pre-trained language model.
Each second corpus pair in the second relevance training corpus pairs is used as a training sample and input into the initialization model for training, producing the retrieval matching model adapted to the target domain. The retrieval matching model is a neural network model capable of outputting the corresponding texts to be retrieved for an input query term: its input is a query term, and its output is texts to be retrieved.
Illustratively, the initialization model includes a second encoder, and the model parameters of the second encoder can be initialized with the encoder parameters of the second pre-trained language model, which is trained on the texts to be retrieved from the new domain.
In one example, the initialization model uses the classic two-tower Deep Structured Semantic Model (DSSM). The DSSM encodes query and doc in the same vector space, so the corresponding vectors can be computed and stored offline, which clearly helps online performance.
As shown in fig. 7, the DSSM includes an input layer 71, a representation layer 72, and a matching layer 73.
In the input layer 71, [CLS] marks the beginning of the first sentence, Tok1 denotes the 1st token of a sentence, and TokN denotes the N-th token; [SEP] separates the two sentences. That is, the computer device tokenizes the query and the doc with the tokenization module and inputs them into the representation layer 72 for encoding.
The representation layer 72 contains two stacks, each consisting of a BERT encoder (a second encoder) followed by an average pooling layer, and both stacks are initialized with the model parameters of the second pre-trained language model. One BERT encoder and average pooling layer take the query term as input and output a first feature representation of the query term; the other takes the store name and product name from the text to be queried as input and outputs a second feature representation of them. Taking a query term of N query tokens as an example, E[CLS] is the input representation produced by the input layer 71 for [CLS], E1 is the input representation for the first query token Tok1, and EN is the input representation for the N-th query token TokN. C is the BERT encoder's semantic representation vector for E[CLS], T1 is its semantic representation vector for E1, and TN is its semantic representation vector for EN, and so on; the rest are not repeated here.
In the matching layer 73, the cosine similarity between the first feature representation and the second feature representation is computed. Cosine similarity measures whether two vectors point in the same direction: when the two vectors have the same direction, the cosine similarity is 1; when the angle between them is 90 degrees, it is 0. The matching layer 73 is also called the softmax layer.
In actual model training, the structure of the DSSM can be simplified according to hardware constraints and time requirements; for example, the model parameters (network weights) of the first n layers of the BERT encoder can be frozen and only the last few layers trained.
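A minimal sketch of such a two-tower matcher, assuming PyTorch and the Hugging Face BertModel, with the checkpoint name standing in for the second pre-trained language model and with mean pooling plus cosine similarity as described above:

    # Sketch (assumptions: PyTorch + transformers; the checkpoint name is a
    # placeholder for the second pre-trained language model of the new domain).
    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class TwoTowerMatcher(nn.Module):
        def __init__(self, checkpoint="bert-base-chinese", freeze_layers=6):
            super().__init__()
            self.query_encoder = BertModel.from_pretrained(checkpoint)
            self.doc_encoder = BertModel.from_pretrained(checkpoint)
            # Optionally freeze the first n encoder layers, training only the rest.
            for encoder in (self.query_encoder, self.doc_encoder):
                for layer in encoder.encoder.layer[:freeze_layers]:
                    for p in layer.parameters():
                        p.requires_grad = False

        @staticmethod
        def mean_pool(hidden, attention_mask):
            mask = attention_mask.unsqueeze(-1).float()
            return (hidden * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

        def forward(self, query_inputs, doc_inputs):
            q = self.query_encoder(**query_inputs).last_hidden_state
            d = self.doc_encoder(**doc_inputs).last_hidden_state
            q_vec = self.mean_pool(q, query_inputs["attention_mask"])   # first representation
            d_vec = self.mean_pool(d, doc_inputs["attention_mask"])     # second representation
            return nn.functional.cosine_similarity(q_vec, d_vec)        # matching layer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    matcher = TwoTowerMatcher()
    q = tokenizer("番茄 牛肉", return_tensors="pt")
    d = tokenizer("西红柿炖牛肉", return_tensors="pt")
    print(matcher(q, d))     # cosine similarity score for the query-doc pair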
In another example, the initialization model uses an improved interactive deep semantic matching model. As shown in fig. 8, the improved interactive model consists of a left network and a right network. The left network mainly includes an input layer 81, an interaction layer 82, and a fully connected layer 83.
The input layer 81 contains two second encoders, each initialized with the model parameters of the second pre-trained language model. One second encoder encodes the text to be retrieved and outputs its semantic representation vector doc-vec; the other encodes the query term and outputs its semantic representation vector query-vec. The two second encoders share weights.
The interaction layer 82 includes an average pooling layer, a maximum pooling layer, and a normalization (Norm) layer. Similarity vectors are computed from the vectors obtained by applying maximum pooling and average pooling to the semantic representation vectors of query and doc; the similarity can be calculated in various ways, such as cosine, Jaccard, or dot product. The normalization layer normalizes the similarity vectors.
The fully connected layer 83 concatenates the two similarity vectors output by the interaction layer 82 and feeds the result into the fully connected layer above it.
The right network is a Multi-Layer Perceptron (MLP); its detailed structure is not repeated here. The right network uses additional features that rely on feature engineering. Illustratively, the additional features may include literal text features of query and doc, such as word counts, the number of co-occurring characters or words, and co-occurrence positions; text similarity features, such as BM25, Term Frequency-Inverse Document Frequency (TF-IDF), and edit distance; vector similarity features, such as results from word vectors like BERT, word2vec, or fastText; and category similarity features, such as text classification labels, merchant categories, and commodity categories, as well as features of other dimensions.
Using handcrafted feature engineering improves the extensibility of the initialization model, lays a foundation for later iterations, and effectively improves its prediction accuracy. Finally, the vectors output by the left and right networks are concatenated and passed through the fully connected layer 83, a dense layer, and an output (softmax) layer to produce the final result.
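A hedged sketch of this interaction-plus-features head is shown below, assuming PyTorch; the particular similarity measures, feature set and layer sizes are illustrative assumptions rather than the disclosed configuration:

    # Sketch (assumption: PyTorch; inputs are encoder output sequences).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def interaction_features(query_vecs, doc_vecs):
        """query_vecs, doc_vecs: (batch, seq_len, hidden) encoder outputs."""
        sims = []
        for pool in (lambda x: x.mean(dim=1), lambda x: x.max(dim=1).values):
            q, d = pool(query_vecs), pool(doc_vecs)
            cosine = F.cosine_similarity(q, d).unsqueeze(-1)   # cosine similarity
            dot = (q * d).sum(dim=-1, keepdim=True)            # dot-product similarity
            sims.extend([cosine, dot])
        return F.normalize(torch.cat(sims, dim=-1), dim=-1)    # Norm layer

    class InteractiveHead(nn.Module):
        def __init__(self, num_interaction=4, num_extra=3, hidden=64):
            super().__init__()
            # Right-hand MLP over the additional handcrafted features.
            self.extra_mlp = nn.Sequential(nn.Linear(num_extra, hidden), nn.ReLU())
            self.classifier = nn.Sequential(
                nn.Linear(num_interaction + hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 3),                          # three relevance levels
            )

        def forward(self, query_vecs, doc_vecs, extra_features):
            left = interaction_features(query_vecs, doc_vecs)       # interaction layer
            right = self.extra_mlp(extra_features)                  # feature MLP
            return self.classifier(torch.cat([left, right], dim=-1))

    head = InteractiveHead()
    q_vecs = torch.randn(2, 5, 768)    # query-vec sequence from one encoder
    d_vecs = torch.randn(2, 9, 768)    # doc-vec sequence from the other encoder
    extra = torch.randn(2, 3)          # e.g. word count, co-occurring terms, BM25
    print(head(q_vecs, d_vecs, extra).shape)   # torch.Size([2, 3])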
Stage four: using the retrieval matching model.
Step 418: provide the retrieval service for the target domain using the trained retrieval matching model.
Illustratively, the developer deploys the trained retrieval matching model to the search server, and the search server uses the trained retrieval matching model to provide the retrieval service for the target domain. For example, if the new domain is books, the search server provides a book search service with the retrieval matching model; if the new domain is mobile phones, the search server provides a mobile-phone search service with the retrieval matching model.
In summary, the method provided by this embodiment requires neither manual labeling of training samples for the new domain nor prior accumulation of user behavior in the new domain; it automatically generates relevance corpora for training the retrieval matching model, while the pre-trained language model adapts the model well to the new domain.
In addition, the method provided by this embodiment has strong adaptability and excellent extensibility. It copes more easily with the many new domains that appear during rapid business expansion, and it can provide a large amount of candidate labeling data before the retrieval matching model goes online. After manually labeled data or a large volume of user behavior has accumulated later, the retrieval matching model trained by this method can still continue learning, which greatly improves the iteration efficiency of the retrieval matching model.
In addition, the method provided by this embodiment is highly accurate. On the one hand, the pre-trained language model contains domain knowledge; on the other hand, because the texts used by the retrieval matching model during training are the texts to be retrieved from the new domain, its predictions are more accurate than the relevance obtained by traditional approaches such as original-domain knowledge, simple literal matching or semantic similarity. This improves the overall user experience of searching in the new domain and can further strengthen users' confidence in the brand.
FIG. 9 is a block diagram of a training apparatus for a retrieval matching model according to an exemplary embodiment of the present application. The apparatus includes:
an obtaining module 920, configured to obtain a generative model, where the generative model is trained on first relevance corpus pairs from an existing domain;
an input module 940, configured to input texts to be retrieved from a target domain into the generative model to obtain second relevance corpus pairs for the target domain, where the second relevance corpus pairs record correspondences between the texts to be retrieved and query terms;
a training module 960, configured to input the second relevance corpus pairs into an initialization model for training to obtain a retrieval matching model adapted to the target domain.
In an optional design of the present application, the generative model includes a first encoder and a first decoder, and the apparatus further includes a word segmentation module 980;
the obtaining module 920 is further configured to obtain first relevance corpus pairs from the existing domain, where each corpus pair in the first relevance corpus pairs includes a sample text, a sample query term, and a sample relevance level, the sample relevance level indicating the degree of relevance between the sample text and the sample query term;
the word segmentation module 980 is further configured to segment the sample text into text tokens and segment the sample query term into query tokens;
the input module 940 is further configured to input the text tokens into the encoder to obtain the encoded output, and to input the encoded output and the relevance level into the decoder to obtain predicted query tokens;
the training module 960 is configured to update the model parameters of the encoder and the decoder according to the error between the predicted query tokens and the query tokens.
In an optional design of the present application, the input module 940 is further configured to embed the sample relevance level to obtain a relevance-level vector, apply attention weighting to the relevance-level vector and the encoded output through an attention matrix to obtain a weighted vector, and input the weighted vector into the decoder for decoding to obtain the predicted query tokens.
In an optional design of the present application, the training module 960 is further configured to initialize the model parameters of the encoder and the decoder with the model parameters of a first pre-trained language model, the first pre-trained language model being trained on a general corpus;
wherein the encoder and the decoder share the model parameters during training.
In an optional design of the present application, the input module 940 is further configured to input the texts to be retrieved from the target domain into a second pre-trained language model to obtain enhanced texts to be retrieved, and to input the enhanced texts to be retrieved into the generative model to obtain the second relevance training corpus pairs for the target domain;
wherein the enhanced texts to be retrieved contain more text content than the texts to be retrieved, and the second pre-trained language model is trained on the texts to be retrieved from the target domain.
In an optional design of the present application, the input module 940 is further configured to randomly mask word positions in the texts to be retrieved from the target domain, predict the masked positions through the second pre-trained language model to obtain predicted words, and substitute the predicted words into the masked positions to obtain the enhanced texts to be retrieved.
In an optional design of the present application, the initialization model includes a second encoder, and the training module 960 is further configured to initialize the model parameters of the second encoder in the initialization model with the model parameters of the encoder in the second pre-trained language model;
wherein the second pre-trained language model is trained on the texts to be retrieved from the target domain.
Fig. 10 shows a structural diagram of a computer device 1000 provided by an embodiment of the present application. Specifically, the computer device 1000 includes a Central Processing Unit (CPU) 1001, a system memory 1004 including a Random Access Memory (RAM) 1002 and a Read-Only Memory (ROM) 1003, and a system bus 1005 connecting the system memory 1004 and the central processing unit 1001. The computer device 1000 also includes a basic input/output system (I/O system) 1006, which facilitates the transfer of information between devices within the computer, and a mass storage device 1007 that stores an operating system 1013, application programs 1014, and other program modules 1015.
The basic input/output system 1006 includes a display 1008 for displaying information and an input device 1009, such as a mouse or keyboard, for user input of information. The display 1008 and the input device 1009 are both connected to the central processing unit 1001 through an input/output controller 1010 connected to the system bus 1005. The basic input/output system 1006 may also include the input/output controller 1010 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 1010 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1007 is connected to the central processing unit 1001 through a mass storage controller (not shown) connected to the system bus 1005. The mass storage device 1007 and its associated computer-readable media provide non-volatile storage for the computer device 1000. That is, the mass storage device 1007 may include a computer readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1004 and mass storage device 1007 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 1000 may also operate through a connection to a remote computer on a network, such as the Internet. That is, the computer device 1000 may be connected to the network 1012 through the network interface unit 1011 connected to the system bus 1005, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 1011.
The memory also includes one or more programs, which are stored in the memory and include instructions for performing the training method for the retrieval matching model provided in the embodiments of the present application.
The present application further provides a computer-readable storage medium, in which a computer program is stored, and the computer program is loaded and executed by a processor to implement the training method for retrieving the matching model provided by the above method embodiments.
Optionally, the present application also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the training method of retrieving a matching model according to the above aspects.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A training method for a retrieval matching model, the method comprising:
obtaining a generative model, wherein the generative model is trained on first relevance training corpus pairs from an existing domain;
inputting text to be retrieved from a target domain into the generative model to obtain second relevance training corpus pairs for the target domain, wherein the second relevance training corpus pairs record correspondences between the text to be retrieved and query terms; and
inputting the second relevance training corpus pairs into an initialization model for training to obtain a retrieval matching model adapted to the target domain.
2. The method according to claim 1, wherein the generative model comprises a first encoder and a first decoder, and the method further comprises:
obtaining the first relevance corpus pairs from the existing domain, wherein each corpus pair in the first relevance corpus pairs comprises: a sample text, a sample query term, and a sample relevance level, the sample relevance level indicating a degree of relevance between the sample text and the sample query term;
segmenting the sample text to obtain text tokens, and segmenting the sample query term to obtain query tokens;
inputting the text tokens into the encoder to obtain an encoded output;
inputting the encoded output and the relevance level into the decoder to obtain predicted query tokens; and
updating model parameters of the encoder and the decoder according to an error between the predicted query tokens and the query tokens.
3. The method according to claim 2, wherein the generative model further comprises an attention matrix, and inputting the encoded output and the relevance level into the decoder to obtain the predicted query tokens comprises:
embedding the sample relevance level to obtain a relevance-level vector;
applying attention weighting to the relevance-level vector and the encoded output through the attention matrix to obtain a weighted vector; and
inputting the weighted vector into the decoder and decoding to obtain the predicted query tokens.
4. The method according to claim 2, further comprising:
initializing the model parameters of the encoder and the decoder with model parameters of a first pre-trained language model, the first pre-trained language model being trained on a general corpus;
wherein the encoder and the decoder share the model parameters during training.
5. The method according to any one of claims 1 to 4, wherein inputting the text to be retrieved from the target domain into the generative model to obtain the second relevance corpus pairs for the target domain comprises:
inputting the text to be retrieved from the target domain into a second pre-trained language model to obtain enhanced text to be retrieved; and
inputting the enhanced text to be retrieved into the generative model to obtain the second relevance training corpus pairs for the target domain;
wherein the enhanced text to be retrieved contains more text content than the text to be retrieved, and the second pre-trained language model is trained on the text to be retrieved from the target domain.
6. The method according to claim 5, wherein inputting the text to be retrieved from the target domain into the second pre-trained language model to obtain the enhanced text to be retrieved comprises:
randomly masking word positions in the text to be retrieved from the target domain, and predicting the masked positions through the pre-trained language model to obtain predicted words; and
substituting the predicted words into the masked positions to obtain the enhanced text to be retrieved.
7. The method according to any one of claims 1 to 4, wherein the initialization model comprises a second encoder, and the method further comprises:
initializing model parameters of the second encoder in the initialization model with model parameters of the encoder in a second pre-trained language model;
wherein the second pre-trained language model is trained on the text to be retrieved from the target domain.
8. A training apparatus for a retrieval matching model, the apparatus comprising:
an obtaining module, configured to obtain a generative model, the generative model being trained on first relevance training corpus pairs from an existing domain;
an input module, configured to input text to be retrieved from a target domain into the generative model to obtain second relevance training corpus pairs for the target domain, the second relevance training corpus pairs recording correspondences between the text to be retrieved and query terms; and
a training module, configured to input the second relevance training corpus pairs into an initialization model for training to obtain a retrieval matching model adapted to the target domain.
9. A computer device, comprising: a processor and a memory, the memory storing a computer program that is loaded and executed by the processor to implement the training method for a retrieval matching model according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program that is loaded and executed by a processor to implement the training method for a retrieval matching model according to any one of claims 1 to 7.
CN202011529224.3A 2020-12-22 2020-12-22 Training method, device and equipment for searching matching model and storage medium Pending CN112579870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011529224.3A CN112579870A (en) 2020-12-22 2020-12-22 Training method, device and equipment for searching matching model and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011529224.3A CN112579870A (en) 2020-12-22 2020-12-22 Training method, device and equipment for searching matching model and storage medium

Publications (1)

Publication Number Publication Date
CN112579870A true CN112579870A (en) 2021-03-30

Family

ID=75138905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011529224.3A Pending CN112579870A (en) 2020-12-22 2020-12-22 Training method, device and equipment for searching matching model and storage medium

Country Status (1)

Country Link
CN (1) CN112579870A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722422A (en) * 2021-04-14 2021-11-30 腾讯科技(深圳)有限公司 Model training method, text label generation method, device, equipment and medium
CN113806487A (en) * 2021-09-23 2021-12-17 平安科技(深圳)有限公司 Semantic search method, device, equipment and storage medium based on neural network
CN113806487B (en) * 2021-09-23 2023-09-05 平安科技(深圳)有限公司 Semantic searching method, device, equipment and storage medium based on neural network
CN114510559A (en) * 2022-01-27 2022-05-17 福建博思软件股份有限公司 Commodity retrieval method based on deep learning semantic implication and storage medium
CN114510559B (en) * 2022-01-27 2023-08-29 福建博思软件股份有限公司 Commodity retrieval method based on deep learning semantic implication and storage medium
CN115203377A (en) * 2022-09-09 2022-10-18 北京澜舟科技有限公司 Model enhancement training method and system based on retrieval and storage medium
CN117786242A (en) * 2024-02-26 2024-03-29 腾讯科技(深圳)有限公司 Searching method based on position and related device
CN117786242B (en) * 2024-02-26 2024-05-28 腾讯科技(深圳)有限公司 Searching method based on position and related device

Similar Documents

Publication Publication Date Title
US20200234102A1 (en) Joint learning of local and global features for entity linking via neural networks
CN112131350B (en) Text label determining method, device, terminal and readable storage medium
CN112579870A (en) Training method, device and equipment for searching matching model and storage medium
Meng et al. GEMNET: Effective gated gazetteer representations for recognizing complex entities in low-context input
CN106462626B (en) Interest-degree is modeled using deep neural network
US11550871B1 (en) Processing structured documents using convolutional neural networks
CN110162749A (en) Information extracting method, device, computer equipment and computer readable storage medium
EP3411835B1 (en) Augmenting neural networks with hierarchical external memory
US11720761B2 (en) Systems and methods for intelligent routing of source content for translation services
Suissa et al. Text analysis using deep neural networks in digital humanities and information science
EP2973023A1 (en) Scoring concept terms using a deep network
KR20170004154A (en) Method and system for automatically summarizing documents to images and providing the image-based contents
CN110457682A (en) Electronic health record part-of-speech tagging method, model training method and relevant apparatus
CN112188312B (en) Method and device for determining video material of news
US11704506B2 (en) Learned evaluation model for grading quality of natural language generation outputs
CN111831924A (en) Content recommendation method, device, equipment and readable storage medium
Kumar et al. BERT based semi-supervised hybrid approach for aspect and sentiment classification
CN110362663A (en) Adaptive more perception similarity detections and parsing
CN113360646A (en) Text generation method and equipment based on dynamic weight and storage medium
Parcheta et al. Combining embeddings of input data for text classification
CN114282528A (en) Keyword extraction method, device, equipment and storage medium
Suneera et al. A bert-based question representation for improved question retrieval in community question answering systems
Kandi Language Modelling for Handling Out-of-Vocabulary Words in Natural Language Processing
CN115309865A (en) Interactive retrieval method, device, equipment and storage medium based on double-tower model
CN114328820A (en) Information searching method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination