CN118152532A

CN118152532A - Reply generation method, long tail recognition model training method and corresponding device

Info

Publication number: CN118152532A
Application number: CN202410269348.4A
Authority: CN
Inventors: 张涛林; 李东阳; 严俊冰; 汪诚愚; 黄龙涛; 薛晖; 黄�俊
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2024-03-08
Filing date: 2024-03-08
Publication date: 2024-06-07

Abstract

The embodiment of the application discloses a reply generation method, a training method of a long tail recognition model and a corresponding device. Relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring an input text; long tail recognition is carried out on the input text; and generating a prompt text according to the long tail recognition result, inputting the prompt text into a second answer generation model, and obtaining an answer to the input text, which is generated by the second answer generation model based on the input prompt text. The application considers the long tail type characteristic of the input text before generating the answer, and generates the prompt text based on the long tail recognition result, thereby improving the accuracy of generating the answer.

Description

Reply generation method, long tail recognition model training method and corresponding device

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a reply generation method, a long tail model training method and a corresponding device.

Background

With the continuous development of artificial intelligence technology, generative models such as large language models are increasingly widely used. The generative model can automatically generate a reply to the input text of the user according to the input text of the user and feed the reply back to the user. In current implementations, however, the same answer generation method is employed for all input text. However, generative models, such as large language models, differ in learning ability for different knowledge. Therefore, the accuracy of generating the response by the current response generation method remains to be improved.

Disclosure of Invention

In view of the above, the present application provides a reply generation method, a training method of long tail model and a corresponding device, so as to improve the accuracy of reply generation.

The application provides the following scheme:

In a first aspect, a reply generation method is provided, the method including:

acquiring an input text;

Long tail recognition is carried out on the input text;

And generating a prompt text according to the long tail recognition result, and inputting the prompt text into a second answer generation model to obtain an answer to the input text, wherein the answer is generated by the second answer generation model based on the input prompt text.

According to an implementation manner of the embodiment of the present application, performing long tail recognition on the input text includes:

Inputting the input text into a long tail recognition model obtained by pre-training, and obtaining a long tail probability value predicted by the long tail recognition model;

and determining whether the input text is of a long tail type according to the long tail probability value.

According to an implementation manner of the embodiment of the present application, the method further includes:

the long tail recognition model comprises a first answer generation model;

The long tail probability value is obtained according to the probability value of the ending symbol Token of the Token sequence predicted by the first reply generation model, and the Token sequence is a sequence corresponding to a reply generated by the first reply generation model for the input text.

According to an implementation manner of the embodiment of the present application, the generating the prompt text according to the long tail recognition result includes:

If the input text is of a long tail type, searching a knowledge base by using the input text to obtain K candidate knowledge, wherein K is a preset positive integer; obtaining the prompt text according to the input text and the K candidate knowledge;

Otherwise, the prompt text is obtained according to the input text.

According to one implementation of the embodiment of the present application, the second reply generation model is obtained based on a large language model LLM.

In a second aspect, a training method of a long tail recognition model is provided, the method comprising:

Acquiring training data comprising a plurality of training samples, wherein the training samples comprise input samples and corresponding reply samples thereof;

Training a long tail recognition model by using the training data, wherein long tail recognition is performed on the input sample by using the long tail recognition model; generating a prompt text according to the long tail recognition result, inputting the prompt text into a second answer generation model, and acquiring an answer for the input sample, which is generated by the second answer generation model based on the prompt text; the training targets include: the difference between the reply to the input sample and the corresponding reply sample is minimized.

According to an implementation manner of the embodiment of the present application, the long tail recognition model includes a first reply generation model;

The long tail recognition of the input sample by using the long tail recognition model comprises the following steps: inputting the input sample into the first answer generation model; determining a long tail probability value by using the first reply generation model according to the degree of difference between replies generated by the input samples and reply samples corresponding to the input samples, probability values corresponding to Token in the replies generated by the first reply generation model, average word frequency of the input samples and gradient values corresponding to the input samples; and determining whether the input sample is of a long tail type according to the long tail probability value.

If the input sample is of a long tail type, searching a knowledge base by using the input sample to obtain K candidate knowledge, wherein K is a preset positive integer; obtaining the prompt text according to the input sample and the K candidate knowledge;

Otherwise, obtaining the prompt text according to the input sample.

And in the training process, freezing parameters of a second answer generation model, and updating parameters of the long tail recognition model by using a loss function corresponding to the training target.

In a third aspect, a reply generation method is provided, which is executed by a cloud server, and the method includes:

Acquiring a request sent by user equipment, and analyzing the request to obtain an input text;

Long tail recognition is carried out on the input text;

Generating a prompt text according to the long tail recognition result, and inputting the prompt text into a second answer generation model to obtain an answer to the input text, wherein the answer is generated by the second answer generation model based on the input prompt text;

generating a response to the request with the reply, and returning the response to the user equipment.

In a fourth aspect, there is provided a reply generation apparatus, the apparatus comprising:

A text acquisition unit configured to acquire an input text;

a long tail recognition unit configured to perform long tail recognition on the input text;

And a reply generation unit configured to generate a prompt text according to the result of the long tail recognition, input the prompt text into a second reply generation model, and obtain a reply to the input text generated by the second reply generation model based on the input prompt text.

In a fifth aspect, there is provided a training apparatus for a long tail recognition model, the apparatus comprising:

A sample acquisition unit configured to acquire training data including a plurality of training samples including input samples and their corresponding reply samples;

A model training unit configured to train a long tail recognition model using the training data, wherein long tail recognition is performed on the input sample using the long tail recognition model; generating a prompt text according to a long tail recognition result, inputting the prompt text into a second answer generation model, and acquiring an answer for the input sample, which is generated by the second answer generation model based on the prompt text; the training targets include: the difference between the reply to the input sample and the corresponding reply sample is minimized.

According to a sixth aspect, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the first, second and third aspects described above.

According to a seventh aspect, there is provided an electronic device comprising:

One or more processors; and

A memory associated with the one or more processors, the memory for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any one of the first, second and third aspects above.

According to the specific embodiment provided by the application, the application discloses the following technical effects:

1) In the reply generation method, long tail recognition is performed on the input text, the prompt text used for inputting the second reply generation model is generated according to the recognition result, and the second reply generation model generates a reply according to the prompt text.

2) According to the method, different processing modes are adopted according to the long-tail recognition result, namely, knowledge retrieval is carried out on the long-tail type input text to obtain a plurality of candidate knowledge, knowledge information related to the input text is added in the content of the prompt text, so that the second reply generation model has more external knowledge for reference when generating replies, the defect of the second reply generation model on long-tail sparse knowledge learning capability is overcome, and the reply generation accuracy is improved. For the input text of a non-long-tail type, the general knowledge is well learned by the common answer generation model, so that the higher accuracy can be achieved by directly generating an answer to the input text by using the second answer generation model without introducing external knowledge.

3) Because the reply generation model is easier to learn the knowledge of the non-long-tail type, after the long-tail recognition is carried out on the input text, the application does not carry out additional knowledge retrieval on the input text of the non-long-tail type, and the reply is generated by directly using the existing generation capacity of the second reply generation model, so that the calculation cost is reduced while the reply accuracy is ensured, and the reasoning (namely reply generation) speed is improved.

4) In the process of training a long tail recognition model, a new mechanism is introduced to determine a long tail probability value of an input sample, namely, the long tail probability value is determined by utilizing the degree of difference between a reply generated by the first reply generation model for the input sample and a reply sample corresponding to the input sample, the probability value corresponding to each Token in the reply generated by the first reply generation model, the average word frequency of the input sample and the gradient value corresponding to the input sample, so that the measurement and the study of the long tail type input sample are more effectively carried out.

5) According to the application, in the training process of the long-tail recognition model, the parameters of the second reply generation model are frozen, the parameters of the long-tail recognition model are updated by using the loss function corresponding to the training target, and on the basis of not affecting the learned knowledge of the second reply generation model, the long-tail recognition model can quickly learn the long-tail characteristics of the input text, so that the pertinence and the efficiency of model training are improved, and the accuracy of the long-tail recognition model is improved.

Of course, it is not necessary for any one product to practice the application to achieve all of the advantages set forth above at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a diagram of a system architecture to which embodiments of the present application are applicable;

FIG. 2 is a flowchart of a reply generation method according to an embodiment of the present application;

FIG. 3a is a schematic illustration of the generation of a reply provided by an embodiment of the application;

FIG. 3b is a schematic diagram of a training long tail recognition model according to an embodiment of the present application;

FIG. 4 is a schematic block diagram of a reply generation device provided by an embodiment of the present application;

FIG. 5 is a schematic block diagram of a training device for long tail recognition models provided by an embodiment of the present application;

fig. 6 is a schematic block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the application, fall within the scope of protection of the application.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be understood that the term "and/or" as used herein is merely one relationship describing the association of the associated objects, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.

Depending on the context, the word "if" as used herein may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.

With the development of artificial intelligence technology, there are some answer generation methods that are currently available, most of which focus on learning and optimizing frequently occurring samples during model training because they are easier to obtain and more representative. However, in practice, a user may enter a variety of text, where a portion of the text may refer to knowledge that falls within the category of long-tailed knowledge, which refers to knowledge or information that is relatively unusual or not commonly referred to. For knowledge of long tail type, the frequency of occurrence of related knowledge in the pre-training text of the model is low, and the reply generation model cannot learn the related knowledge corresponding to the sample of long tail type well, so that the accuracy of reply generation still needs to be improved. In order to overcome the problems, in the existing answer generation method, there is also a answer generation method based on retrieval enhancement, related knowledge retrieval is firstly carried out on an external knowledge base aiming at an input text of a user, and then knowledge with higher similarity and the input text are input into an answer model together for reference when the model generates an answer. According to the method, all input texts are searched, and certain input texts are well learned and optimized in the model training process, so that relevant knowledge is obtained, and the searching again is lower in help of improving the answer quality, but the answer generation efficiency is reduced.

Through the observation, the application provides a new idea. To facilitate an understanding of the present application, a system architecture on which the present application is based will first be described. Fig. 1 shows an exemplary system architecture to which embodiments of the present application may be applied, which may include a user equipment and a reply generation device at a server side, as shown in fig. 1.

The user can input text through the user equipment, and the user equipment sends the input text to the answer generating device at the server side.

Wherein the user equipment may include, but is not limited to, such as: intelligent mobile terminals, intelligent home devices, wearable devices, PCs (Personal Computer, personal computers), etc. Wherein the smart mobile device may include, for example, a cell phone, tablet computer, notebook computer, PDA (Personal digital assistant), internet car, etc. Smart home devices may include smart televisions, smart refrigerators, and the like. Wearable devices may include devices such as smart watches, smart glasses, virtual reality devices, augmented reality devices, mixed reality devices (i.e., devices that can support virtual reality and augmented reality), and so forth.

The reply generation means may generate a reply from the input text using the method provided in the embodiment of the present application. Wherein, the question and answer process of the answer generating device can involve the utilization of the generated model. The generative model in fig. 1 is taken as an example of a question-answer model, and can also be applied to other application scenarios, for example, in a translation scenario, the generative model can be a translation model. In an embodiment of the present application, the question-answer model may include a long-tail recognition model, a retrieval model, and a second answer generation model.

The model training device can train the long tail recognition model in the question-answering model in advance by adopting the method provided by the embodiment of the application.

The reply generation device and the long tail recognition model training device can be set as independent servers, can be set in a server group, and can be set in a cloud server. The cloud server is also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of large management difficulty and weak service expansibility in the traditional physical host and Virtual Private Server (VPS) PRIVATE SERVER service. In addition to the architecture shown in fig. 1, the reply generation device and the long tail recognition model may also be provided in a computer terminal having a relatively high computing power.

As one of the realizations, the user may input a question through the user device, which may be in the form of text or voice, and the user device includes the question in a request and sends the request to the answer generating device at the server side. The reply generation means parses the request to obtain an input text (if the request contains a speech form, further speech recognition is performed to obtain the input text) and generates a reply to the input text, generates a response to the request by using the reply (wherein the reply may take the form of text, image, video, etc., or speech, or may take the form of a component embedding the text, image, video, or speech in the user interface), and returns the response to the user device via the network, and the user device presents the reply to the user.

It should be understood that the user equipment, answer generation means, question-answer model and model training means in fig. 1 are only illustrative. There may be any number of user devices, answer generation means, question-answer models, and model training means, as required by the implementation.

Fig. 2 is a flowchart of a reply generation method provided by an embodiment of the present application, which may be performed by the reply generation apparatus in the system shown in fig. 1. As shown in fig. 2, the method may include the steps of:

Step 201: input text is obtained.

Step 202: and long tail recognition is carried out on the input text.

Step 203: and generating a prompt text according to the long tail recognition result, inputting the prompt text into a second answer generation model, and obtaining an answer to the input text, which is generated by the second answer generation model based on the input prompt text.

As can be seen from the above-described flow, in the answer generation method of the present application, long-tail recognition is performed on an input text first, a prompt text for inputting a question-answer model is generated according to the recognition result, and the question-answer model generates an answer according to the prompt text. Compared with the traditional method for directly generating the prompt text according to the input text, the method considers the long tail type characteristics of the input text before generating the answer, and improves the accuracy of the question-answer model.

Each step in the above-described flow is specifically described below.

The above step 201, i.e. "get input text", will be described in detail first with reference to the embodiment.

When a user uses a generative model, such as a large language model, the user typically enters instructions describing the user's needs, which may be in the form of text, voice, or by triggering components on an interface provided by the question-answer model. The generative model generates a response to the instruction input by the user, and since the generative model is usually input by text, if the instruction is in other forms of instruction, the instruction is converted into a text form after being sent to the server side in the request, and the text input by the user or the text converted by the server side is referred to as input text in order to distinguish the answer from the finally generated response.

The step 202, namely, "long tail recognition of input text", is described in detail below in connection with an embodiment.

In the field of natural language processing, long-tail data refers to data that appears less frequently or sparsely because they are distributed in long-tail shapes, i.e., most of the data is in a few documents or categories of the dataset, while most of the remaining documents or categories are relatively rare or unusual. In a language model, knowledge of long tail type is difficult to include in training set data, the difficulty of understanding the knowledge by the model is high, and learning effect is poor.

In view of the above-described features of long-tail data, it is necessary to recognize long-tail input text and then specifically process the long-tail type input text with pertinence. The input text is long tail recognized, and the input text can be classified into a long tail type or a non-long tail type.

The long tail recognition can be performed in various ways, for example, a database is set, the keyword in the input text is searched in the database, and the long tail judgment is performed on the input text according to the number or the occurrence frequency of the search result. For another example, a pre-trained long tail recognition model may be employed to long tail recognize the input text.

As one preferable implementation mode, inputting the input text into a long tail recognition model which is trained in advance, obtaining a long tail probability value predicted by the long tail recognition model, and determining whether the input text is of a long tail type according to the long tail probability value. As shown in FIG. 3a, the model provided by the present application mainly includes a long tail recognition model, a retrieval model, and a second answer generation model.

The long tail recognition model may be implemented based on a machine learning model, and as one of the possible implementations, the long tail recognition model may be implemented using a reply generation model, that is, including a first reply generation model, as shown in fig. 3a, which generates a reply from an input text. The reply is essentially a predicted sequence corresponding to each Token.

If the answer generated by the first answer generation model is in text form, the Token refers to text units, such as characters, words, and the like, and the sequence obtained by the first answer generation model may include Token including a start character, a separator character, an end character, and the like, in addition to the text units Token including characters, words, and the like.

If the answer generated by the first answer generation model is in the form of an image, the Token refers to tiles, which form an image. The sequence obtained by the first answer generation model may include Token including a start symbol, a separator symbol, an end symbol, and the like, in addition to the tile Token.

The long tail probability value may be derived from the probability value of the ending Token of the Token sequence of the first reply generation model predicted reply. The terminator Token is used to represent the end of the text or image, and has the functions of controlling the reply length, evaluating the reply quality, and the like. A predicted probability value of the ending Token may be obtained, and the long tail probability value may be derived using the predicted probability value. If the long tail probability value is greater than or equal to a preset threshold value, the input text can be determined to be of a long tail type, otherwise, the input text is determined to be of a non-long tail type.

The long tail recognition model can be realized by adopting other types of models besides a reply generation model, such as a classification model, input text into a classifier, and long tail recognition is performed according to the probability of a classification result output by the classifier as long tail probability.

The following describes in detail the above step 203, that is, "generating a prompt text according to the long tail recognition result, inputting the prompt text into the second answer generation model, and obtaining an answer to the input text generated by the second answer generation model based on the input prompt text", with reference to the embodiment.

The second answer generation model generates an answer to the input text based on the input prompt text, in which embodiment the content of the prompt text is related to whether the input text is of long tail type.

As one of the possible ways, if the input text is of long tail type, as shown in fig. 3a, the knowledge base may be searched by the search model using the input text to obtain K candidate knowledge, where K is a preset positive integer, and the prompt text is obtained according to the input text and the K candidate knowledge.

The form of each knowledge in the knowledge base may take the form of, but is not limited to: text data forms such as question-answer pairs, triples, documents, etc., structured data forms such as forms of tables, databases or graphs, multimedia data forms such as images, audio or video, etc.

The K candidate knowledge may be K knowledge in the knowledge base, where the similarity between the K candidate knowledge and the input text meets a preset similarity requirement, such as the first K with the largest similarity, and the similarity is greater than or equal to K of the preset similarity threshold.

For the retrieval of the knowledge base, dense retrieval can be adopted, namely, the feature representation of the input text can be obtained by utilizing a pre-training language model, the feature representation of each knowledge in the knowledge base is obtained, the retrieval model utilizes the input text to retrieve the knowledge base, K candidate knowledge is obtained, and the similarity between the feature representations of the K candidate knowledge and the feature representation of the input text meets the preset similarity requirement. Sparse retrieval may also be employed. For example, similarity between an input text and knowledge of each document is calculated based on TF (Term Frequency) -IDF (Inverse Document Frequency ), BM25 algorithm (BM 25 is a classical algorithm for calculating similarity between a query and a document in the technical field of information indexing), and K document knowledge satisfying a preset first similarity requirement is determined as candidate knowledge according to the similarity. In view of the fact that sparse search methods based on TF-IDF, BM25, etc. are currently available, no detailed description is given here.

The prompt text can be obtained by directly splicing the input text and the candidate knowledge, or can be further subjected to operations such as screening and knowledge sequence adjustment, and then spliced the input text and the candidate knowledge. The prompt text may also include other information indicating the characteristics of the input text.

If the input text is not of the long tail type, the prompt text is obtained according to the input text, namely, the external knowledge retrieval is not carried out, and the input text is directly utilized to generate the prompt text. In the embodiment of the application, the input text can be directly used as the prompt text, and the key information extraction of the input text or the result of the operation such as expanding the input text according to the context in the language model can be used as the prompt text.

As shown in fig. 3a, the prompt text is input into a second answer generation model, which generates an answer for the prompt text. The answer may be in text form or in other forms such as image.

In the embodiment of the present application, the first reply generation model and/or the second reply generation model may be based on a large language model LLM (Large Language Model ), or may be a common pre-training language model. The large language model, i.e., LLM, is a deep learning model trained using a large amount of text data, and can generate natural language text or understand meaning of language text. It is characterized by large scale, large parameter volumes (typically above the billion level), and is typically based on deep learning architectures such as the transducer architecture. LLM differs from a common pre-trained language model in the parameter scale, when the parameter scale exceeds a certain level, the model achieves significant performance improvement and exhibits the ability of small models to be absent, such as the ability to learn context learning (in-context learning), to learn complex patterns in language, and to perform a wide range of tasks including text summarization, translation, emotion analysis, multiple rounds of dialog, and so forth. Thus, to distinguish from traditional pre-trained language models, models in which such parameters scale beyond a certain level are referred to as LLM. In general, language models with parameter scales of more than one billion implemented based on deep learning architecture can be considered as large language models. Common LLMs include: GTP-3 (GENERATIVE PRE-trained Transformer 3, version 3 of the generative pre-training converter), T5 (Text-to-Text Transfer Transformer, text-to-Text converter), GTP-4, paLM (a large language model proposed by Google), LLaMA (Large Language Model Meta AI, a large language model published by Meta AI), and so forth.

For LLM, knowledge of non long tail type with high frequency of occurrence can be captured and learned well in training process, but learning effect is poor for knowledge of long tail type. Therefore, long tail recognition processing is introduced into the embodiment of the application, and external knowledge is introduced into the long tail type input text through retrieving a knowledge base to provide LLM as a reference for generating replies to the input text, so that the capability of LLM for generating replies is enhanced, and the accuracy of generating replies is improved.

The application also provides a long tail recognition model training method, and the long tail recognition model obtained by training the method can be used in the long tail recognition process in the reply generation method.

Fig. 3b is a schematic diagram of a training long tail recognition model according to an embodiment of the present application, where, as shown in fig. 3b, training data including a plurality of training samples is obtained, and the training samples include input samples and corresponding reply samples. The training sample data is a collection of input samples and corresponding reply samples, each sample containing one input sample and corresponding reply. In the embodiment of the application, the answer can be in the form of text or other modes such as images. The training samples can be selected from some existing training sample libraries, and can also be extracted from books, articles, forum discussions, social media posts and the like, and the method for acquiring the training samples is not particularly limited. The training samples need to be large enough so that the model can learn the complexity and diversity of the language.

Training a long tail recognition model by using training data, wherein long tail recognition is performed on the input sample by using the long tail recognition model. And generating a prompt text according to the long tail recognition result, inputting the prompt text into a second answer generation model, and acquiring an answer, which is generated by the second answer generation model based on the prompt text, for the input sample.

As one preferred approach, as shown in fig. 3b, the long tail recognition model includes a first reply generation model; inputting an input sample into a first answer generation model, and determining a long tail probability value by utilizing the first answer generation model according to the degree of difference between answers generated by the input sample and answer samples corresponding to the input sample, probability values corresponding to Token in the answers generated by the first answer generation model, average word frequency of the input sample and gradient values corresponding to the input sample; and determining whether the input sample is of a long tail type according to the long tail probability value.

The long tail probability value may be calculated using the value of ECE (Expected Calibration Error ) or based on the word frequency in the input samples. More specifically, the application provides a specific method for determining the long tail probability value, and the application introduces a parameter GECE for evaluating the long tail degree:

Where pred and ref represent the answer generated by the first answer generation model for the input sample and the answer sample corresponding to the input sample, respectively. M (pred, ref) is a met eor score, which is an index for evaluating a text generation task, and the similarity between candidate text and reference text is calculated based on word-level accuracy and recall, and a penalty for word order. In an embodiment of the application, a METEOR score is used to evaluate the differences between the replies and reply samples generated in the first reply generation model. An average of the predicted probabilities for each token in the reply generated by the first reply generation model, where p (t _i) represents the predicted probability of the ith token in the reply generated by the first reply generation model and n is the length of the token sequence. Alpha is the average word frequency of the input samples in the training data, with long tail type input samples having a smaller alpha value. In addition,/>Is the gradient of M (pred, ref) over the current sample,/>Is the average gradient of each input sample in the training data, and the input samples with long tail type have smaller gradient/>, compared with the average gradient of the data setThus/> The dot product result of (2) is also smaller. Thus, a larger GECE value for an input sample means a higher degree of long tail for the input sample.

It should be noted that, the above formula (1) is only one of the expressions GECE provided in the embodiments of the present application, and other formulas obtained by replacing, deforming, etc. the formula within the spirit principles of the formula are also within the scope of the present application.

After GECE values are obtained through calculation according to the formula (1), GECE values can be directly used as long-tail probability values, GECE values can be used as long-tail probability values after normalization and other operations are performed, and long-tail and non-long-tail judgment is performed on input samples according to the long-tail probability values. A long tail threshold value can be set, and when the long tail probability value is greater than or equal to the long tail threshold value, the current input sample is judged to be of a long tail type; and when the long tail probability value is smaller than the long tail threshold value, judging that the current input sample is of a non-long tail type.

The long tail recognition model can be realized by adopting other types of models besides a reply generation model, such as a classification model, input samples are input into a classifier, and long tail recognition is performed according to the probability of a classification result output by the classifier as long tail probability.

As one of the possible ways, if the input sample is of a long tail type, as shown in fig. 3b, the knowledge base may be searched by the search model using the input sample to obtain K candidate knowledge, where K is a preset positive integer, and the prompt text is obtained according to the input sample and the K candidate knowledge. The prompt text can be obtained by directly splicing the input sample with the candidate knowledge, or can be obtained by further performing operations such as screening and knowledge sequence adjustment on the candidate knowledge and then splicing the input sample with the candidate knowledge. The prompt text may also include other information indicating the characteristics of the input sample.

If the input sample is of a non-long tail type, a prompt text is obtained according to the input sample, namely, the external knowledge retrieval is not performed, and the input sample is directly utilized to generate the prompt text. The input sample can be directly used as a prompt text, or the input sample can be used as the prompt text after key information extraction or expansion of the input text.

In the training process of the long tail recognition model described above, training the target may include minimizing a difference between the answer to the input sample and the corresponding answer sample. Constructing a loss function according to the training target, and updating model parameters in a mode such as gradient descent by using the value of the loss function in each round of iteration until a preset training ending condition is met. The training ending condition may include, for example, the value of the loss function being less than or equal to a preset loss function threshold, the number of iterations reaching a preset number of times threshold, etc.

In the training process, the parameters of the second answer generation model may be frozen, and only the long tail recognition model, i.e., the parameters of the first answer generation model, may be updated with the loss function corresponding to the target of training.

Through the learning process, the probability value of the ending Token of the Token sequence of the reply generated by the first reply generation model can be related to whether the corresponding input sample is of a long tail type. If the input sample is of a long tail type, the probability value of the ending symbol Token of the replied Token sequence is larger; if the input sample is of a non-long tail type, the probability value of the ending Token of the replied Token sequence is smaller.

In an embodiment of the present application, the first reply generation model and the second reply generation model may set the same initial parameters, and in the long tail recognition model training process, the parameters of the second reply generation model may be frozen, and the parameters of the long tail recognition model are updated with the loss function corresponding to the trained target. The parameters of the second answer model can also be changed according to the change of the parameters of the first answer model in the long tail recognition model, and the parameters can be flexibly set by a person skilled in the art according to actual requirements and training results.

In an embodiment of the present application, the first answer generation model and the second answer generation model may be based on a large language model LLM (Large Language Model ), or a common pre-training language model. The large language model, i.e., LLM, is a deep learning model trained using a large amount of text data, and can generate natural language text or understand meaning of language text. It is characterized by large scale, large parameter volumes (typically above the billion level), and is typically based on deep learning architectures such as the transducer architecture. LLM differs from a common pre-trained language model in the parameter scale, when the parameter scale exceeds a certain level, the model achieves significant performance improvement and exhibits the ability of small models to be absent, such as the ability to learn context learning (in-context learning), to learn complex patterns in language, and to perform a wide range of tasks including text summarization, translation, emotion analysis, multiple rounds of dialog, and so forth. Thus, to distinguish from traditional pre-trained language models, models in which such parameters scale beyond a certain level are referred to as LLM. In general, language models with parameter scales of more than one billion implemented based on deep learning architecture can be considered as large language models. Common LLMs include: GTP-3 (GENERATIVE PRE-trained Transformer 3, version 3 of the generative pre-training converter), T5 (Text-to-Text Transfer Transformer, text-to-Text converter), GTP-4, paLM (a large language model proposed by Google), LLaMA (Large Language Model Meta AI, a large language model published by Meta AI), and so forth.

The first answer generation model and the second answer generation model can adopt language models with the same scale or different scales. For example, the first reply generation model and the second reply generation model may employ LLMs of the same scale. For another example, the first reply generation model may employ a generic language model and the second reply generation model employs LLM. Etc.

The method provided by the embodiment of the application can be applied to various application scenes, such as intelligent question-answering, text translation, abstract generation and the like. Taking intelligent question and answer as an example, the method can comprise online customer service used by an electronic commerce platform, and the intelligent loudspeaker box has an intelligent question and answer function, and the like.

The user may input the question to the intelligent question-answering system, for example, the intelligent question-answering system provides the user with a page or client through a component such as an input box on the page or client, or the user inputs the question by voice. The intelligent question-answering system at the server side (which can be a cloud server) adopts the answer generation method provided by the embodiment of the application, and answers are obtained by utilizing the questions input by the user. The intelligent question-answering system may be domain specific, such as for medical domains, educational domains, insurance domains, etc.; or a general intelligent question-answering system. Taking an intelligent question-answering system for a specific field as an example, the intelligent question-answering system can train the question-answering model in advance in a mode in the embodiment of the application, and training data can be obtained from a resource library of the required field.

According to an embodiment of another aspect, there is provided a reply generation apparatus. Fig. 4 shows a schematic block diagram of the reply generation device according to one embodiment, which is provided at the server side in the architecture shown in fig. 1. As shown in fig. 4, the apparatus 400 may include a text acquisition unit 401, a long tail recognition unit 402, and a reply generation unit 403, wherein the main functions of the respective constituent units are as follows:

the text acquisition unit 401 is configured to acquire an input text.

The long tail recognition unit 402 is configured to perform long tail recognition on the input text.

The answer generation unit 403 is configured to generate a prompt text according to the long tail recognition result, input the prompt text into the second answer generation model, and obtain an answer to the input text generated by the second answer generation model based on the input prompt text.

As one of the realizations, the long tail recognition unit 402 may be specifically configured to: inputting an input text into a long tail recognition model obtained by training in advance, and obtaining a long tail probability value predicted by the long tail recognition model; and determining whether the input text is of a long tail type according to the long tail probability value.

As one of the realizations, the long tail recognition model includes a first reply generation model; the long tail probability value is derived from the probability value of the ending Token of the Token sequence predicted by the first reply generation model.

As one of the realizations, the reply generation unit 403 may be specifically configured to: if the input text is of a long tail type, searching a knowledge base by using the input text to obtain K candidate knowledge; obtaining a prompt text according to the input text and K candidate knowledge; otherwise, obtaining the prompt text according to the input text.

As one of the realizations, the reply generation unit 403 may be specifically configured to: the second answer generation model is based on the large language model LLM.

According to an embodiment of another aspect, a training device for a long tail recognition model is provided. Fig. 5 shows a schematic block diagram of the reply generation device according to one embodiment, which is provided at the server side in the architecture shown in fig. 1. As shown in fig. 5, the apparatus 500 includes: a sample acquisition unit 501 and a model training unit 502. Wherein the main functions of each constituent unit are as follows:

the sample acquisition unit 501 is configured to acquire training data including a plurality of training samples, the training samples including input samples and their corresponding reply samples.

A model training unit 502 configured to train a long tail recognition model using training data, wherein long tail recognition is performed on an input sample using the long tail recognition model; generating a prompt text according to the long tail recognition result, inputting the prompt text into a second answer generation model, and acquiring an answer, which is generated by the second answer generation model based on the prompt text, aiming at an input sample; the training targets include: the difference between the answer to the input sample and the corresponding answer sample is minimized.

As one of the realizations, the long tail recognition model includes a first reply generation model; the long tail recognition of the input sample using the long tail recognition model includes: inputting the input sample into a first answer generation model; the model training unit 502 may determine a long tail probability value for a degree of difference between a reply generated by the input sample and a reply sample corresponding to the input sample, a probability value corresponding to each Token in the reply generated by the first reply generation model, an average word frequency of the input sample, and a gradient value corresponding to the input sample using the first reply generation model; and determining whether the input sample is of a long tail type according to the long tail probability value.

As one of the realizations, the model training unit 502 may be specifically configured to: if the input sample is of a long tail type, searching a knowledge base by using the input sample to obtain K candidate knowledge; obtaining a prompt text according to the input sample and K candidate knowledge; otherwise, obtaining the prompt text according to the input sample.

As one of the realizations, the model training unit 502 may be specifically configured to: during the training process, the parameters of the second answer generation model are frozen, and the parameters of the long tail recognition model are updated by using the loss function corresponding to the training target.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The apparatus embodiments described above are merely illustrative, wherein elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.

Based on the method provided by the embodiment of the application, the performance test of the method provided by the application is performed by using a plurality of existing databases and adopting a plurality of language models in the LLM model.

In the performance test, a Question-answer pair in an NQ (Natural Question) dataset and TriviaQA dataset is used as a test sample. NQ is a large-scale question-answer dataset constructed from manually labeled answers on wikipedia web pages, and short types of question-answer data in the NQ dataset are used in the test. TriviaQA is a relatively complex dataset containing syntactic and lexical references between questions and answers.

Baseline models used for performance testing included ilama 2-7B, IRCoT, SKR, SELF-RAG, filoc, and ITER-RETGEN. Llama2-7B is a pre-trained LLM model with large scale parameters that perform well on most benchmarks. IRCoT introduces an interleaving search method, utilizes a concept chain (CoT) to assist in search, and utilizes a search result to support the CoT. SKR uses LLM to distinguish whether a query can be parsed and only obtain knowledge from LLM. SELF-RAG introduces special reflection marks to help the model determine the retrieval requirements and the quality of the retrieved content. FILCO refine the retrieved context by a filter, i.e., training by string containment, lexical overlap relationship matching, and conditional cross mutual information. ITER-RETGEN proposes a language model that is optimized for each other by both generating search enhancements and generating enhanced search.

The present application uses the data in the data set mentioned above as a test sample, and tests the reply generation method given in the embodiment of the present application based on the various baseline language models mentioned above. The application firstly uses a baseline model, a reply generation process is executed by adopting the reply generation method based on long tail recognition in the application aiming at the problem samples in the test samples, and then the process is compared with a reply generation mode of carrying out external knowledge base retrieval on all problems aiming at the same problem samples under the same baseline model. The test result shows that the application has obvious improvement in the aspects of processing speed and accuracy.

First, all baseline models have faster processing speeds when using the method of embodiments of the present application. When the data set NQ is used, the processing speed is respectively improved by 2.1 times, 6.7 times, 5.5 times, 3.3 times, 2.4 times and 7 times in Llama2-7B, IRCoT, SKR, SELF-RAG, FILCO and ITER-RETGEN models; when using dataset TriviaQA, the processing speeds were increased by 2.2-fold, 6.2-fold, 6-fold, 3.5-fold, 2.3-fold and 7.3-fold in the Llama2-7B, IRCoT, SKR, SELF-RAG, FILCO and ITER-RETGEN models, respectively.

Secondly, the method in the embodiment of the application improves the accuracy of reply generation. When using the data set NQ, the ilama 2-7B model had an average Rouge-1 (referring to the character overlap between the generation of the replies and the reply samples) value of 42.1 and an average Bleu-4 (Bleu-4 is an accuracy rating index based on 4-gram) value of 7.30 in the case of retrieval for all questions, whereas when using the method of the present application, the average Rouge-1 value was 42.9 and the average Bleu-4 value was 7.38, i.e., the method of the present application increased the average Rouge-1 value by 0.8 and the average Bleu-4 value by 0.08. Similarly, using the method of the present application, average Rouge-1 values increased by 0.3, 0.4, 2.0, 0.1 and 0.6, respectively, and average Bleu-4 values increased by 0.03, 0.09, 0.28, 0.02 and 0.08, respectively, in the IRCoT, SKR, SELF-RAG, FILCO and ITER-RETGEN models. There is also an increase in the average Rouge-1 value and average Bleu-4 value for the method of the present application when using dataset TriviaQA. Specifically, in the Llama2-7B, IRCoT, SKR, SELF-RAG, FILCO, and ITER-RETGEN models, the average Rouge-1 values increased by 0.8, 0.4, 0.2, 0.5, and 0.4, respectively, and the average Bleu-4 values increased by 0.07, 0.04, 0.02, 0.07, 0.03, and 0.09, respectively.

In addition, the embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the steps of the method of any one of the previous method embodiments.

And an electronic device comprising:

One or more processors; and

A memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of the preceding method embodiments.

The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of the preceding method embodiments.

And an electronic device comprising:

One or more processors; and

Fig. 6 illustrates an architecture of an electronic device, which may include a processor 610, a video display adapter 611, a disk drive 612, an input/output interface 613, a network interface 614, and a memory 620, to name a few. The processor 610, video display adapter 611, disk drive 612, input/output interface 613, network interface 614, and memory 620 may be communicatively coupled via a communications bus 630.

The processor X10 may be implemented by a general-purpose CPU, a microprocessor, an Application SPECIFIC INTEGRATED Circuit (ASIC), or one or more integrated circuits, etc. for executing related programs to implement the technical solution provided by the present application.

The Memory 620 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory ), static storage, dynamic storage, etc. The memory 620 may store an operating system 621 for controlling the operation of the electronic device 600, and a Basic Input Output System (BIOS) 622 for controlling the low-level operation of the electronic device 600. In addition, a web browser 623, a data storage management system 624, a reply generation device 625, and the like may also be stored. The reply generation device 625 may be an application program that specifically implements the operations of the foregoing steps in the embodiment of the present application. In general, when the technical solution provided by the present application is implemented by software or firmware, relevant program codes are stored in the memory 620 and invoked by the processor 610 to be executed.

The input/output interface 613 is used to connect with an input/output module to realize information input and output. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.

The network interface 614 is used to connect communication modules (not shown) to enable communication interactions of the device with other devices. The communication module may implement communication through a wired manner (such as USB, network cable, etc.), or may implement communication through a wireless manner (such as mobile network, WIFI, bluetooth, etc.).

Bus 630 includes a path to transfer information between components of the device (e.g., processor 610, video display adapter 611, disk drive 612, input/output interface 613, network interface 614, and memory 620).

It should be noted that although the above devices illustrate only the processor 610, video display adapter 611, disk drive 612, input/output interface 613, network interface 614, memory 620, bus 630, etc., the device may include other components necessary to achieve proper operation in an implementation. Furthermore, it will be appreciated by those skilled in the art that the apparatus may include only the components necessary to implement the present application, and not all of the components shown in the drawings.

From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a computer program product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.

The foregoing has outlined rather broadly the more detailed description of the application in order that the detailed description of the application that follows may be better understood, and in order that the present principles and embodiments may be better understood; also, it is within the scope of the present application to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the application.

Claims

1. A reply generation method, the method comprising:

acquiring an input text;

Long tail recognition is carried out on the input text;

2. The method of claim 1, wherein long tail recognition of the input text comprises:

3. The method according to claim 2, wherein the method further comprises:

the long tail recognition model comprises a first answer generation model;

4. The method of claim 1, wherein the generating a hint text from the result of long tail recognition comprises:

Otherwise, the prompt text is obtained according to the input text.

5. The method of any one of claims 1 to 4, wherein the second answer generation model is derived based on a large language model LLM.

6. A method for training a long tail recognition model, the method comprising:

7. The method of claim 6, wherein the long tail recognition model comprises a first reply generation model;

8. The method of claim 6, wherein generating the hint text based on the long tail recognition result comprises:

Otherwise, obtaining the prompt text according to the input sample.

9. The method according to any one of claims 6 to 8, further comprising:

10. A reply generation method performed by a cloud server, the method comprising:

Long tail recognition is carried out on the input text;

11. A reply generation apparatus, the apparatus comprising:

A text acquisition unit configured to acquire an input text;

12. A training device for long tail recognition models, the device comprising:

13. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method of any of claims 1 to 10.

14. An electronic device, comprising:

One or more processors; and

A memory associated with the one or more processors for storing program instructions that, when read for execution by the one or more processors, perform the steps of the method of any of claims 1 to 10.