CN118070775B - Performance evaluation method and device of abstract generation model and computer equipment - Google Patents


Info

Publication number
CN118070775B
CN118070775B
Authority
CN
China
Prior art keywords
abstract
model
training
evaluation
scenario
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410473943.XA
Other languages
Chinese (zh)
Other versions
CN118070775A (en)
Inventor
郭卉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410473943.XA
Publication of CN118070775A
Application granted
Publication of CN118070775B
Legal status: Active

Abstract

The disclosure relates to a performance evaluation method and device for a summary generation model, and to computer equipment. The method comprises the following steps: obtaining summary generation models trained from a plurality of model structures under at least one training data magnitude; for at least two summary generation models trained under the same training data magnitude, generating at least two summaries based on the same evaluation text; generating, through a comparison model and based on the evaluation text and the at least two summaries, scenario matching labels corresponding to the at least two summaries; determining, based on the scenario matching labels, an evaluation index value for each summary generation model trained under the same training data magnitude; and evaluating the performance of the summary generation models trained under the at least one training data magnitude according to the evaluation index values of the plurality of summary generation models trained under the at least one training data magnitude. The method can accurately compare the summary generation capabilities of different models.

Description

Performance evaluation method and device of abstract generation model and computer equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular to a performance evaluation method and device for a summary generation model, and to computer equipment.
Background
At present, understanding the content of a script by reading it manually is inefficient. A common approach is therefore to generate a summary of the script based on a summary model or a large generative model and to understand the script automatically through the summary. In the process of training or selecting a model, the summary generation capability of the model is evaluated, and the evaluation results can accelerate model training and rapid model iteration.
Traditionally, the summary generation capability of a model is judged either manually or with a summary information-quantity index. Manual judgment is inefficient, and when the information-quantity index is used, summaries that have the same semantics but are expressed differently are scored as different, so the index cannot accurately judge the summary generation capability of the model.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a performance evaluation method, device, and computer equipment for a summary generation model that can accurately compare the summary generation capabilities of different models.
In a first aspect, the present disclosure provides a performance evaluation method of a summary generation model, the method comprising:
obtaining summary generation models trained from a plurality of model structures under at least one training data magnitude, wherein the model structures of at least two summary generation models trained under the same training data magnitude are different;
generating, by the at least two summary generation models trained under the same training data magnitude, at least two summaries based on the same evaluation text;
generating, through a pre-trained comparison model and based on the evaluation text and the at least two summaries, scenario matching labels corresponding to the at least two summaries, wherein the scenario matching labels characterize whether the corresponding summaries match the scenario content of the evaluation text;
determining, based on the scenario matching labels corresponding to the at least two summaries, an evaluation index value of each summary generation model trained under the same training data magnitude; and
evaluating the performance of the summary generation models trained under the at least one training data magnitude according to the evaluation index values of the plurality of summary generation models trained under the at least one training data magnitude.
In a second aspect, the present disclosure further provides a performance evaluation device of the summary generation model. The device comprises:
a model acquisition module, configured to obtain summary generation models trained from a plurality of model structures under at least one training data magnitude, wherein the model structures of at least two summary generation models trained under the same training data magnitude are different;
a summary generation module, configured to generate, by the at least two summary generation models trained under the same training data magnitude, at least two summaries based on the same evaluation text;
a label generation module, configured to generate, through a pre-trained comparison model and based on the evaluation text and the at least two summaries, scenario matching labels corresponding to the at least two summaries, wherein the scenario matching labels characterize whether the corresponding summaries match the scenario content of the evaluation text;
an index value determination module, configured to determine, based on the scenario matching labels corresponding to the at least two summaries, an evaluation index value of each summary generation model trained under the same training data magnitude; and
an evaluation module, configured to evaluate the performance of the summary generation models trained under the at least one training data magnitude according to the evaluation index values of the plurality of summary generation models trained under the at least one training data magnitude.
In a third aspect, the present disclosure also provides a computer device. The computer device comprises a memory storing a computer program and a processor that implements the steps of any of the method embodiments described above when executing the computer program.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
In a fifth aspect, the present disclosure also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of any of the method embodiments described above.
According to the performance evaluation method, device, computer equipment, storage medium, and computer program product of the summary generation model, both the training data magnitude used in model training and the model structure influence the effect of a model; by obtaining summary generation models trained from multiple model structures under at least one training data magnitude, different models under different training data magnitudes can be evaluated. At least two summary generation models are trained under the same training data magnitude, at least two summaries are generated based on the same evaluation text, and scenario matching labels corresponding to the summaries are generated through a pre-trained comparison model based on the evaluation text and the summaries, so that the performance difference between different summary generation models processing the same text can be evaluated, and the degree to which the summaries generated by different models match the scenario content of the same evaluation text is determined. Because a scenario matching label characterizes whether the corresponding summary matches the scenario content of the evaluation text, the accuracy of a summary can be judged correctly even when its sentences or wording differ from those of the evaluation text. Determining the evaluation index value of each summary generation model trained under the same training data magnitude from the scenario matching labels turns the degree of matching into an index, so that the performance of each summary generation model can be evaluated better and more objectively.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the prior art, the drawings required in the detailed description are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present disclosure; a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application environment of a performance evaluation method of a summary generation model in one embodiment;
FIG. 2 is a flow diagram of a method for evaluating performance of a summary generation model in one embodiment;
FIG. 3 is a schematic diagram of the performance evaluation of multiple summary generation models in one embodiment;
FIG. 4 is a flow chart of step S206 in one embodiment;
FIG. 5 is a flow chart of step S208 in one embodiment;
FIG. 6 is a flow chart of a training comparison model in one embodiment;
FIG. 7 is a flowchart of a method for evaluating performance of a summary generation model according to another embodiment;
FIG. 8 is a model effect curve of summary generation model A and summary generation model B in another embodiment;
FIG. 9 is a model effect curve of summary generation model C and summary generation model D in another embodiment;
FIG. 10 is a block diagram of a performance evaluation device of a summary generation model in one embodiment;
FIG. 11 is a schematic diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present disclosure.
It should be noted that the terms "first," "second," and the like in the description, claims, and accompanying figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments described herein may be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or device.
In this document, the term "and/or" merely describes an association relationship between associated objects, meaning that three relationships may exist. For example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, enabling machines to perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems, mechatronics, and the like. A pre-training model, also called a large model or foundation model, can be widely applied to downstream tasks in all major directions of artificial intelligence after fine-tuning. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing concerns natural language, the language people use in daily life, and is closely related to linguistics; it also draws on computer science and mathematics. Pre-training models, an important technology for model training in the artificial intelligence field, developed from large language models (Large Language Model) in the NLP domain. Through fine-tuning, a large language model can be widely applied to downstream tasks. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph techniques, and the like.
Machine learning (ML) is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. Pre-training models are the latest development of deep learning and integrate the above techniques.
A pre-training model (PTM), also called a foundation model or large model, is a deep neural network (DNN) with large parameters that is trained on massive unlabeled data, using the function approximation capability of the large-parameter DNN to extract common features from the data. Through fine-tuning, parameter-efficient fine-tuning (PEFT), prompt-tuning, and other techniques, it can be adapted to downstream tasks. A pre-training model can therefore achieve good results in few-shot or zero-shot scenarios. PTMs can be classified by the data modality they process into language models (ELMo, BERT, GPT), visual models (Swin Transformer, ViT, V-MoE), speech models (VALL-E), multi-modal models (ViLBERT, CLIP, Flamingo, Gato), and so on, where a multi-modal model builds a representation of the features of two or more data modalities. Pre-training models are an important tool for producing AI-generated content (AIGC).
A movie and television script automatic understanding system can be used to quickly understand the script of each episode, so that a script reviewer can quickly read a screenplay and give it a score to decide whether to shoot it, or so that a director, actors, and crew can arrange shooting scenes, emotions, and the like according to the script. Given an input movie or television screenplay, the system automatically outputs related information such as the protagonist, a script summary, protagonist events, mood trends, and the highlight scenes of each episode. The system understands the script automatically based on a summary model or an existing large generative model, thereby generating a summary.
In the conventional technology, manual comparison is usually used to evaluate the quality of the summaries generated by two different models. This approach is only qualitative and cannot give specific evaluation indexes; every evaluation depends on manual work and is seriously time-consuming, as manually reading a text and annotating one summary takes 2-5 minutes, so real-time processing is impossible.
In addition, there is a way to generate an evaluation index using a reference text summary: ROUGE-N, which measures the quality of a generated result by the degree of N-gram overlap between the generated summary and a reference summary, where an N-gram is a sequence of N consecutive words. When N=1 the metric is called ROUGE-1 (unigrams); when N=2, ROUGE-2 (bigrams); when N=3, ROUGE-3 (trigrams). When an N-gram appears in both the generated summary and the reference summary, it counts as correct; the recall rate is the ratio of correctly appearing N-grams to the total N-grams of the reference summary, and the precision is the ratio of correctly appearing N-grams to the total N-grams of the generated summary. The precision is then used to evaluate the quality of the generated summary. However, this method lacks semantic evaluation: if the generated summary and the correct reference summary have identical semantics but different expression (synonyms or paraphrases), the resulting score cannot serve as an evaluation index. In screenplay scenario summaries, different people have different expressions and styles, so the wording can differ greatly under the same semantics. Moreover, because summary models (whose parameter counts are large to ensure the text understanding effect) behave differently at different training data magnitudes and converge differently at different data magnitudes, model A (possibly with a relatively small parameter scale) may perform better when the data amount is small, while model B (possibly relatively larger) overtakes model A as the training data increases; ROUGE-N cannot evaluate a model's summary generation capability under different data amounts. Therefore, an evaluation index generated from a reference text summary cannot accurately evaluate the quality of a generated summary.
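For reference, the following is a minimal Python sketch of the ROUGE-N counting described above (whitespace tokenization is assumed for brevity; real implementations additionally handle stemming and Chinese word segmentation):

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams: tuples of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(generated, reference, n=1):
    """ROUGE-N recall and precision from n-gram overlap counts."""
    gen = ngrams(generated.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((gen & ref).values())              # correctly appearing n-grams
    recall = overlap / max(sum(ref.values()), 1)     # vs. reference summary n-grams
    precision = overlap / max(sum(gen.values()), 1)  # vs. generated summary n-grams
    return recall, precision

# Same meaning, different wording scores poorly -- the weakness that
# motivates the comparison-model approach of this disclosure.
print(rouge_n("small b found the photo", "small b discovered the photo", n=2))
# (0.5, 0.5)
```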
In order to solve the above problems, an embodiment of the present disclosure provides a performance evaluation method of a summary generation model, which may be applied to the application environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 via a network. In some embodiments, the terminal 102 obtains summary generation models trained from multiple model structures under at least one training data magnitude, where the model structures of at least two summary generation models trained under the same training data magnitude are different. A summary generation model may be trained in the server 104 or in the terminal 102. The terminal 102 causes the at least two summary generation models trained under the same training data magnitude to each generate a summary based on the same evaluation text. Through a pre-trained comparison model and based on the evaluation text and the at least two summaries, the terminal 102 generates scenario matching labels corresponding to the at least two summaries, where a scenario matching label characterizes whether the corresponding summary matches the scenario content of the evaluation text. The pre-trained comparison model may be trained in the server 104 or in the terminal 102. Based on the scenario matching labels corresponding to the at least two summaries, the terminal 102 determines an evaluation index value of each summary generation model trained under the same training data magnitude. The terminal 102 then evaluates the summary generation models trained under the at least one training data magnitude according to the evaluation index values of the plurality of summary generation models trained under the at least one training data magnitude. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, a smart television, or a vehicle terminal. The server 104 may be implemented as a stand-alone server, a server cluster composed of multiple servers, or a cloud server.
In one embodiment, as shown in FIG. 2, a performance evaluation method of a summary generation model is provided. The method is described as applied to a terminal for illustration; it is understood that it may also be applied to a server, or to a system including a terminal and a server and implemented through interaction between the two. In this embodiment, the method includes the following steps:
S202, obtaining summary generation models trained from a plurality of model structures under at least one training data magnitude, wherein the model structures of at least two summary generation models trained under the same training data magnitude are different.
The model structure may be that of a llama-series or Qwen-series large model, or of another pre-trained model; some embodiments of the present disclosure do not limit the model structures involved. The training data magnitude may be obtained by grading the data amount of the summary training data. For example, the data amount may be graded along an exponential sequence with the exponent X taking values from 1 to 10, giving the data amount grading table of the summary training data shown in Table 1, which lists the data amount corresponding to each magnitude for X = 1 to 10.
Table 1. Data amount grading table
In addition, depending on the circumstances, the range of the exponent may be larger, for example 1 to 25, and the exponent may also take only odd values, such as 1, 3, 5, 7. A person skilled in the art may grade the data amount of the summary training data in other ways, for example by multiples of a base amount of 1000, such as 1, 5, 7, or 10 times. It should be noted that some embodiments of the present disclosure do not limit how the data amount of the summary training data is graded. The summary generation model may be a model that automatically extracts key information from a long text and generates a concise and accurate summary. It may be trained with the summary training data mentioned above, in which each sample pairs a text (e.g., a television script) with a corresponding summary.
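As an illustration, the two grading schemes above might be enumerated as in the following sketch; the base amount of 1000 samples and the base-2 exponential progression are assumptions, since the concrete amounts of Table 1 are left to the implementer:

```python
def exponential_tiers(base=1000, exponents=range(1, 11)):
    """Training data magnitudes graded along an exponential sequence:
    tier X holds base * 2**X samples (base-2 progression assumed)."""
    return {x: base * 2 ** x for x in exponents}

def multiple_tiers(base=1000, multiples=(1, 5, 7, 10)):
    """Alternative grading by fixed multiples of a base data amount."""
    return {m: base * m for m in multiples}

print(exponential_tiers())  # {1: 2000, 2: 4000, ..., 10: 1024000}
print(multiple_tiers())     # {1: 1000, 5: 5000, 7: 7000, 10: 10000}
```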
Specifically, the terminal may obtain summary generation models trained from the multiple model structures under at least one training data magnitude. A summary generation model may be trained in the terminal in advance, or in another terminal or a server. Because the summary generation models need to be evaluated against each other, the model structures of at least two summary generation models trained under the same training data magnitude are different, so that the summary generation capabilities of at least two different models can be compared under the same training data magnitude.
In some exemplary embodiments, in the process of training a summary generation model, a text is acquired first; the text may be the script of a certain episode of a television series, or other text. A large model such as a GPT (Generative Pre-Trained) model may be used to extract a summary of the text; after the summary is extracted, it may be manually checked and corrected, since model hallucination (e.g., a factual error) may occur during the processing of the large model. Alternatively, the summary may be extracted from the text with other models or manually; some embodiments of the present disclosure do not limit how the summary of the text is extracted. In this way, summary training data (comprising texts and the summaries corresponding to the texts) under at least one training data magnitude are obtained, and different model structures are then trained with the summary training data under the at least one training data magnitude, yielding a plurality of different summary generation models trained under the at least one training data magnitude.
The training process of the summary generation model in this scheme is described with a specific application scenario. A script is acquired, and a summary of the script is extracted by prompting a GPT model with the script; the prompt may be, for example: "Extract the scenario summary of the following episode of a series. The summary must describe the specific things that happen in the episode; do not guess and do not describe abstractly. The script is: xxx", where xxx is the content of the script. The summary of the script output by the GPT model is then checked and corrected manually, finally yielding the summary. The summaries and scripts obtained in this way are used to train two llama-series models of different scales (only these two model structures are used here as examples), yielding summary generation models with two different model structures.
S204, generating, by at least two summary generation models trained under the same training data magnitude, at least two summaries based on the same evaluation text.
The evaluation text may be the script of a certain episode of a television series, or other text.
Specifically, since the summary generation models need to be evaluated, the remaining variables must be kept the same to ensure the validity of the evaluation. The at least two summary generation models trained under the same training data magnitude each generate a summary based on the same evaluation text. Because the training data magnitudes of the summary generation models are the same and the evaluation text is the same, the subsequently generated summaries allow the summary generation capabilities of the at least two models to be judged under identical conditions.
S206, generating, through a pre-trained comparison model and based on the evaluation text and the at least two summaries, scenario matching labels corresponding to the at least two summaries, wherein a scenario matching label characterizes whether the corresponding summary matches the scenario content of the evaluation text.
The comparison model may be a model that uses the evaluation text to determine whether a summary is accurate. It judges whether a text A contains the semantics or scenario content of a text B, in the form of a semantic comparison model: the judgment result is output by comparing the semantics or scenario content of A and B (the former contains, or does not contain, the latter). The comparison model is trained on model training texts, scenario description information describing the scenario content of the model training texts, and matching training labels, where a matching training label characterizes that the scenario description information matches the model training text. Scenario matching labels may include summary-matched labels and summary-unmatched labels: a summary-matched label characterizes that the corresponding summary matches the scenario content of the evaluation text, and a summary-unmatched label characterizes that it does not.
Specifically, the evaluation text and each of the at least two summaries may be input into the pre-trained comparison model, and the scenario matching labels corresponding to the at least two summaries are output by the comparison model. The scenario matching labels then determine whether the at least two summaries match the scenario content of the evaluation text.
In some exemplary embodiments, the two summary generation models trained under the same training data magnitude are an A summary generation model and a B summary generation model, and the evaluation text is the script of the specific application scenario above. The summary generated by the A model is: "small a and small b found xxx photos"; the summary generated by the B model is: "small b found an xxx photo". The essence of the scenario content of the evaluation text is that small b found an xxx photo, and whether small a found it is not important. Therefore, the comparison model determines that the summary generated by the A model does not match the scenario content of the evaluation text, and the scenario matching label it outputs may be 0; the comparison model determines that the summary generated by the B model matches the scenario content of the evaluation text, and the scenario matching label it outputs may be 1.
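The disclosure does not fix a programming interface for the comparison model; the following Python sketch assumes a hypothetical predict_fn standing in for the pre-trained comparison model, and reproduces the labeling of the worked example with a toy heuristic:

```python
from typing import Callable, List

def scenario_match_labels(
    predict_fn: Callable[[str, str], int],  # the pre-trained comparison model
    evaluation_text: str,
    summaries: List[str],
) -> List[int]:
    """Label each candidate summary 1 (matches the scenario content of the
    evaluation text) or 0 (does not), as in step S206."""
    return [predict_fn(evaluation_text, summary) for summary in summaries]

def toy_model(text: str, summary: str) -> int:
    # Toy stand-in for the comparison model: the essential scenario fact
    # concerns small b alone, so a summary that drags small a into the
    # discovery is treated as a mismatch.
    return int("small b" in summary and "small a" not in summary)

print(scenario_match_labels(
    toy_model,
    "script of the episode ...",
    ["small a and small b found xxx photos", "small b found an xxx photo"],
))  # [0, 1]
```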
S208, determining, based on the scenario matching labels corresponding to the at least two summaries, an evaluation index value of each summary generation model trained under the same training data magnitude.
The evaluation index value is a value that measures the summary generation capability of a summary generation model. Evaluation index values may include the accuracy, the recall rate, the F1 value, and the like.
Specifically, after the scenario matching labels corresponding to the at least two summaries are determined, the evaluation index value of each summary generation model trained under the same training data magnitude can be determined from the scenario matching labels. An evaluation index value of each summary generation model trained under each training data magnitude can also be determined.
In some exemplary embodiments, the two summary generation models trained under the same training data magnitude are an A summary generation model and a B summary generation model. The scenario matching label corresponding to the summary generated by the A model is 0 (the generated summary does not match the scenario content of the evaluation text), so the evaluation index value of the A model is determined to be 0%; the scenario matching label corresponding to the summary generated by the B model is 1 (the generated summary matches the scenario content of the evaluation text), so the evaluation index value of the B model is determined to be 100%. The foregoing is only illustrative.
S210, evaluating the performance of the summary generation models trained under the at least one training data magnitude according to the evaluation index values of the plurality of summary generation models trained under the at least one training data magnitude.
The performance of a summary generation model may be the capability or effect of the model in generating summaries.
Specifically, the performance of each summary generation model trained under the at least one training data magnitude can be evaluated according to the evaluation index values of the plurality of summary generation models trained under the at least one training data magnitude.
The performance of the plurality of summary generation models trained under each training data magnitude is evaluated according to their evaluation index values; for example, the summary generation model with the largest evaluation index value at each training data magnitude may be determined.
In some exemplary embodiments, the evaluation index values of the plurality of summary generation models under the at least one training data magnitude may be drawn into an evaluation chart as shown in FIG. 3, whose abscissa is the training data magnitude and whose ordinate is the evaluation index value of each summary generation model. The evaluation chart shows clearly how the performance of each summary generation model varies across training data magnitudes, and the performance of the summary generation models can be compared at the same training data magnitude.
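Such an evaluation chart could be drawn, for example, as in the following sketch, where matplotlib is assumed and all index values are invented for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical evaluation index values per training data magnitude for two
# summary generation models with different structures (invented numbers).
magnitudes = [1_000, 2_000, 4_000, 8_000, 16_000]
scores = {
    "model A (smaller structure)": [0.52, 0.58, 0.61, 0.62, 0.63],
    "model B (larger structure)":  [0.45, 0.55, 0.63, 0.68, 0.72],
}

for name, ys in scores.items():
    plt.plot(magnitudes, ys, marker="o", label=name)
plt.xscale("log")                      # magnitudes grow exponentially
plt.xlabel("training data magnitude")
plt.ylabel("evaluation index value")
plt.legend()
plt.title("Model effect curves (cf. FIG. 3)")
plt.show()
```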
In the above performance evaluation method of the summary generation model, both the training data magnitude used in model training and the model structure influence the effect of a model; by obtaining summary generation models trained from multiple model structures under at least one training data magnitude, different models under different training data magnitudes can be evaluated. At least two summary generation models are trained under the same training data magnitude, at least two summaries are generated based on the same evaluation text, and scenario matching labels corresponding to the summaries are generated through a pre-trained comparison model based on the evaluation text and the summaries, so that the performance difference between different summary generation models processing the same text can be evaluated, and the degree to which the summaries generated by different models match the scenario content of the same evaluation text is determined. Because a scenario matching label characterizes whether the corresponding summary matches the scenario content of the evaluation text, the accuracy of a summary can be judged correctly even when its sentences or wording differ from those of the evaluation text. Determining the evaluation index value of each summary generation model trained under the same training data magnitude from the scenario matching labels turns the degree of matching into an index, so that the performance of each summary generation model can be evaluated better and more objectively.
In one embodiment, as shown in FIG. 4, when the summary lengths of the at least two summaries are greater than a preset length threshold, generating, through the pre-trained comparison model and based on the evaluation text and the at least two summaries, the scenario matching labels corresponding to the at least two summaries includes the following steps:
S302, segmenting the at least two summaries respectively to obtain a plurality of summary segments corresponding to each of the at least two summaries.
Specifically, when the evaluation text is long, the summaries generated by the summary generation models are also long. When the comparison model processes a long summary, the larger amount of content involved may cause it to output wrong scenario matching labels. To improve the accuracy of the comparison model, when the summary lengths of the at least two summaries are greater than the preset length threshold, the at least two summaries are segmented respectively to obtain a plurality of summary segments corresponding to each summary.
In some exemplary embodiments, the at least two summaries may be segmented by sentence. For example, if summary A contains 5 sentences, it may be split at the periods, each sentence forming one segment, yielding 5 summary segments. Segmentation may also follow changes of characters in the summary: for example, if the first and second sentences of summary A involve small a and small b, the third sentence involves small a, and the fourth involves small a and small c, then sentences 1 and 2 are merged into one segment, sentence 3 forms one segment, and sentence 4 forms one segment, yielding 3 summary segments. A person skilled in the art may also segment summaries whose length exceeds the preset length threshold in other ways; some embodiments of the present disclosure do not limit how the summaries are segmented.
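The following is a sketch of the two segmentation strategies described above (sentence-based and character-change-based); the character set of each sentence is assumed to have been extracted beforehand, since character extraction is outside the scope of this sketch:

```python
import re

def split_by_sentence(summary: str):
    """Segment a summary at sentence-ending punctuation, one sentence
    per segment, as in the 5-sentence example above."""
    parts = re.split(r"(?<=[.!?。！？])\s*", summary)
    return [p for p in parts if p]

def split_by_characters(sentences, casts):
    """Merge consecutive sentences involving the same set of characters;
    casts[i] is the pre-extracted character set of sentence i."""
    segments, current = [], [sentences[0]]
    for sent, prev_cast, cast in zip(sentences[1:], casts, casts[1:]):
        if cast == prev_cast:
            current.append(sent)
        else:
            segments.append(" ".join(current))
            current = [sent]
    segments.append(" ".join(current))
    return segments

sents = ["a and b meet.", "a and b talk.", "a leaves.", "a and c meet."]
print(split_by_characters(sents, [{"a", "b"}, {"a", "b"}, {"a"}, {"a", "c"}]))
# ['a and b meet. a and b talk.', 'a leaves.', 'a and c meet.']
```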
S304, generating, through the pre-trained comparison model and based on the evaluation text and the plurality of summary segments corresponding to each of the at least two summaries, a segment matching label of each summary segment with respect to the evaluation text.
The segment matching label of a summary segment with respect to the evaluation text characterizes whether the corresponding summary segment matches the scenario content of the evaluation text. Such labels include: a segment-matched label, characterizing that the corresponding summary segment matches the scenario content of the evaluation text, and a segment-unmatched label, characterizing that it does not.
Specifically, the plurality of summary segments corresponding to the at least two summaries and the evaluation text are input into the comparison model, and the segment matching label of each summary segment with respect to the evaluation text is generated by the comparison model.
S306, generating the scenario matching labels corresponding to the at least two summaries according to the plurality of summary segments corresponding to each of the at least two summaries and the segment matching label of each summary segment with respect to the evaluation text.
Specifically, the segment matching labels of the plurality of summary segments corresponding to the at least two summaries are determined from the segment matching label of each summary segment with respect to the evaluation text, and the scenario matching labels corresponding to the at least two summaries are then generated from those segment matching labels.
In some exemplary embodiments, take two summaries, summary A and summary B. Summary A is segmented into 3 summary segments A1, A2, and A3, and summary B is segmented into 3 summary segments B1, B2, and B3. A1, A2, A3, B1, B2, and B3 are input into the comparison model together with the evaluation text, which outputs a segment matching label of 0 for A1, 1 for A2, 1 for A3, 1 for B1, 1 for B2, and 0 for B3. The segment matching labels corresponding to summary A are therefore 0 for A1, 1 for A2, and 1 for A3, so the scenario matching label of summary A is 011; similarly, the scenario matching label of summary B is 110.
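Composing a summary's scenario matching label from its per-segment labels is then a simple concatenation, as the following sketch illustrates:

```python
def scenario_label(segment_labels):
    """Concatenate the per-segment matching labels (1 = the segment matches
    the scenario content of the evaluation text) into the summary's
    scenario matching label, as in step S306."""
    return "".join(str(label) for label in segment_labels)

# Worked example above: segments A1/A2/A3 of summary A, B1/B2/B3 of summary B.
print(scenario_label([0, 1, 1]))  # '011' for summary A
print(scenario_label([1, 1, 0]))  # '110' for summary B
```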
In this embodiment, when the summary lengths of the at least two summaries are greater than the preset length threshold, the summaries are segmented, so that the summary segments are compared with the evaluation text and the quality of the summaries can be evaluated more accurately.
In one embodiment, as shown in FIG. 5, the evaluation index value includes the accuracy, and determining, based on the scenario matching labels corresponding to the at least two summaries, the evaluation index value of each summary generation model trained under the same training data magnitude includes the following steps:
S402, for the plurality of summary segments corresponding to each of the at least two summaries, identifying the segment-matched labels among the segment matching labels of the plurality of summary segments, a segment-matched label characterizing that the corresponding summary segment matches the scenario content of the evaluation text.
Specifically, after the segment matching label of each summary segment with respect to the evaluation text is generated, the plurality of summary segments corresponding to each of the at least two summaries are determined, and the segment matching label of each summary segment is determined. Whether a summary segment matches the scenario content of the evaluation text is characterized by its segment matching label, so the segment-matched labels, which characterize that the corresponding summary segments match the scenario content of the evaluation text, need to be identified.
In some exemplary embodiments, the summary segments of summary A are A1, A2, and A3, with segment matching labels 0, 1, and 0 respectively. If "1" denotes a segment-matched label and "0" a segment-unmatched label, the label of A2 is identified as the only segment-matched label among those of A1, A2, and A3.
S404, determining the number of identified segment-matched labels, and determining the summary accuracy of each summary according to the number of segment-matched labels and the total number of the plurality of summary segments corresponding to that summary.
Specifically, the number of identified segment-matched labels may be determined, and the summary accuracy of each summary may be determined by dividing that number by the total number of the plurality of summary segments corresponding to the summary.
In some exemplary embodiments, if the summary segments corresponding to summary A are A1, A2, and A3, and only the label of A2 is a segment-matched label, the number of segment-matched labels is 1 and the summary accuracy is 1/3 = 33.33%.
S406, determining the accuracy of each summary generation model trained under the same training data magnitude according to the summary accuracy of each summary.
Specifically, after the summary accuracy of each summary is determined, the accuracy of the summary generation model that generated the summary is determined from it, thereby determining the accuracy of each summary generation model.
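A minimal sketch of the accuracy computation of steps S402 to S406, using the 33.33% example above:

```python
def summary_accuracy(segment_labels):
    """Summary accuracy per steps S402-S406: identified segment-matched
    labels divided by the total number of summary segments."""
    return sum(segment_labels) / len(segment_labels)

print(f"{summary_accuracy([0, 1, 0]):.2%}")  # 33.33% for the A1/A2/A3 example
```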
In this embodiment, the summary accuracy of each summary can be calculated from the number of identified segment-matched labels and the total number of summary segments corresponding to the summary, which helps evaluate the accuracy and completeness of the summaries. From the summary accuracy of each summary, the accuracy of each summary generation model trained under the same training data magnitude can be determined, which helps evaluate and compare the performance of different summary generation models.
When the summary lengths of the at least two summaries are greater than the preset length threshold, it can be concluded that the evaluation text is long and that the generated summaries are also long; a long evaluation text is difficult for the comparison model to process. In one embodiment, generating, through the pre-trained comparison model and based on the evaluation text and the at least two summaries, the scenario matching labels corresponding to the at least two summaries therefore includes:
determining an evaluation summary pre-generated from the evaluation text, and segmenting the evaluation summary to obtain a plurality of evaluation summary segments.
In some embodiments of the present disclosure, the evaluation summary is an accurate summary that matches the scenario content of the evaluation text.
Specifically, the evaluation summary of the evaluation text may be generated with a large language model (e.g., a GPT model), with a fully trained and relatively accurate summary generation model, or manually. Because the evaluation text is long, the evaluation summary is also long, so it is segmented (the segmentation methods described in the above embodiments apply and are not repeated here) into a plurality of evaluation summary segments.
generating, through the pre-trained comparison model and based on each evaluation summary segment and the plurality of summary segments corresponding to each of the at least two summaries, an evaluation segment matching label of each summary segment with respect to each evaluation summary segment.
An evaluation segment matching label characterizes whether the corresponding summary segment matches the corresponding evaluation summary segment. Evaluation segment matching labels include: an evaluation-segment-matched label, characterizing that the corresponding summary segment matches the corresponding evaluation summary segment, and an evaluation-segment-unmatched label, characterizing that it does not.
Specifically, each evaluation summary segment and the plurality of summary segments corresponding to the at least two summaries are input into the comparison model, and the evaluation segment matching label of each summary segment with respect to each evaluation summary segment is generated by the comparison model.
generating the scenario matching labels corresponding to the at least two summaries according to the plurality of summary segments corresponding to each of the at least two summaries and the evaluation segment matching label of each summary segment with respect to each evaluation summary segment.
Specifically, the evaluation segment matching label corresponding to each of the plurality of summary segments of the at least two summaries is determined from the evaluation segment matching labels of each summary segment with respect to each evaluation summary segment, and the scenario matching labels corresponding to the at least two summaries are then generated from these labels.
In some exemplary embodiments, take two summary segments A1 and A2 of summary A, and evaluation summary segments S1 and S2 of the evaluation summary. A1 and S1, A1 and S2, A2 and S1, and A2 and S2 are input into the comparison model respectively, and the comparison model outputs the evaluation segment matching label of each pair. Suppose the label of A1 and S1 is 1 (A1 and S1 match), the label of A1 and S2 is 0 (A1 and S2 do not match), and the labels of A2 with S1 and with S2 are both 0 (A2 matches neither). Then segment A1 is determined to match an evaluation summary segment, segment A2 is determined to match none, and the scenario matching label corresponding to summary A may be 10. The foregoing is only illustrative.
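The pairwise comparison of summary segments against evaluation summary segments might look as in the following sketch; predict_fn again stands in for the pre-trained comparison model, whose interface is an assumption:

```python
from typing import Callable, List, Set, Tuple

def match_against_reference(
    predict_fn: Callable[[str, str], int],  # the pre-trained comparison model
    summary_segments: List[str],
    eval_segments: List[str],
) -> Tuple[List[int], Set[int]]:
    """Compare every summary segment with every evaluation summary segment.
    A summary segment is labeled 1 if at least one evaluation summary
    segment matches it; the indices of matched (recalled) evaluation
    summary segments are collected for the recall computation below."""
    labels, recalled = [], set()
    for segment in summary_segments:
        hits = [j for j, ref in enumerate(eval_segments)
                if predict_fn(segment, ref) == 1]
        labels.append(1 if hits else 0)
        recalled.update(hits)
    return labels, recalled

# With the worked example above (A1 matches S1 only, A2 matches neither),
# the labels come out [1, 0], i.e. scenario matching label '10' for summary A.
```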
In this embodiment, matching each summary segment against each evaluation summary segment allows the accuracy of the summary segments to be evaluated more precisely, and generating the scenario matching labels of the at least two summaries from the evaluation segment matching labels of each summary segment with respect to each evaluation summary segment allows the performance of the summary generation models to be evaluated more accurately.
In one embodiment, the evaluation index value includes the recall rate, and determining, based on the scenario matching labels corresponding to the at least two summaries, the evaluation index value of each summary generation model trained under the same training data magnitude includes:
for each of the at least two summaries, determining, from the evaluation segment matching labels of the plurality of summary segments corresponding to the summary, the number of evaluation-segment-matched labels, an evaluation-segment-matched label characterizing that the corresponding summary segment matches the scenario content of an evaluation summary segment.
Specifically, the number of evaluation-segment-matched labels can be determined from the evaluation segment matching labels of the plurality of summary segments corresponding to each summary.
determining the recall rate of each summary generation model trained under the same training data magnitude according to the number of evaluation-segment-matched labels and the total number of the plurality of evaluation summary segments.
Specifically, the recall rate of each summary generation model trained under the same training data magnitude may be determined by dividing the number of evaluation-segment-matched labels by the total number of the plurality of evaluation summary segments.
In some exemplary embodiments, for an evaluation text with a ground-truth summary, suppose the annotated evaluation summary of the evaluation text is segmented into ni_human paragraphs; a single summary generation model is taken as an example. After the summary generation model generates a summary based on the evaluation text, the summary is split into ni paragraphs, which are compared with the ni_human paragraphs by the comparison model. When a paragraph j among the ni paragraphs matches some paragraph i among the ni_human paragraphs (that is, the evaluation segment matching label of paragraph j is an evaluation-segment-matched label), paragraph i is determined to be recalled. The number of recalled paragraphs is determined; when that number is N2, the recall rate = N2/ni_human. For example, if the evaluation summary is segmented into 10 evaluation summary segments and the summary generated by the model is split into 12 summary segments that together hit 8 of the 10 evaluation summary segments, the recall rate of the summary generation model is 8/10.
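A minimal sketch of the recall computation, reusing the recalled indices collected by match_against_reference in the sketch above:

```python
def recall_rate(recalled_eval_indices, n_eval_segments):
    """Recall per this embodiment: recalled evaluation summary segments (N2)
    divided by the total number of evaluation summary segments (ni_human)."""
    return len(recalled_eval_indices) / n_eval_segments

# Worked example above: the 12 generated segments together hit 8 of the
# 10 annotated evaluation summary segments.
print(recall_rate(set(range(8)), 10))  # 0.8
```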
In this embodiment, calculating the recall rate reveals the ability of each summary generation model trained under the same training data magnitude to recover the positive examples, that is, how many of the true positive examples the summary generation model can correctly reproduce.
In one embodiment, the evaluation index value further includes an accuracy; determining the evaluation index value of each summary generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two summaries further comprises:
And determining the digest accuracy corresponding to each digest according to the number of the evaluation fragment matching labels and the total number of the plurality of digest fragments corresponding to each digest.
And determining the accuracy of each abstract generating model obtained by training under the same training data magnitude according to the abstract accuracy corresponding to each abstract.
Specifically, when evaluation segment matching tags exist, the present disclosure also provides another way of calculating accuracy: the number of evaluation segment matching tags may be divided by the total number of summary segments corresponding to each summary to determine the summary accuracy of that summary; the accuracy of the summary generation model that generated each summary is then determined from the summary accuracy of that summary, and finally the accuracy of each summary generation model obtained by training under the same training data magnitude is determined.
In this embodiment, when evaluation segment matching tags exist, an evaluation segment matching tag represents that the corresponding summary segment matches the scenario content of an evaluation summary segment, whereas a segment matching tag only represents that the corresponding summary segment matches the scenario content of the evaluation text; computing accuracy from the number of evaluation segment matching tags is therefore more precise than computing it from the number of segment matching tags.
In one embodiment, the evaluating the index value further comprises: synthesizing index values; determining an evaluation index value of each abstract generation model obtained by training under the same training data magnitude based on scenario matching labels respectively corresponding to the at least two abstracts, and further comprising:
and determining the comprehensive index value of each abstract generation model obtained by training under the same training data level according to the recall rate and the accuracy of each abstract generation model obtained by training under the same training data level.
The comprehensive index value may be an F1 value, which is a measure for evaluating the performance of a classification model: it takes into account both the precision and the recall of the model and computes their harmonic mean to obtain a comprehensive evaluation index.
Specifically, the comprehensive index value of each abstract generation model trained under the same training data magnitude can be calculated according to the recall rate and the precision of each abstract generation model trained under the same training data magnitude. The value range of the comprehensive index value is usually between 0 and 1, and the closer the value is to 1, the better the performance of the abstract generation model is.
In some exemplary embodiments, the composite index value may be calculated using the following formula:
Comprehensive index value = 2 × (precision × recall) / (precision + recall)
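As a small illustrative sketch (hypothetical function name, not part of the disclosed method), the comprehensive index value can be computed from the two indicators above, guarding the degenerate case where both are zero:

```python
# Sketch of the comprehensive index value (F1); names are illustrative.
def comprehensive_index(precision: float, recall: float) -> float:
    if precision + recall == 0:          # avoid division by zero
        return 0.0
    return 2 * precision * recall / (precision + recall)

# With precision 8/12 and recall 8/10 from the earlier example,
# the comprehensive index value is about 0.727 (closer to 1 is better).
print(round(comprehensive_index(8 / 12, 8 / 10), 3))  # 0.727
```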
In this embodiment, by calculating the comprehensive index value, the accuracy of the summary generation model and its ability to recognize positive examples can be considered together, so that the performance of the summary generation model can be evaluated more accurately.
In one embodiment, when the evaluation text is short, the summary generated by the summary generation model based on the evaluation text is also short; to evaluate the summary generation model accurately, summaries can be generated from multiple different types of evaluation texts. When the evaluation text is of multiple types, the evaluation index value includes an accuracy, and determining the evaluation index value of each summary generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two summaries comprises:
and determining scenario matching labels respectively corresponding to the summaries according to the summaries generated by each summary generation model based on the multiple types of evaluation texts.
Specifically, multiple types of evaluation texts can be respectively input into the at least two summary generation models, and each summary generation model outputs multiple summaries respectively matched with the multiple types of evaluation texts. The multiple types of evaluation texts and their respectively matched summaries are then input into the comparison model, and the comparison model generates the scenario matching labels respectively corresponding to the multiple summaries.
And identifying the digest matching labels in the scenario matching labels corresponding to the digests respectively based on the scenario matching labels corresponding to the digests respectively, wherein the digest matching labels represent that the corresponding digests are matched with scenario contents in the evaluation text for generating the corresponding digests.
Specifically, after determining scenario matching property tags corresponding to the summaries respectively, identifying the summary matching tags in the scenario matching property tags corresponding to the summaries respectively.
And determining the number of the identified abstract matching labels, and determining the accuracy of each abstract generating model according to the number of the abstract matching labels and the number of a plurality of abstracts generated by each abstract generating model based on multiple types of evaluation texts.
Specifically, the number of summary matching labels is determined from the summary matching labels identified among the scenario matching property labels respectively corresponding to the summaries, and the accuracy of each summary generation model is determined by dividing the number of summary matching labels by the number of summaries generated by that summary generation model based on the multiple types of evaluation texts.
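A minimal sketch of this accuracy computation follows; the label encoding (True for a summary matching tag) and all names are assumptions for illustration.

```python
# Sketch: accuracy of one summary generation model over summaries
# generated from multiple types of evaluation text.
def model_accuracy(summary_match_labels: list[bool]) -> float:
    """summary_match_labels[k] is True when the k-th generated summary
    carries a summary matching tag (it matches its evaluation text)."""
    return sum(summary_match_labels) / len(summary_match_labels)

# E.g. 5 types of evaluation text, 4 of the 5 summaries matched:
print(model_accuracy([True, True, False, True, True]))  # 0.8
```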
In this embodiment, the summary generating model may generate a plurality of summaries by using a plurality of types of evaluating text, and determine the accuracy of the summary generating model according to summary matching tags in scenario matching tags corresponding to the summaries, so that the accuracy of the summary generating model can be accurately calculated, and the accuracy of performance evaluation of the summary generating model is improved.
In one embodiment, the evaluation index value includes a recall ratio, and the determining, based on scenario matching labels corresponding to the at least two summaries, an evaluation index value of each summary generation model obtained by training under the same training data level includes:
Determining a plurality of corresponding evaluation summaries pre-generated for the multiple types of evaluation texts;
generating scenario matching labels of a plurality of summaries based on the plurality of evaluation summaries and the plurality of summaries through a pre-trained comparison model;
identifying abstract matching labels in the scenario matching labels of the plurality of abstracts, and determining the number of the identified abstract matching labels;
And determining the recall rate of each abstract generation model according to the number of the abstract matching labels and the number of the plurality of evaluation abstracts.
Specifically, the recall ratio is calculated with the number of evaluation summaries as the denominator; since the number of evaluation summaries is the same as the number of summaries generated by the summary generation model, the recall ratio calculated in this way equals the accuracy of the summary generation model.
In one embodiment, the comparison model is obtained by training in a manner comprising:
And acquiring a model training text, scenario description information describing scenario contents in the model training text and a matching training label, wherein the matching training label characterizes the scenario description information to be matched with the model training text.
The model training text can be a text of a certain episode of a television series or other texts. Scenario description information describing scenario content in the model training text may be a summary.
Specifically, a model training text, scenario description information describing scenario contents in the model training text, and a matching training tag may be acquired as comparison model training data. In addition, in order to ensure accuracy in training of the comparison model, the comparison model may be trained using several tens of thousands or more pieces of comparison model training data.
And determining a large language model to be trained, and adding a low-rank self-adaptive structure into the large language model to be trained.
The low-rank adaptive structure may be a LoRA (Low-Rank Adaptation) structure. The large language model to be trained may be a Baichuan model, a LLaMA model, a Gemma model, or the like, or a pre-trained model.
Specifically, a large language model to be trained, which needs to be trained, is determined, and then a low-rank adaptive structure is added into the large language model to be trained.
Based on the model training text, scenario description information describing scenario content in the model training text and matching training labels, training the large language model to be trained after adding a low-rank self-adaptive structure, and obtaining a comparison model.
Specifically, the model training text, the scenario description information describing scenario content in the model training text, and the matching training label are used as comparison model training data to train the large language model to be trained after the low-rank adaptive structure has been added. When training the large language model to be trained, its parameters are first initialized, and the learning parameters and learning rate are then set; in the embodiments of the present disclosure, the learning parameters can be the parameters of the low-rank adaptive structure. The low-rank adaptive structure introduces two matrices A and B, of sizes d × r and r × d respectively: if an original weight matrix W of the model is d × d, r is much smaller than d (typically less than 100); r, which may also be referred to as the rank, determines the size of the learning parameters. The large language model to be trained is then trained using the initialized parameters, the learning rate, and the comparison model training data; the initialized parameters and the learning parameters are updated, and the comparison model is obtained based on the updated parameters and the large language model to be trained.
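A minimal PyTorch sketch of such a low-rank adaptive structure is given below. It is an illustrative reading of the description above, not the patented implementation; the class name and the choice of wrapping a linear layer are assumptions, and both A and B are Gaussian-initialized as the text describes (common LoRA practice initializes B to zero instead).

```python
# Illustrative LoRA structure: W x + B(A x), with W frozen and only the
# low-rank matrices A (d_in x r) and B (r x d_out) trained, r << d.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the original weights
            p.requires_grad = False
        # Gaussian init with mean 0 and variance 1, as in the text above
        self.A = nn.Parameter(torch.randn(base.in_features, r))
        self.B = nn.Parameter(torch.randn(r, base.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A) @ self.B

layer = LoRALinear(nn.Linear(512, 512), r=8)  # 512x512 frozen, 2*512*8 trainable
```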
In some exemplary embodiments, the parameters of the large language model to be trained may be initialized with the parameters of a pre-trained model having the same model structure; other ways of initializing the parameters of the large language model to be trained may also be used. The parameters of the low-rank adaptive structure can be initialized with a Gaussian random distribution with mean 0 and variance 1, and are set as the learning parameters. A parameter learning rate of 0.0001 and the Adam learner are adopted; the Adam learner is an optimization algorithm for training neural networks, an adaptive learning-rate algorithm that automatically adjusts the learning rate according to the historical gradient of each parameter and performs well across different parameter updates.

The comparison model training data are split into multiple batches, and one round (epoch) of iterative training is completed once the large language model to be trained has seen all batches; each round of iterative training thus passes over the comparison model training data once, and the loss value of each round is calculated until the loss no longer decreases at some round. In addition, as shown in fig. 6, within each round, a batch of training data is input into the large language model to be trained, whose initialized parameters and learning parameters have been set, and the model outputs the result corresponding to that batch. The loss value of the large language model to be trained for processing that batch is then calculated and back-propagated to the layers and parameters of the model so as to compute the contribution of each parameter to the loss, and the parameters are adjusted to optimize the large language model to be trained. When a round of iterative training is completed and its per-batch average loss value is smaller than that of the previous round, training continues; when the loss value no longer decreases, training is complete, and the comparison model is obtained based on the final adjusted parameters and learning parameters of the large language model to be trained. The loss value may be calculated using a multi-class cross-entropy loss. The loss formula is as follows:
loss = -(1/bs) × Σ_{i=1}^{bs} [ y_i × log(p_i) + (1 − y_i) × log(1 − p_i) ]

wherein y_i is the final category (1 when the sentence matches the scenario content, 0 otherwise), p_i is the predicted probability that each sentence matches the scenario content, and bs is the number of training samples in the batch.
In addition, it should be noted that, the iterative training process, learning parameters and various parameters in the model can be adjusted according to actual conditions.
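The training procedure above can be sketched as follows, assuming a `model` that maps a batch of tokenized inputs to one match logit per sample and whose only trainable parameters are the LoRA matrices; apart from the learning rate (0.0001), the Adam learner, the per-epoch stopping rule, and the cross-entropy loss named in the text, every name and detail here is an illustrative assumption.

```python
# Illustrative training loop for the comparison model.
import torch
import torch.nn.functional as F

def train_comparison_model(model, batches, max_epochs=50):
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=1e-4)  # 0.0001, Adam learner
    best_loss = float("inf")
    for epoch in range(max_epochs):
        total = 0.0
        for inputs, labels in batches:     # labels: 1.0 match, 0.0 no match
            logits = model(inputs)
            # two-class cross entropy over the bs samples in the batch
            loss = F.binary_cross_entropy_with_logits(logits, labels)
            optimizer.zero_grad()
            loss.backward()                # propagate loss back to LoRA params
            optimizer.step()
            total += loss.item()
        avg = total / len(batches)
        if avg >= best_loss:               # loss no longer decreasing: stop
            break
        best_loss = avg
    return model
```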
In this embodiment, by adding a low-rank adaptive structure to the large language model to be trained, the number of parameters to be trained can be reduced, thereby improving the training speed of the model.
In one embodiment, when the scenario matching tag contains a summary unmatched tag, the summary unmatched tag characterizes that the corresponding summary is unmatched with scenario content in the evaluation text; the method further comprises the steps of:
generating unmatched description information corresponding to the abstract unmatched labels based on the evaluation text and the at least two abstracts through a pre-trained comparison model, wherein the unmatched description information characterizes error reasons of unmatched abstracts corresponding to the evaluation text.
Specifically, since the generated scenario matching tags contain a summary unmatched tag, it can be determined that among the at least two summaries there is a summary that does not match the scenario content. Therefore, unmatched description information can be added to the training of the comparison model; in that case, when the evaluation text and the at least two summaries are input into the comparison model, the comparison model can also generate the unmatched description information corresponding to the summary unmatched tag. The unmatched description information contains the reason why the summary does not match the evaluation text.
In some exemplary embodiments, taking one summary as an example, suppose the summary generated by the summary generation model is: "small a goes to the mall, sees small b, and purchases clothes for small b", while the scenario content of the evaluation text is: "small a and small b go to the mall together, and small a purchases clothes for small b". It can be determined that the summary generated by the summary generation model is erroneous: the scenario is not "small a goes to the mall and sees small b" but "small a and small b go to the mall together", so the unmatched description information generated by the comparison model may be: "small a and small b went to the mall together".
And determining a digest generation model for generating the digest corresponding to the digest unmatched tag.
Specifically, when a summary unmatched tag exists, the summary corresponding to the summary unmatched tag may first be determined, and then the summary generation model that generated that summary is determined.
And adjusting a summary generation model for generating the summary corresponding to the summary unmatched label based on the unmatched description information.
Specifically, after the unmatched description information is determined, the summary generation model that generated the summary corresponding to the summary unmatched tag may be optimized and adjusted using the unmatched description information; for example, the summary generation model may be trained with the unmatched description information and the summary corresponding to the summary unmatched tag as negative samples. Alternatively, the unmatched description information and common features of the summaries corresponding to summary unmatched tags can be analyzed, and the structures or parameters related to those common features in the summary generation model can be optimized and adjusted.
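As a hypothetical sketch of the negative-sample adjustment just described (field names and the helper are assumptions, not the disclosed interface), the unmatched summaries and their mismatch descriptions could be packaged as follows:

```python
# Hypothetical construction of negative training samples from summaries
# that received a summary unmatched tag.
def build_negative_samples(unmatched_cases):
    """unmatched_cases: iterable of (evaluation_text, summary, description),
    where description is the error reason produced by the comparison model."""
    return [
        {
            "text": text,
            "summary": summary,
            "label": 0,              # 0: summary does not match the text
            "reason": description,   # used to steer the adjustment
        }
        for text, summary, description in unmatched_cases
    ]
```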
It should be noted that, only the summary unmatched tag is described herein, and when a fragment unmatched tag exists or an evaluation fragment unmatched tag exists, the summary generation model may be adjusted in the above manner.
In this embodiment, when the scenario matching tag includes a summary unmatched tag, determining a summary generation model for generating a summary corresponding to the summary unmatched tag, generating unmatched description information by using the comparison model, and adjusting the summary generation model by using the unmatched description information, so that the summary generated by the summary generation model is more accurate.
In one embodiment, after the evaluating the summary generation model trained on the at least one training data level, the method further includes:
and acquiring service processing data and determining the data magnitude of the service processing data.
The service processing data can be related text data in certain application scenes, such as a scenario abstract extraction scene and a document abstract extraction scene.
Specifically, when a summary of the text involved in some application scenario needs to be extracted, the service processing data of that application scenario can be obtained, and the data magnitude of the service processing data can be determined according to its data volume.
And selecting the abstract generating model with the highest evaluating index value under the data level of the service processing data from evaluating index values of a plurality of abstract generating models obtained by training under the at least one training data level based on the data level of the service processing data.
And generating the abstract of the service processing data based on an abstract generation model with highest evaluating index value under the data magnitude of the service processing data.
Specifically, since the evaluation index values of the plurality of summary generation models trained under the at least one training data magnitude have been obtained in the above embodiments, the evaluation index values of the summary generation models trained under the training data magnitude matching the data magnitude of the service processing data can be determined, and the summary generation model with the highest evaluation index value is selected from among them. The service processing data are then input into the summary generation model with the highest evaluation index value under the data magnitude of the service processing data, and the summary of the service processing data is output by that summary generation model.
In some exemplary embodiments, suppose the at least one training data magnitude includes magnitudes of 100, 1000, and 10000, and a plurality of summary generation models correspond to each training data magnitude, namely a first summary generation model and a second summary generation model. If the data magnitude of the service processing data is 1000, and at the 1000 magnitude the evaluation index value of the first summary generation model is 0.6 while that of the second summary generation model is 0.95, the second summary generation model can be used to process the service processing data and generate its summary.
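A small sketch of this selection logic follows; the table of evaluation index values and all names are illustrative assumptions (only the 0.6 and 0.95 values at the 1000 magnitude come from the example above).

```python
# Sketch: pick the summary generation model with the highest evaluation
# index value at the data magnitude matching the service processing data.
EVAL_INDEX = {
    100:   {"model_1": 0.50, "model_2": 0.65},   # assumed values
    1000:  {"model_1": 0.60, "model_2": 0.95},   # values from the example
    10000: {"model_1": 0.85, "model_2": 0.90},   # assumed values
}

def pick_model(data_magnitude: int) -> str:
    level = min(EVAL_INDEX, key=lambda lv: abs(lv - data_magnitude))
    scores = EVAL_INDEX[level]                   # index values at that level
    return max(scores, key=scores.get)

assert pick_model(1000) == "model_2"             # 0.95 beats 0.60
```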
In this embodiment, by determining the data magnitude of the service processing data, and selecting the abstract generating model with the highest evaluation index value under the data magnitude of the service processing data for processing, the accuracy of the abstract of the generated service processing data can be ensured.
In one embodiment, after the evaluating the summary generation model trained on the at least one training data level, the method further includes:
And acquiring service training data and determining the data magnitude of the service training data.
Wherein the business training data may generally be data associated with an application scenario for training a summary generation model.
Specifically, application scenarios differ. For example, a previously trained summary generation model may have been used to generate summaries of suspense-type scripts, while the current application scenario requires summaries of bidding texts; because the application scenarios differ, the summary generation model for suspense-type scripts may generate inaccurate or erroneous summaries of the bidding texts. Therefore, in order to ensure the accuracy of summary generation under different application scenarios, the summary generation model needs to be retrained with the service training data. The service training data under a given application scenario are obtained, and their data magnitude is determined.
And selecting the abstract generating model with the highest evaluating index value under the data level of the service training data from evaluating index values of a plurality of abstract generating models obtained by training under the at least one training data level based on the data level of the service training data.
Specifically, since the evaluation index values of the plurality of summary generation models trained under the at least one training data magnitude have been obtained in the above embodiments, the evaluation index values of the summary generation models trained under the training data magnitude matching the data magnitude of the service training data can be determined, and the summary generation model with the highest evaluation index value is selected from among them.
And determining a model structure of the abstract generating model with the highest evaluation index value under the data magnitude of the service training data.
Specifically, after the summary generation model with the highest evaluation index value is selected, since that model performs better at the data magnitude of the service training data, its model structure can be used for subsequent model training; the model structure of that summary generation model therefore needs to be determined.
And performing model training by using the model structure and the service training data to obtain a service processing model.
Specifically, model training is performed by using the determined model structure and service training data, and a service processing model is obtained after model training is completed.
In this embodiment, the previously trained summary generation model needs to be retrained because the application scenarios differ. Since the summary generation models trained under the at least one training data magnitude have already been evaluated, the summary generation model with the highest evaluation index value can be selected according to the data magnitude of the service training data and the evaluation index values of the summary generation models trained under different training data magnitudes, and model training can be performed with the model structure of that summary generation model, thereby ensuring the processing performance of the finally trained service processing model.
In one embodiment, the present disclosure further provides another method for evaluating performance of a summary generation model, where the method includes:
As shown in fig. 7, the various model structures are trained using summary model training data at different training data levels, and at each training data level, a plurality of summary generation models are obtained. The model structures of the abstract generation models obtained under the same training data magnitude are different.
And training a large language model by using the model training text, the scenario description information describing scenario content of the model training text, and the matching training label (1) as comparison model training data to obtain the comparison model, where 1 represents that the model training text matches the scenario description information describing its scenario content. When the large language model is trained, a LoRA structure can be newly added in the large language model, so that the amount of parameters to be trained is reduced and the training speed is improved.
And inputting, for the plurality of summary generation models obtained by training under each training data magnitude, the same evaluation text into the plurality of summary generation models, and generating the summary corresponding to each summary generation model. When the evaluation text is long, the generated summary is also long, so to ensure the accuracy of evaluation the summaries may be compared in segments: each summary is split into a plurality of summary segments, each summary segment and the evaluation text are input into the comparison model, and the comparison model outputs a segment match label indicating whether the summary segment matches the scenario content of the evaluation text. The final scenario matching label of each summary is determined according to the segment match labels of the plurality of summary segments in that summary, and the evaluation index value (accuracy) of the summary generation model that generated the summary is determined according to the final scenario matching label of the summary. Further, the accuracy of each summary is calculated as the number of segment matching tags among the segment match labels of the plurality of summary segments in that summary divided by the total number of those summary segments, and the accuracy of the summary generation model that generated the corresponding summary is determined according to the accuracy of each summary. Based on the above operations, the evaluation index values of the plurality of summary generation models obtained by training under each training data magnitude are determined. A model effect curve as shown in fig. 8 may be generated, and the plurality of summary generation models obtained by training under each training data magnitude may be evaluated using the model effect curve.
In a specific application scenario, the corresponding model may be selected for processing according to a model effect curve as shown in fig. 8. In fig. 8, the summary generation model A is the upper broken line and the summary generation model B is the lower broken line; the model effects are lower over the first 9 data magnitudes (100-8100; the 1st point a1 corresponds to a sample size of 100), and it can be seen that, except at the 1st point, the summary generation model A performs better than the summary generation model B everywhere to the right of that point on the curve. Therefore, during processing, if the data volume of the service processing data is smaller than 100, selecting the summary generation model B gives a better effect, and if the data volume of the service processing data is larger than 100, selecting the summary generation model A gives a better effect.
According to fig. 8, where the summary generation model A is the upper broken line and the summary generation model B is the lower broken line, the shaded part S1 enclosed between the two lines is the incremental performance of the summary generation model A relative to the summary generation model B. This incremental performance not only evaluates the effect of the summary generation models under the same data, but also evaluates their data utilization efficiency (that is, whether a better result can be generated with less data, or whether more data are needed to generate a better result).
In addition, there is the case in which the model effect curves of two summary generation models cross. As shown in fig. 9, when the two models are compared, a curve crossing point occurs (here at the 14000 data magnitude), with the curve of the summary generation model C on top initially and the curve of the summary generation model D on top afterwards. The incremental areas s1 and s2 before and after the crossing point are calculated by splitting at the crossing point, and s1 − s2 gives the incremental effect of the summary generation model C relative to the summary generation model D (a negative number here); the application scenario needs to be considered when selecting the summary generation model. Although the final evaluation index value of the summary generation model D is better than that of the summary generation model C, the evaluation index value of the summary generation model C is higher before the 14000 data magnitude. Data-volume-sensitive model switching between the two models should therefore be retained in the service application: once the data volume reaches the 14000 data magnitude, switch to the summary generation model D for processing, and use the summary generation model C before that. In addition, the model effect curve of fig. 9 can also reveal a poor data utilization rate of the summary generation model C (its evaluation index rises slowly as the training data magnitude increases), so that when training the summary generation model C an appropriate amount of training data can be chosen according to the model effect curve.
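The incremental effect just described can be computed from sampled model effect curves, for example with a trapezoidal rule over the signed gap between the two curves; the sampling points and values below are hypothetical, and only the s1/s2 decomposition at the crossing point follows the text.

```python
# Sketch: incremental areas s1 (model C above) and s2 (model D above)
# between two model effect curves sampled at the same data magnitudes.
def incremental_areas(levels, curve_c, curve_d):
    s1 = s2 = 0.0
    for i in range(len(levels) - 1):
        width = levels[i + 1] - levels[i]
        gap = 0.5 * ((curve_c[i] - curve_d[i])
                     + (curve_c[i + 1] - curve_d[i + 1]))  # mean C - D gap
        area = gap * width                                 # trapezoidal strip
        if area > 0:
            s1 += area        # region where C is above D (before the cross)
        else:
            s2 -= area        # region where D is above C (after the cross)
    return s1, s2

levels = [100, 7000, 14000, 21000, 28000]          # hypothetical samples
c = [0.50, 0.60, 0.70, 0.72, 0.74]                 # summary generation model C
d = [0.30, 0.50, 0.70, 0.85, 0.95]                 # summary generation model D
s1, s2 = incremental_areas(levels, c, d)
print(s1 - s2)   # incremental effect of C relative to D (negative here)
```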
The above description is only made by taking an application process as an example, and in the process of training the abstract generation model, the model effect curve may be used to select the abstract generation model with a high evaluation index value as the model used in training under a certain training data level.
The present disclosure further provides application scenarios that apply the above performance evaluation method of the summary generation model. Specifically, the performance evaluation method of the summary generation model can further be applied to scenarios in which the summary generation model is used to generate summaries of various texts (selecting the summary generation model by evaluation index value so as to generate more accurate summaries), scenarios in which the summary generation model is trained (for example, optimizing and iterating the summary generation model with the evaluation index, and determining the relation between the training data magnitude and the evaluation index of the summary generation model, so that data of an appropriate training data magnitude are selected for training, improving the training effect and the training speed), and scenarios in which the summary generation model is evaluated. In addition, the performance evaluation method of the summary generation model mentioned in some embodiments of the present disclosure is not limited to processing summary generation models and may also process other models.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; the execution order of these steps or stages is also not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the disclosure further provides a performance evaluation device of the abstract generation model for realizing the performance evaluation method of the abstract generation model. The implementation scheme of the solution provided by the device is similar to the implementation scheme recorded in the method, so the specific limitation in the embodiments of the performance evaluation device of one or more abstract generation models provided below can be referred to the limitation of the performance evaluation method of the abstract generation model hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 10, there is provided a performance evaluation apparatus 500 of a summary generation model, including: a model acquisition module 502, a summary generation module 504, a tag generation module 506, an index value determination module 508, and an evaluation module 510, wherein:
The model obtaining module 502 is configured to obtain abstract generating models obtained by training multiple model structures respectively under at least one training data level, where model structures of at least two abstract generating models obtained by training under the same training data level are different;
The abstract generating module 504 is configured to obtain at least two abstract generating models through training under the same training data level, and generate at least two abstracts corresponding to the at least two abstract generating models based on the same evaluation text respectively;
The tag generation module 506 is configured to generate scenario matching tags corresponding to the at least two summaries respectively through a pre-trained comparison model and based on the evaluation text and the at least two summaries, where the scenario matching tags characterize whether the corresponding summaries match with scenario contents in the evaluation text;
the index value determining module 508 is configured to determine an evaluation index value of each abstract generating model obtained by training under the same training data level based on scenario matching labels corresponding to the at least two abstracts respectively;
And the evaluation module 510 is configured to evaluate the performance of the summary generating model obtained by training under the at least one training data level according to the evaluation index values of the plurality of summary generating models obtained by training under the at least one training data level.
In this embodiment, since the magnitude of training data used during model training can affect the effect of the model, and the model structure can also affect the effect of the model, obtaining the summary generation models trained from multiple model structures under at least one training data magnitude allows different models under different training data magnitudes to be evaluated. At least two summary generation models are obtained by training under the same training data magnitude, at least two summaries corresponding to the at least two summary generation models are generated based on the same evaluation text, and the scenario matching labels respectively corresponding to the at least two summaries are generated by the pre-trained comparison model based on the evaluation text and the at least two summaries; the performance difference of different summary generation models when processing the same text can thus be evaluated, so as to determine, under the same evaluation text, the degree to which the summaries generated by different summary generation models match the scenario content of the evaluation text. Because the scenario matching label characterizes whether the corresponding summary matches the scenario content of the evaluation text, judging whether the summary matches the scenario content of the evaluation text makes it possible to judge the accuracy of the summary correctly even when the sentences or words of the summary differ from those of the evaluation text. Determining the evaluation index value of each summary generation model obtained by training under the same training data magnitude according to the scenario matching labels respectively corresponding to the at least two summaries turns the matching degree into an index, so that the performance of the summary generation models is better evaluated, and the performance of each summary generation model is evaluated more objectively.
In one embodiment of the apparatus, when the digest lengths of the at least two digests are greater than a preset length threshold, the tag generation module 506 includes:
the abstract segmentation module is used for respectively segmenting the at least two abstracts to obtain a plurality of abstract fragments corresponding to the at least two abstracts respectively;
The abstract segment processing module is used for generating segment matching labels of each abstract segment and the evaluation text based on the evaluation text and a plurality of abstract segments corresponding to the at least two abstracts respectively through a pre-trained comparison model;
And the scenario label generation module is used for generating scenario matching labels corresponding to the at least two abstracts according to the plurality of abstract fragments corresponding to the at least two abstracts respectively and the fragment matching labels of each abstract fragment and the evaluation text.
In this embodiment, when the digest lengths of at least two digests are greater than a preset length threshold, the digests are segmented, so that the digest segments and the evaluation text are compared, and the quality of the digests can be evaluated more accurately.
In an embodiment of the apparatus, the evaluating an index value further comprises: accuracy; the index value determining module 508 includes: the first accuracy determining module is used for identifying fragment matching labels in fragment matching labels of the plurality of abstract fragments aiming at a plurality of abstract fragments corresponding to each abstract in the at least two abstracts, wherein the fragment matching labels represent that the corresponding abstract fragments are matched with the scenario content in the evaluation text; determining the number of the identified fragment matching tags, and determining the digest accuracy of each digest according to the number of the fragment matching tags and the total number of the plurality of digest fragments corresponding to each digest; and determining the accuracy of each abstract generating model obtained by training under the same training data level according to the abstract accuracy of each abstract.
In this embodiment, the accuracy of the abstract of each abstract can be calculated by determining the number of the identified segment matching tags and the total number of the plurality of abstract segments corresponding to each abstract, which is helpful for evaluating the accuracy and integrity of the abstract. According to the digest accuracy of each digest, the accuracy of each digest generation model obtained by training under the same training data magnitude can be determined, and the method is beneficial to evaluating and comparing the performances of different digest generation models.
In one embodiment of the apparatus, the tag generation module 506 includes:
And the evaluation abstract segmentation module is used for determining an evaluation abstract pre-generated by the evaluation text and segmenting the evaluation abstract to obtain a plurality of evaluation abstract segments.
The segment comparison module is used for generating an evaluation segment matching label of each abstract segment and each evaluation abstract segment based on each evaluation abstract segment and a plurality of abstract segments corresponding to the at least two abstracts respectively through a pre-trained comparison model.
The scenario label generating module is further configured to generate scenario matching labels corresponding to at least two summaries according to a plurality of summary segments corresponding to the at least two summaries respectively and evaluation segment matching labels of each summary segment and each evaluation summary segment.
In this embodiment, by matching each summary segment against each evaluation summary segment, the accuracy of the summary segments can be evaluated more precisely, and by generating the scenario matching labels corresponding to the at least two summaries from the evaluation segment matching labels of each summary segment and each evaluation summary segment, the performance of the summary generation model can be evaluated more accurately.
In an embodiment of the apparatus, the evaluating an index value further comprises: recall rate; the index value determining module 508 includes: the recall rate determining module is used for determining the number of evaluation fragment matching tags in the evaluation fragment matching tags of the plurality of abstract fragments according to the evaluation fragment matching tags of the plurality of abstract fragments corresponding to each abstract respectively, and the evaluation fragment matching tags represent that the corresponding abstract fragments are matched with the scenario content in the evaluation abstract fragments; and determining the recall rate of each abstract generation model obtained by training under the same training data magnitude according to the number of the evaluation segment matching labels and the total number of the plurality of evaluation abstract segments.
In this embodiment, the recall rate is calculated, so that the recognition capability of each abstract generation model obtained by training under the same training data magnitude to the positive examples can be known, that is, how many real positive examples can be correctly recognized by the abstract generation model.
In an embodiment of the apparatus, the evaluating an index value further comprises: accuracy; the index value determining module 508 includes: the second accuracy calculation module is used for determining the digest accuracy corresponding to each digest according to the number of the evaluation fragment matching labels and the total number of the plurality of digest fragments corresponding to each digest; and determining the accuracy of each abstract generating model obtained by training under the same training data magnitude according to the abstract accuracy corresponding to each abstract.
In this embodiment, when the evaluation segment matching tag exists, since the summary segment corresponding to the evaluation segment matching tag representation is matched with the scenario content in the evaluation summary segment, and the summary segment corresponding to the segment matching tag representation is matched with the scenario content in the evaluation text, the number calculation accuracy of the evaluation segment matching tag is more accurate compared with the number calculation accuracy of the segment matching tag.
In an embodiment of the apparatus, the evaluating an index value further comprises: synthesizing index values; the index value determining module 508 further includes: and the comprehensive index value determining module is used for determining the comprehensive index value of each abstract generating model obtained by training under the same training data level according to the recall rate and the accuracy of each abstract generating model obtained by training under the same training data level.
In this embodiment, by calculating the comprehensive index value, the accuracy of the summary generation model and its ability to recognize positive examples can be considered together, so that the performance of the summary generation model can be evaluated more accurately.
In one embodiment of the apparatus, when the evaluation text is of a plurality of types, the evaluation index value includes: accuracy; the index value determining module 508 is further configured to determine, for each summary generating model, a scenario matching label corresponding to each of the summaries generated based on multiple types of evaluation texts; based on scenario matching labels corresponding to the summaries respectively, identifying summary matching labels in scenario matching labels corresponding to the summaries respectively, wherein the summary matching labels represent that the corresponding summaries are matched with scenario contents in evaluation texts for generating the corresponding summaries; and determining the number of the identified abstract matching labels, and determining the accuracy of each abstract generating model according to the number of the abstract matching labels and the number of a plurality of abstracts generated by each abstract generating model based on multiple types of evaluation texts.
In this embodiment, the summary generating model may generate a plurality of summaries by using a plurality of types of evaluating text, and determine the accuracy of the summary generating model according to summary matching tags in scenario matching tags corresponding to the summaries, so that the accuracy of the summary generating model can be accurately calculated, and the accuracy of performance evaluation of the summary generating model is improved.
In one embodiment of the apparatus, the apparatus further comprises: the system comprises a comparison model obtaining module, a model training text obtaining module and a matching training label, wherein the comparison model obtaining module is used for obtaining a model training text, scenario description information describing scenario contents in the model training text and the matching training label, and the matching training label represents that the scenario description information is matched with the model training text; determining a large language model to be trained, and adding a low-rank self-adaptive structure into the large language model to be trained; based on the model training text, scenario description information describing scenario content in the model training text and matching training labels, training the large language model to be trained after adding a low-rank self-adaptive structure, and obtaining a comparison model.
In this embodiment, by adding a low-rank adaptive structure to the large language model to be trained, the number of parameters to be trained can be reduced, thereby improving the training speed of the model.
In one embodiment of the apparatus, when the scenario matching tag contains a summary unmatched tag, the summary unmatched tag characterizes that the corresponding summary is unmatched with scenario content in the evaluation text; the apparatus further comprises: the abstract generation model adjustment module is used for generating unmatched description information corresponding to the abstract unmatched labels through a pre-trained comparison model and based on the evaluation text and the at least two abstracts, and the unmatched description information characterizes error reasons of unmatched abstracts corresponding to the evaluation text; determining a digest generation model for generating a digest corresponding to the digest unmatched tag; and adjusting a summary generation model for generating the summary corresponding to the summary unmatched label based on the unmatched description information.
In this embodiment, when the scenario matching tag includes a summary unmatched tag, determining a summary generation model for generating a summary corresponding to the summary unmatched tag, generating unmatched description information by using the comparison model, and adjusting the summary generation model by using the unmatched description information, so that the summary generated by the summary generation model is more accurate.
In one embodiment of the apparatus, the apparatus further comprises: the summary generating module 504 is further configured to obtain service processing data, and determine a data magnitude of the service processing data; based on the data magnitude of the service processing data, selecting a summary generation model with the highest evaluating index value under the data magnitude of the service processing data from evaluating index values of a plurality of summary generation models obtained by training under the at least one training data magnitude; and generating the abstract of the service processing data based on an abstract generation model with highest evaluating index value under the data magnitude of the service processing data.
In this embodiment, by determining the data magnitude of the service processing data, and selecting the abstract generating model with the highest evaluation index value under the data magnitude of the service processing data for processing, the accuracy of the abstract of the generated service processing data can be ensured.
In one embodiment of the apparatus, the apparatus further comprises: the business processing model training module is used for acquiring business training data and determining the data magnitude of the business training data; based on the data magnitude of the service training data, selecting a summary generation model with the highest evaluating index value under the data magnitude of the service training data from evaluating index values of a plurality of summary generation models obtained by training under the at least one training data magnitude; determining a model structure of the abstract generating model with the highest evaluation index value under the data magnitude of the business training data; and performing model training by using the model structure and the service training data to obtain a service processing model.
In this embodiment, the previously trained summary generation model needs to be retrained because the application scenarios differ. Since the summary generation models trained under the at least one training data magnitude have already been evaluated, the summary generation model with the highest evaluation index value can be selected according to the data magnitude of the service training data and the evaluation index values of the summary generation models trained under different training data magnitudes, and model training can be performed with the model structure of that summary generation model, thereby ensuring the processing performance of the finally trained service processing model.
All or part of each module in the performance evaluation device of the abstract generation model can be realized by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server or a terminal; in the embodiments of the present application, the computer device being a server is taken as the example for description, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus, where the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, computer programs, and a database, and the internal memory provides an environment for running the operating system and the computer programs in the non-volatile storage medium. The database of the computer device is used to store data such as summaries, evaluation texts, and evaluation index values. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by a processor, implements a performance evaluation method of a summary generation model.
Those skilled in the art will appreciate that the architecture shown in fig. 11 is merely a block diagram of a portion of the architecture relevant to the disclosed aspects and is not limiting of the computer device to which the disclosed aspects apply, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the evaluation text, the model training text and the scenario description information describing the scenario content in the model training text related to the present application are all information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can take various forms such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided by the present disclosure may include at least one of a relational database and a non-relational database; a non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors involved in the embodiments provided by the present disclosure may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, etc., without limitation thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The foregoing examples represent only a few embodiments of the present disclosure, and their descriptions are relatively specific and detailed, but they are not to be construed as limiting the scope of the present disclosure. It should be noted that variations and modifications can be made by those skilled in the art without departing from the concept of the present disclosure, and these all fall within the protection scope of the present disclosure. Accordingly, the protection scope of the present disclosure shall be subject to the appended claims.

Claims (16)

1. A performance evaluation method of an abstract generation model, characterized by comprising the following steps:
obtaining abstract generation models respectively obtained by training a plurality of model structures under at least one training data magnitude, wherein the model structures of at least two abstract generation models obtained by training under the same training data magnitude are different;
generating, by the at least two abstract generation models obtained by training under the same training data magnitude, at least two abstracts corresponding to the at least two abstract generation models based on the same evaluation text respectively;
generating, by a pre-trained comparison model, scenario matching labels respectively corresponding to the at least two abstracts based on the evaluation text and the at least two abstracts, wherein a scenario matching label characterizes whether the corresponding abstract matches the scenario content in the evaluation text; the comparison model is obtained by training, based on a model training text, scenario description information describing the scenario content in the model training text, and a matching training label, a to-be-trained large language model to which a low-rank adaptive structure has been added; and the matching training label characterizes that the scenario description information matches the model training text;
determining an evaluation index value of each abstract generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two abstracts;
evaluating the performance of the abstract generation models obtained by training under the at least one training data magnitude according to the evaluation index values of the plurality of abstract generation models obtained by training under the at least one training data magnitude.
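To make the claimed flow concrete, the following is a minimal, non-limiting Python sketch of the evaluation loop of claim 1. The model objects, the generate method, and the compare callable are illustrative assumptions rather than the patented implementation; the pre-trained comparison model is treated as a black box that returns a boolean scenario matching label.

```python
from typing import Callable, Dict, List

def evaluate_models(
    models_by_magnitude: Dict[int, List],   # training data magnitude -> models
    eval_text: str,                         # same evaluation text for all models
    compare: Callable[[str, str], bool],    # comparison model: does the abstract
                                            # match the scenario of eval_text?
) -> Dict[int, List[float]]:
    """Return one evaluation index value per abstract generation model."""
    scores: Dict[int, List[float]] = {}
    for magnitude, models in models_by_magnitude.items():
        abstracts = [m.generate(eval_text) for m in models]   # one abstract per model
        labels = [compare(eval_text, a) for a in abstracts]   # scenario matching labels
        scores[magnitude] = [1.0 if matched else 0.0 for matched in labels]
    return scores
```

Comparing the per-magnitude score lists then shows how each model structure behaves as the training data magnitude changes.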
2. The method of claim 1, wherein, when the abstract lengths of the at least two abstracts are greater than a preset length threshold, the generating, by the pre-trained comparison model, scenario matching labels respectively corresponding to the at least two abstracts based on the evaluation text and the at least two abstracts comprises:
segmenting the at least two abstracts respectively to obtain a plurality of abstract segments respectively corresponding to the at least two abstracts;
generating, by the pre-trained comparison model, a segment matching label between each abstract segment and the evaluation text based on the evaluation text and the plurality of abstract segments respectively corresponding to the at least two abstracts;
generating the scenario matching labels respectively corresponding to the at least two abstracts according to the plurality of abstract segments respectively corresponding to the at least two abstracts and the segment matching label between each abstract segment and the evaluation text.
3. The method of claim 2, wherein the evaluation index value comprises an accuracy, and the determining an evaluation index value of each abstract generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two abstracts comprises:
for the plurality of abstract segments corresponding to each abstract of the at least two abstracts, identifying segment-matched labels among the segment matching labels of the plurality of abstract segments, wherein a segment-matched label characterizes that the corresponding abstract segment matches the scenario content in the evaluation text;
determining the number of the identified segment-matched labels, and determining the abstract accuracy of each abstract according to the number of the segment-matched labels and the total number of the plurality of abstract segments corresponding to each abstract;
determining the accuracy of each abstract generation model obtained by training under the same training data magnitude according to the abstract accuracy of each abstract.
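A minimal sketch of the segmentation and per-segment accuracy of claims 2 and 3, assuming a naive fixed-width split (the claims do not prescribe a segmentation strategy) and the same black-box compare callable as above:

```python
from typing import Callable, List

def split_abstract(abstract: str, max_len: int = 512) -> List[str]:
    # Naive fixed-width segmentation; sentence-aware splitting would satisfy
    # the claims equally well.
    return [abstract[i:i + max_len] for i in range(0, len(abstract), max_len)]

def abstract_accuracy(eval_text: str, abstract: str,
                      compare: Callable[[str, str], bool]) -> float:
    segments = split_abstract(abstract)
    # One segment matching label per abstract segment from the comparison model.
    matched = sum(1 for seg in segments if compare(eval_text, seg))
    return matched / len(segments)   # claim 3: matched segments / total segments
```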
4. The method of claim 2, wherein the generating, by the pre-trained comparison model, scenario matching labels respectively corresponding to the at least two abstracts based on the evaluation text and the at least two abstracts comprises:
determining an evaluation abstract pre-generated for the evaluation text, and segmenting the evaluation abstract to obtain a plurality of evaluation abstract segments;
generating, by the pre-trained comparison model, an evaluation segment matching label between each abstract segment and each evaluation abstract segment based on each evaluation abstract segment and the plurality of abstract segments respectively corresponding to the at least two abstracts;
generating the scenario matching labels respectively corresponding to the at least two abstracts according to the plurality of abstract segments respectively corresponding to the at least two abstracts and the evaluation segment matching label between each abstract segment and each evaluation abstract segment.
5. The method of claim 4, wherein the evaluation index value comprises a recall rate, and the determining an evaluation index value of each abstract generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two abstracts comprises:
for each abstract of the at least two abstracts, determining the number of evaluation segment-matched labels among the evaluation segment matching labels of the plurality of abstract segments corresponding to each abstract, wherein an evaluation segment-matched label characterizes that the corresponding abstract segment matches the scenario content in the evaluation abstract segment;
determining the recall rate of each abstract generation model obtained by training under the same training data magnitude according to the number of the evaluation segment-matched labels and the total number of the plurality of evaluation abstract segments.
6. The method of claim 5, wherein the evaluation index value further comprises an accuracy, and the determining an evaluation index value of each abstract generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two abstracts further comprises:
determining the abstract accuracy corresponding to each abstract according to the number of the evaluation segment-matched labels and the total number of the plurality of abstract segments corresponding to each abstract;
determining the accuracy of each abstract generation model obtained by training under the same training data magnitude according to the abstract accuracy corresponding to each abstract.
7. The method of claim 6, wherein the evaluation index value further comprises a comprehensive index value, and the determining an evaluation index value of each abstract generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two abstracts further comprises:
determining the comprehensive index value of each abstract generation model obtained by training under the same training data magnitude according to the recall rate and the accuracy of each abstract generation model obtained by training under the same training data magnitude.
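The following sketch illustrates one way to realize the segment-level metrics of claims 5 to 7. Treating the comprehensive index value as the harmonic mean (an F1-style score) of accuracy and recall is an assumption made for illustration; the claims only require that it be derived from the two.

```python
from typing import Callable, List, Tuple

def segment_metrics(
    abstract_segs: List[str],        # segments of one generated abstract
    eval_abstract_segs: List[str],   # segments of the pre-generated evaluation abstract
    compare: Callable[[str, str], bool],
) -> Tuple[float, float, float]:
    # Evaluation segment matching labels for every (abstract segment,
    # evaluation abstract segment) pair.
    hits = [[compare(ref, seg) for ref in eval_abstract_segs] for seg in abstract_segs]
    # Claim 5: recall = matched evaluation abstract segments / all of them.
    recall = sum(any(row[j] for row in hits)
                 for j in range(len(eval_abstract_segs))) / len(eval_abstract_segs)
    # Claim 6: accuracy = abstract segments with at least one match / all of them.
    accuracy = sum(any(row) for row in hits) / len(abstract_segs)
    # Claim 7: comprehensive index value (assumed here to combine the two as F1).
    comprehensive = (2 * accuracy * recall / (accuracy + recall)
                     if accuracy + recall else 0.0)
    return accuracy, recall, comprehensive
```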
8. The method of claim 1, wherein, when the evaluation texts are of a plurality of types, the evaluation index value comprises an accuracy, and the determining an evaluation index value of each abstract generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two abstracts comprises:
determining the scenario matching labels respectively corresponding to a plurality of abstracts generated by each abstract generation model based on the plurality of types of evaluation texts;
identifying abstract-matched labels among the scenario matching labels respectively corresponding to the plurality of abstracts, wherein an abstract-matched label characterizes that the corresponding abstract matches the scenario content in the evaluation text from which the corresponding abstract was generated;
determining the number of the identified abstract-matched labels, and determining the accuracy of each abstract generation model according to the number of the abstract-matched labels and the number of the plurality of abstracts generated by each abstract generation model based on the plurality of types of evaluation texts.
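A short sketch of the per-model accuracy of claim 8 across several evaluation text types; model.generate and compare are the same illustrative assumptions as in the sketches above:

```python
from typing import Callable, List

def model_accuracy_across_types(model, eval_texts: List[str],
                                compare: Callable[[str, str], bool]) -> float:
    # One abstract per evaluation text; the texts may be of different types.
    pairs = [(text, model.generate(text)) for text in eval_texts]
    matched = sum(1 for text, abstract in pairs if compare(text, abstract))
    return matched / len(pairs)   # abstract-matched labels / abstracts generated
```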
9. The method according to any one of claims 1 to 8, wherein the training of the comparison model comprises:
acquiring a model training text, scenario description information describing the scenario content in the model training text, and a matching training label;
determining a to-be-trained large language model, and adding a low-rank adaptive structure to the to-be-trained large language model;
training, based on the model training text, the scenario description information describing the scenario content in the model training text, and the matching training label, the to-be-trained large language model to which the low-rank adaptive structure has been added, to obtain the comparison model.
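Low-rank adaptation of a large language model is commonly implemented with the Hugging Face peft library; the sketch below shows the general shape of claim 9. The checkpoint name, target modules, and hyperparameters are illustrative assumptions, not values taken from the patent.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# To-be-trained large language model (the checkpoint name is a placeholder).
base = AutoModelForCausalLM.from_pretrained("some-causal-lm-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("some-causal-lm-checkpoint")

# Low-rank adaptive structure: small trainable adapters on the attention
# projections while the base weights stay frozen.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, lora)

# Fine-tune `model` on (model training text, scenario description information,
# matching training label) triples with any standard causal-LM training loop,
# e.g. transformers.Trainer, to obtain the comparison model.
```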
10. The method of claim 1, wherein, when the scenario matching labels contain an abstract-unmatched label, the abstract-unmatched label characterizes that the corresponding abstract does not match the scenario content in the evaluation text, and the method further comprises:
generating, by the pre-trained comparison model, unmatched description information corresponding to the abstract-unmatched label based on the evaluation text and the at least two abstracts, wherein the unmatched description information characterizes the reason why the corresponding abstract does not match the evaluation text;
determining the abstract generation model that generated the abstract corresponding to the abstract-unmatched label;
adjusting, based on the unmatched description information, the abstract generation model that generated the abstract corresponding to the abstract-unmatched label.
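One plausible realization of the unmatched description information in claim 10 is to prompt the comparison model for an error reason; the prompt format and the comparison_model.generate call below are assumptions, since the claim does not fix an interface:

```python
def mismatch_report(eval_text: str, abstract: str, comparison_model) -> str:
    # Assumed prompt; the claim only requires that the comparison model emit
    # a reason when an abstract-unmatched label is produced.
    prompt = (f"Script:\n{eval_text}\n\nAbstract:\n{abstract}\n\n"
              "Does the abstract match the scenario? If not, explain the error.")
    return comparison_model.generate(prompt)
```

The returned description can then guide adjustment of the abstract generation model that produced the mismatched abstract, for example by augmenting its training data.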
11. The method according to any one of claims 1 to 8, wherein, after the evaluating of the performance of the abstract generation models obtained by training under the at least one training data magnitude, the method further comprises:
acquiring service processing data and determining the data magnitude of the service processing data;
selecting, based on the data magnitude of the service processing data and according to the evaluation index values of the plurality of abstract generation models obtained by training under the at least one training data magnitude, the abstract generation model with the highest evaluation index value under the data magnitude of the service processing data;
generating an abstract of the service processing data based on the abstract generation model with the highest evaluation index value under the data magnitude of the service processing data.
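A minimal sketch of the model selection in claim 11; mapping a service data volume to the nearest evaluated training data magnitude is an assumed heuristic, since the claim does not fix how the two magnitudes are aligned. Claim 12, which follows, reuses the same selection but keeps only the winning model structure and retrains it on the service training data.

```python
from typing import Dict, List

def pick_model_for_magnitude(
    scores: Dict[int, List[float]],         # per-magnitude evaluation index values
    models_by_magnitude: Dict[int, List],   # per-magnitude abstract generation models
    service_data_size: int,
):
    # Assumed heuristic: choose the evaluated magnitude closest to the service
    # data volume, then the highest-scoring model within that bucket.
    magnitude = min(scores, key=lambda m: abs(m - service_data_size))
    best_idx = max(range(len(scores[magnitude])), key=lambda i: scores[magnitude][i])
    return models_by_magnitude[magnitude][best_idx]
```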
12. The method according to any one of claims 1 to 8, wherein, after the evaluating of the performance of the abstract generation models obtained by training under the at least one training data magnitude, the method further comprises:
acquiring service training data and determining the data magnitude of the service training data;
selecting, based on the data magnitude of the service training data and according to the evaluation index values of the plurality of abstract generation models obtained by training under the at least one training data magnitude, the abstract generation model with the highest evaluation index value under the data magnitude of the service training data;
determining the model structure of the abstract generation model with the highest evaluation index value under the data magnitude of the service training data;
performing model training by using the model structure and the service training data to obtain a service processing model.
13. A performance evaluation device of an abstract generation model, characterized in that the device comprises:
a model acquisition module, configured to obtain abstract generation models respectively obtained by training a plurality of model structures under at least one training data magnitude, wherein the model structures of at least two abstract generation models obtained by training under the same training data magnitude are different;
an abstract generation module, configured to generate, by the at least two abstract generation models obtained by training under the same training data magnitude, at least two abstracts corresponding to the at least two abstract generation models based on the same evaluation text respectively;
a label generation module, configured to generate, by a pre-trained comparison model, scenario matching labels respectively corresponding to the at least two abstracts based on the evaluation text and the at least two abstracts, wherein a scenario matching label characterizes whether the corresponding abstract matches the scenario content in the evaluation text; the comparison model is obtained by training, based on a model training text, scenario description information describing the scenario content in the model training text, and a matching training label, a to-be-trained large language model to which a low-rank adaptive structure has been added; and the matching training label characterizes that the scenario description information matches the model training text;
an index value determining module, configured to determine an evaluation index value of each abstract generation model obtained by training under the same training data magnitude based on the scenario matching labels respectively corresponding to the at least two abstracts;
an evaluation module, configured to evaluate the performance of the abstract generation models obtained by training under the at least one training data magnitude according to the evaluation index values of the plurality of abstract generation models obtained by training under the at least one training data magnitude.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 12 when executing the computer program.
15. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 12.
16. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 12.
CN202410473943.XA 2024-04-19 Performance evaluation method and device of abstract generation model and computer equipment Active CN118070775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410473943.XA CN118070775B (en) 2024-04-19 Performance evaluation method and device of abstract generation model and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410473943.XA CN118070775B (en) 2024-04-19 Performance evaluation method and device of abstract generation model and computer equipment

Publications (2)

Publication Number Publication Date
CN118070775A (en) 2024-05-24
CN118070775B (en) 2024-06-28


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688479A (en) * 2019-08-19 2020-01-14 中国科学院信息工程研究所 Evaluation method and sequencing network for generating abstract
CN116737919A (en) * 2023-06-19 2023-09-12 南京邮电大学 Text abstract generation method and system based on contrast learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant