CN116663679A - Language model training method, device, equipment and storage medium - Google Patents

Language model training method, device, equipment and storage medium

Info

Publication number
CN116663679A
Authority
CN
China
Prior art keywords
model
answer
language
language model
question
Prior art date
Legal status
Pending
Application number
CN202310914142.8A
Other languages
Chinese (zh)
Inventor
王鹏远
庞竟成
陈雄辉
俞扬
Current Assignee
Nanqi Xiance Nanjing High Tech Co ltd
Original Assignee
Nanqi Xiance Nanjing High Tech Co ltd
Priority date
Filing date
Publication date
Application filed by Nanqi Xiance Nanjing High Tech Co ltd filed Critical Nanqi Xiance Nanjing High Tech Co ltd
Priority to CN202310914142.8A priority Critical patent/CN116663679A/en
Publication of CN116663679A publication Critical patent/CN116663679A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The embodiment of the invention discloses a language model training method, device, equipment and storage medium. The method includes: inputting a preset unlabeled question sample into an answer generation sub-model in an initial language model to obtain a model output answer; evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result; and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.

Description

Language model training method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of natural language processing (NLP) and reinforcement learning, and in particular to a language model training method, device, equipment and storage medium.
Background
Language models exhibit impressive performance on many natural language processing tasks; however, training a language model usually requires manually labeled samples. The prior art can enhance the reasoning ability of an initial language model with unlabeled data, for example by using the Self-Consistency method to sample diverse reasoning paths, selecting the most consistent answer by marginalizing over the sampled paths, and using that answer to fine-tune the initial language model. However, a language model trained in this way is mainly suited to reasoning tasks with a chain of thought, so it is applicable only to a single type of scenario.
Disclosure of Invention
The embodiment of the invention provides a language model training method, device, equipment and storage medium, which ensure that the trained language model has good applicability to various types of language questions, enrich the scenarios to which the language model applies, and improve the generality of the language model.
In a first aspect, an embodiment of the present invention provides a language model training method, including:
inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
In a second aspect, an embodiment of the present invention provides a language model training apparatus, including:
the answer output module is used for inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
the answer evaluation module is used for evaluating the answers output by the model based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and the model updating module is used for updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
In a third aspect, an embodiment of the present invention provides a computer apparatus, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the language model training method of any embodiment.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the language model training method of any embodiment.
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; the model output answer is evaluated based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result; and the answer generation sub-model is updated based on the answer evaluation result to obtain a target language model. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
Drawings
FIG. 1 is a flowchart of a language model training method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of yet another language model training method provided by an embodiment of the present invention;
FIG. 3 is a flowchart of yet another language model training method provided by an embodiment of the present invention;
FIG. 4 is a flow chart of a method for training a language model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a language model training apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of a language model training method provided by an embodiment of the present invention. The embodiment is applicable to scenarios in which unsupervised training is performed on a language model. The method may be performed by a language model training device, and the device may be implemented in software and/or hardware.
As shown in fig. 1, the language model training method includes the steps of:
s110, inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer.
The preset unlabeled question sample may be a preset language question sample without a label; "unlabeled" means that the language question has no corresponding answer label. A language model is an abstract mathematical model of language built from objective linguistic facts, and it describes a correspondence relation. In particular, a language model may be a generative model, an analytic model, or a recognition model. A language model can be used to solve language problems such as text translation and text summarization; it can also analyze the properties of the individual elements of an input set to clarify the relationships among them; and it can determine, based on certain recognition rules, whether an input is merely a jumble of words or a well-formed sentence in the language.
The initial language model may be an original language model that has not been pre-trained. Specifically, the initial language model includes an answer generation sub-model and an answer evaluation sub-model. The answer generation sub-model is used to solve the input language question and output an answer corresponding to the language question. The answer evaluation sub-model is used to evaluate the answer output by the answer generation sub-model to obtain an evaluation result. The answer generation sub-model can then in turn be trained based on the evaluation result output by the answer evaluation sub-model, so that it outputs better answers, thereby realizing unsupervised training of the initial language model. The model output answer may be the answer, output by the answer generation sub-model, that corresponds to the preset unlabeled question sample.
In an alternative embodiment, the answer generation sub-model and the answer evaluation sub-model may be language models with identical structure and identical parameters. The embodiment of the invention is based on the following observation: for a language model, evaluating a generated answer is easier than generating one. For example, writing an engaging story can be challenging, while judging a generated text is relatively easy. Based on this observation, the embodiment of the invention provides a language model self-improvement method: by instantiating the two sub-models from the same language model, the initial language model plays the roles of both student and teacher. As the student, the initial language model generates answers to unlabeled questions; as the teacher, it scores the generated answers. The answer generation sub-model in the initial language model is then updated by reinforcement learning to maximize the evaluation score. Meanwhile, the answer evaluation sub-model is kept unchanged throughout training, so the answer evaluation criterion is fixed, which makes training more stable and improves the accuracy of the answers output by the trained language model.
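By way of example and not limitation, the student/teacher loop described above can be sketched as follows; this is a minimal sketch of the control flow only, and the names generate_fn, evaluate_fn and ppo_step are assumed placeholders for the answer generation sub-model, the frozen answer evaluation sub-model and the reinforcement-learning update described later, not names disclosed by the embodiment.
    # Minimal sketch of the self-improvement loop; generate_fn, evaluate_fn and
    # ppo_step are hypothetical stand-ins for the two sub-models and the RL update.
    def self_improvement_epoch(generate_fn, evaluate_fn, ppo_step, unlabeled_questions):
        for question in unlabeled_questions:
            # Student role: the answer generation sub-model answers the unlabeled question.
            answer = generate_fn(question)
            # Teacher role: the answer evaluation sub-model (kept frozen) scores the
            # answer; this score is used as the reward.
            reward = evaluate_fn(question, answer)
            # Only the answer generation sub-model is updated, so the evaluation
            # criterion stays fixed throughout training.
            ppo_step(question, answer, reward)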
S120, evaluating the answers output by the model based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample, and obtaining an answer evaluation result.
The language question type may be the question type of the preset unlabeled question sample. Language questions come in a variety of types; for example, they may include open questions and closed questions. Open questions are language questions without a fixed answer, such as text translation and text summarization; closed questions are questions with a fixed answer, such as multiple-choice questions and fill-in-the-blank questions. Because a language question does not necessarily have a corresponding standard answer, the model output answer must be evaluated according to the type of the language question. The answer evaluation result may be a result indicating whether the model output answer is reasonable.
Specifically, since an open question has no fixed standard answer, when the language question type is an open question, the quality of the model output answer can be evaluated based on the answer evaluation sub-model. The evaluation covers aspects such as conciseness, accuracy, fluency, writing structure and writing style of the model output answer. For example, the answer can be assigned a score reflecting its quality, e.g. on a scale from zero to ten, where a higher score indicates a higher-quality answer.
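By way of example and not limitation, a possible quality-evaluation step for open questions is sketched below; the prompt wording, the regular-expression parsing and the function names are illustrative assumptions rather than the exact prompt of the embodiment, and evaluate_fn stands for a call to the answer evaluation sub-model.
    import re

    def quality_reward(evaluate_fn, question, answer):
        # Ask the answer evaluation sub-model for a 0-10 quality score covering
        # conciseness, accuracy, fluency, structure and style.
        prompt = (
            "Rate the quality of the answer to the question on a scale from 0 to 10, "
            "considering conciseness, accuracy, fluency, writing structure and style. "
            "Reply with a single number.\n"
            f"Question: {question}\nAnswer: {answer}\nScore:"
        )
        reply = evaluate_fn(prompt)
        match = re.search(r"\d+(\.\d+)?", reply)
        score = float(match.group()) if match else 0.0
        return min(max(score, 0.0), 10.0)  # clamp to the zero-to-ten range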
Closed questions, by contrast, have fixed standard answers, so when the language question type is a closed question, the accuracy of the model output answer can be evaluated based on the answer evaluation sub-model to obtain an accuracy evaluation result, and the accuracy evaluation result is used as the answer evaluation result.
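Similarly, for closed questions a possible accuracy-evaluation step is sketched below; the yes/no prompt and the binary reward mapping are assumptions made for illustration.
    def accuracy_reward(evaluate_fn, question, answer):
        # Ask the answer evaluation sub-model for a correctness judgment and map
        # it to a binary accuracy evaluation result.
        prompt = (
            "Is the following answer to the question correct? Reply Yes or No.\n"
            f"Question: {question}\nAnswer: {answer}\nJudgment:"
        )
        reply = evaluate_fn(prompt).strip().lower()
        return 1.0 if reply.startswith("yes") else 0.0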
And S130, updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
The target language model may be a language model obtained through training. After the answer evaluation result is obtained, the answer generation sub-model can be updated based on the answer evaluation result to obtain the target language model. For example, parameters of the answer generation sub-model may be adjusted according to the answer evaluation result, and when the loss function of the answer generation sub-model converges, the model training process is completed, so as to obtain the target language model.
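By way of illustration, convergence of the loss function may be detected, for example, by comparing moving averages of recent loss values; the window size and tolerance below are assumptions, not values disclosed by the embodiment.
    def has_converged(loss_history, window=50, tol=1e-3):
        # Treat the loss as converged when the average over the most recent
        # window changes by less than tol relative to the previous window.
        if len(loss_history) < 2 * window:
            return False
        recent = sum(loss_history[-window:]) / window
        previous = sum(loss_history[-2 * window:-window]) / window
        return abs(previous - recent) < tol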
According to the technical scheme provided by the embodiment of the invention, answers to language questions of different types are evaluated in a type-specific manner and the language model is trained according to the evaluation results, so that the trained model outputs answers of good accuracy and quality for different types of language questions, which enriches the scenarios to which the language model applies and improves its generality.
In an alternative embodiment, the answer generation sub-model is trained based on the Proximal Policy Optimization (PPO) algorithm to obtain the target language model. This algorithm achieves a good training effect in the language model training of the embodiment of the invention.
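For reference, the clipped surrogate objective of standard PPO may be written as follows; this is the textbook form of the PPO loss rather than an implementation detail disclosed by the embodiment, with per-token log-probabilities of the sampled answer and reward-derived advantages assumed as inputs.
    import torch

    def ppo_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
        # logprobs_new / logprobs_old: log-probabilities of the sampled answer under
        # the current and the sampling-time answer generation sub-model.
        # advantages: advantage estimates derived from the answer evaluation score.
        ratio = torch.exp(logprobs_new - logprobs_old)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()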
In addition, the technical scheme of the embodiment of the invention can adopt entropy regularization to prevent premature convergence during the sampling stage of model training, and can use the Kullback-Leibler (KL) divergence (relative entropy) to prevent the initial language model from drifting too far from its initial pre-training.
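One possible way to combine these two regularizers with the policy loss is sketched below; the coefficients and the exact formulation are assumptions for illustration, the reference logits being those of the model before reinforcement-learning updates.
    import torch.nn.functional as F

    def regularized_loss(policy_logits, ref_logits, base_loss,
                         entropy_coef=0.01, kl_coef=0.1):
        logp = F.log_softmax(policy_logits, dim=-1)
        p = logp.exp()
        entropy = -(p * logp).sum(dim=-1).mean()          # discourages premature convergence
        ref_logp = F.log_softmax(ref_logits, dim=-1)
        kl = (p * (logp - ref_logp)).sum(dim=-1).mean()   # keeps the policy near the pre-trained model
        return base_loss - entropy_coef * entropy + kl_coef * kl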
In an alternative embodiment, training may be based on the Flan-T5-Large model to obtain the target language model.
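By way of example, the Flan-T5-Large checkpoint can be loaded with the HuggingFace Transformers library as sketched below; the hub identifier "google/flan-t5-large" is assumed here, since the embodiment only names the model.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

    # Quick check that the sequence-to-sequence model answers a language question.
    inputs = tokenizer("Summarize: Language models can judge answers more easily than write them.",
                       return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))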
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; the model output answer is evaluated based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result; and the answer generation sub-model is updated based on the answer evaluation result to obtain the target language model. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
Fig. 2 is a flowchart of another language model training method provided by an embodiment of the present invention. The embodiment is applicable to scenarios in which unsupervised training is performed on a language model and, building on the above embodiment, further illustrates how to evaluate the model output answer based on the answer evaluation sub-model in the initial language model to obtain an answer evaluation result, and how to update the answer generation sub-model based on the answer evaluation result to obtain a target language model. The method may be performed by a language model training apparatus, which may be implemented in software and/or hardware and integrated into a computer device with application development functionality.
As shown in fig. 2, the language model training method includes the steps of:
s210, inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer.
The preset unlabeled question sample may be a preset language question sample without a label. A language model is an abstract mathematical model of language built from objective linguistic facts, and it describes a correspondence relation. In particular, a language model may be a generative model, an analytic model, or a recognition model. A language model can be used to solve language problems such as text translation and text summarization; it can also analyze the properties of the individual elements of an input set to clarify the relationships among them; and it can determine, based on certain recognition rules, whether an input is merely a jumble of words or a well-formed sentence in the language.
The initial language model may be an original language model that has not been pre-trained. Specifically, the initial language model includes an answer generation sub-model and an answer evaluation sub-model. The answer generation sub-model is used to solve the input language question and output an answer corresponding to the language question. The answer evaluation sub-model is used to evaluate the answer output by the answer generation sub-model to obtain an evaluation result. The answer generation sub-model can then in turn be trained based on the evaluation result output by the answer evaluation sub-model, so that it outputs better answers, thereby realizing unsupervised training of the initial language model. The model output answer may be the answer, output by the answer generation sub-model, that corresponds to the preset unlabeled question sample.
And S220, scoring the quality of the answers output by the model based on the answer evaluation sub-model to obtain a quality evaluation score, and taking the quality evaluation score as an answer evaluation result.
Language questions come in a variety of types; for example, they may include open questions and closed questions. Open questions are language questions without a fixed answer, such as text translation and text summarization; closed questions are questions with a fixed answer, such as multiple-choice questions and fill-in-the-blank questions. Because a language question does not necessarily have a corresponding standard answer, the model output answer must be evaluated according to the type of the language question. Specifically, when the language question is an open question, the quality of the model output answer may be evaluated based on the answer evaluation sub-model. The evaluation covers aspects such as conciseness, accuracy, fluency, writing structure and writing style of the model output answer. For example, the answer may be assigned a score reflecting its quality, e.g. on a scale from zero to ten, where a higher score indicates a higher-quality answer.
The quality evaluation score may be the score assigned to the quality of the model output answer. The answer evaluation result may be a result indicating whether the model output answer is reasonable. Specifically, when the language question is an open question, the quality evaluation score may be used as the answer evaluation result, where a higher quality evaluation score indicates that the model output answer is more reasonable.
In an alternative embodiment, the model output answer and the corresponding language question may be taken as a question-answer pair, and the question-answer pair is evaluated based on the answer evaluation sub-model to obtain the answer evaluation result. Illustratively, the answer evaluation result may be expressed by the following formula:
R = Φ(M(prompt, q, o))
where R represents the answer evaluation result, (q, o) represents a question-answer pair, and Φ(M(prompt, q, o)) represents the result of evaluating the rationality of the question-answer pair by applying the answer evaluation sub-model M to the evaluation prompt, the question q and the answer o. When the language question is an open question, the score of the quality of the model output answer may be used as the answer evaluation result.
S230, adjusting parameters of the answer generation sub-model according to the answer evaluation result.
After the answer evaluation result is obtained, the parameters of the answer generation sub-model can be adjusted according to the answer evaluation result, so that the sub-model outputs more reasonable answers, thereby realizing the training of the language model.
And S240, when the loss function of the answer generation sub-model converges, completing the model training process to obtain the target language model.
The target language model may be the language model obtained through training. Specifically, after the parameters of the answer generation sub-model are adjusted according to the answer evaluation result, once the loss function of the answer generation sub-model converges, the answer generation sub-model at that point may be used as the target language model. When the loss function of the answer generation sub-model converges, the error between the answers output by the answer generation sub-model and the standard answers is within a reasonable range, so the answer generation sub-model can serve as the target language model and the training process of the language model is complete. By training the answer generation sub-model based on the quality evaluation score when the language question is an open question, the rationality and effectiveness of the trained target language model in answering open questions can be improved.
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; when the language question type is an open language question, the quality of the model output answer is scored based on the answer evaluation sub-model to obtain a quality evaluation score, and the quality evaluation score is taken as the answer evaluation result; parameters of the answer generation sub-model are adjusted according to the answer evaluation result; and when the loss function of the answer generation sub-model converges, the model training process is completed and the target language model is obtained. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
Fig. 3 is a flowchart of another language model training method provided by an embodiment of the present invention. The embodiment is applicable to scenarios in which unsupervised training is performed on a language model and, on the basis of the foregoing embodiments, further describes how to evaluate the model output answer based on the answer evaluation sub-model in the initial language model to obtain an answer evaluation result. The method may be performed by a language model training apparatus, which may be implemented in software and/or hardware and integrated into a computer device with application development functionality.
As shown in fig. 3, the language model training method includes the steps of:
s310, inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer.
The preset unlabeled question sample may be a preset language question sample without a label. A language model is an abstract mathematical model of language built from objective linguistic facts, and it describes a correspondence relation. In particular, a language model may be a generative model, an analytic model, or a recognition model. A language model can be used to solve language problems such as text translation and text summarization; it can also analyze the properties of the individual elements of an input set to clarify the relationships among them; and it can determine, based on certain recognition rules, whether an input is merely a jumble of words or a well-formed sentence in the language.
The initial language model may be an original language model that has not been pre-trained. Specifically, the initial language model includes an answer generation sub-model and an answer evaluation sub-model. The answer generation sub-model is used to solve the input language question and output an answer corresponding to the language question. The answer evaluation sub-model is used to evaluate the answer output by the answer generation sub-model to obtain an evaluation result. The answer generation sub-model can then in turn be trained based on the evaluation result output by the answer evaluation sub-model, so that it outputs better answers, thereby realizing unsupervised training of the initial language model. The model output answer may be the answer, output by the answer generation sub-model, that corresponds to the preset unlabeled question sample.
S320, evaluating the accuracy of the model output answer based on the answer evaluation sub-model to obtain an accuracy evaluation result, and taking the accuracy evaluation result as the answer evaluation result.
Language questions come in a variety of types; for example, they may include open questions and closed questions. Open questions are language questions without a fixed answer, such as text translation and text summarization; closed questions are questions with a fixed answer, such as multiple-choice questions and fill-in-the-blank questions. Because the answer to a closed question is fixed and unique, when the language question is a closed question the accuracy of the model output answer can be evaluated based on the answer evaluation sub-model to obtain an accuracy evaluation result, and the accuracy evaluation result is used as the answer evaluation result.
In an alternative embodiment, the model output answer and the corresponding language question may be taken as a question-answer pair, and the question-answer pair is evaluated based on the answer evaluation sub-model to obtain the answer evaluation result. Illustratively, the answer evaluation result may be expressed by the following formula:
R = Φ(M(prompt, q, o))
where R represents the answer evaluation result, (q, o) represents a question-answer pair, and Φ(M(prompt, q, o)) represents the result of evaluating the rationality of the question-answer pair by applying the answer evaluation sub-model M to the evaluation prompt, the question q and the answer o. When the language question is a closed question, the accuracy evaluation result of the model output answer may be used as the answer evaluation result.
S330, adjusting parameters of the answer generation sub-model according to the answer evaluation result.
After the answer evaluation result is obtained, the parameters of the answer generation sub-model can be adjusted according to the answer evaluation result, so that the sub-model outputs more reasonable answers, thereby realizing the training of the language model.
And S340, when the loss function of the answer generation sub-model converges, completing the model training process to obtain the target language model.
The target language model may be the language model obtained through training. Specifically, after the parameters of the answer generation sub-model are adjusted according to the answer evaluation result, once the loss function of the answer generation sub-model converges, the answer generation sub-model at that point may be used as the target language model. When the loss function of the answer generation sub-model converges, the error between the answers output by the answer generation sub-model and the standard answers is within a reasonable range, so the answer generation sub-model can serve as the target language model and the training process of the language model is complete. By training the answer generation sub-model based on the accuracy evaluation result when the language question is a closed question, the rationality and effectiveness of the trained target language model in answering closed questions can be improved.
Illustratively, fig. 4 is a flowchart of a method for training a language model according to an embodiment of the present invention, where "initial language model" denotes the initial language model, which includes the answer generation sub-model and the answer evaluation sub-model, and "question-answer" denotes the answer generated for the unlabeled data. As shown in fig. 4, the method for training the language model includes: inputting a preset unlabeled data question into the initial language model; the answer generation sub-model generates the corresponding question answer, and the answer evaluation sub-model then evaluates the question and answer based on an evaluation prompt to determine the corresponding reward, where the evaluation prompt is a quality evaluation prompt when the unlabeled data question is an open question and an accuracy evaluation prompt when it is a closed question; the answer generation sub-model is then updated according to the reward to improve its performance, thereby completing the training process of the language model.
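The selection of the evaluation prompt by question type in fig. 4 can be sketched, purely for illustration, as a small routing function that reuses the quality_reward and accuracy_reward sketches given above; the question_type labels are assumptions.
    def compute_reward(evaluate_fn, question, answer, question_type):
        # Quality evaluation prompt for open questions, accuracy evaluation prompt
        # for closed questions; the resulting reward drives the update of the
        # answer generation sub-model (see the PPO sketch above).
        if question_type == "open":
            return quality_reward(evaluate_fn, question, answer)
        return accuracy_reward(evaluate_fn, question, answer)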
The technical scheme provided by the embodiment of the invention enables the initial language model to evaluate itself, so that it gains a deeper understanding of its own performance and identifies directions for improvement. This effectively improves the performance of the initial language model on various text generation tasks and offers scalability potential across different model sizes and amounts of training data.
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; when the language question type is a closed language question, the accuracy of the model output answer is evaluated based on the answer evaluation sub-model to obtain an accuracy evaluation result, and the accuracy evaluation result is taken as the answer evaluation result; parameters of the answer generation sub-model are adjusted according to the answer evaluation result; and when the loss function of the answer generation sub-model converges, the model training process is completed and the target language model is obtained. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
Fig. 5 is a schematic structural diagram of a language model training device provided by an embodiment of the present invention. The embodiment is applicable to scenarios in which unsupervised training is performed on a language model. The device may be implemented in software and/or hardware and integrated into a computer device with an application development function.
As shown in fig. 5, the language model training apparatus includes: an answer output module 410, an answer evaluation module 420, and a model update module 430.
The answer output module 410 is configured to input a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer; the answer evaluation module 420 is configured to evaluate an answer output by the model based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample, so as to obtain an answer evaluation result; and a model updating module 430, configured to update the answer generation sub-model based on the answer evaluation result, so as to obtain a target language model.
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; the model output answer is evaluated based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result; and the answer generation sub-model is updated based on the answer evaluation result to obtain a target language model. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
In an alternative embodiment, the answer evaluation module 420 is specifically configured to: under the condition that the language question type is an open language question, score the quality of the answer output by the model based on the answer evaluation sub-model to obtain a quality evaluation score; and take the quality evaluation score as the answer evaluation result.
In an alternative embodiment, the answer evaluation module 420 may be further configured to: under the condition that the language question type is a closed language question, evaluating the accuracy of the model output answer based on the answer evaluation sub-model to obtain an accuracy evaluation result; and taking the accuracy evaluation result as the answer evaluation result.
In an alternative embodiment, the answer evaluation module 420 may be further configured to: taking the model output answer and the corresponding language questions as a group of question-answer pairs; and evaluating the question-answer pair based on the answer evaluation sub-model to obtain the answer evaluation result.
In an alternative embodiment, the model update module 430 is specifically configured to: adjusting parameters of the answer generation sub-model according to the answer evaluation result; and when the loss function of the answer generation sub-model is converged, completing a model training process to obtain the target language model.
In an alternative embodiment, the target language model is a model trained based on the Flan-T5-Large model.
In an alternative embodiment, the model update module 430 is further configured to: training the answer generation sub-model based on a near-end strategy optimization algorithm to obtain the target language model.
The language model training device provided by the embodiment of the invention can execute the language model training method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 6 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention. Computer device 12 may be any terminal device with computing capabilities and may be configured in a language model training device.
As shown in FIG. 6, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 may be one or more of several types of bus structures including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in fig. 6, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing a language model training method provided by the present embodiment, the method including:
inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a language model training method as provided by any embodiment of the present invention, comprising:
inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It will be appreciated by those of ordinary skill in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device, or distributed over a network of computing devices, or they may alternatively be implemented in program code executable by a computer device, such that they are stored in a memory device and executed by the computing device, or they may be separately fabricated as individual integrated circuit modules, or multiple modules or steps within them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. A method for training a language model, comprising:
inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
2. The method according to claim 1, wherein the evaluating the model output answer based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result includes:
under the condition that the language question type is an open language question, scoring the quality of the answer output by the model based on the answer evaluation sub-model to obtain a quality evaluation score;
and taking the quality evaluation score as the answer evaluation result.
3. The method according to claim 1, wherein the evaluating the model output answer based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result includes:
under the condition that the language question type is a closed language question, evaluating the accuracy of the model output answer based on the answer evaluation sub-model to obtain an accuracy evaluation result;
and taking the accuracy evaluation result as the answer evaluation result.
4. A method according to claim 2 or 3, wherein evaluating the model output answer to obtain an answer evaluation result comprises:
taking the model output answer and the corresponding language questions as a group of question-answer pairs;
and evaluating the question-answer pair based on the answer evaluation sub-model to obtain the answer evaluation result.
5. The method of claim 1, wherein updating the answer generation sub-model based on the answer evaluation result to obtain a target language model comprises:
adjusting parameters of the answer generation sub-model according to the answer evaluation result;
and when the loss function of the answer generation sub-model is converged, completing a model training process to obtain the target language model.
6. The method of claim 1, wherein the target language model is a model trained based on a Flan-T5-Large model.
7. The method according to claim 1, wherein the method further comprises:
training the answer generation sub-model based on a near-end strategy optimization algorithm to obtain the target language model.
8. A language model training apparatus, comprising:
the answer output module is used for inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
the answer evaluation module is used for evaluating the answers output by the model based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and the model updating module is used for updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
9. A computer device, the computer device comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the language model training method of any one of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the language model training method of any one of claims 1-7.
CN202310914142.8A 2023-07-25 2023-07-25 Language model training method, device, equipment and storage medium Pending CN116663679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310914142.8A CN116663679A (en) 2023-07-25 2023-07-25 Language model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310914142.8A CN116663679A (en) 2023-07-25 2023-07-25 Language model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116663679A true CN116663679A (en) 2023-08-29

Family

ID=87717363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310914142.8A Pending CN116663679A (en) 2023-07-25 2023-07-25 Language model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116663679A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332247A (en) * 2023-12-01 2024-01-02 苏州大学 Big data transaction and quality assessment method and system using big language model as medium
CN117556920A (en) * 2023-10-23 2024-02-13 星环信息科技(上海)股份有限公司 Large model illusion treatment method, device, equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020000779A1 (en) * 2018-06-28 2020-01-02 平安科技(深圳)有限公司 Method and apparatus for obtaining quality evaluation model, and computer device and storage medium
US20200019642A1 (en) * 2018-07-12 2020-01-16 International Business Machines Corporation Question Answering Using Trained Generative Adversarial Network Based Modeling of Text
CN111079938A (en) * 2019-11-28 2020-04-28 百度在线网络技术(北京)有限公司 Question-answer reading understanding model obtaining method and device, electronic equipment and storage medium
CN113553837A (en) * 2020-04-23 2021-10-26 北京金山数字娱乐科技有限公司 Reading understanding model training method and device and text analysis method and device
CN113836895A (en) * 2021-02-08 2021-12-24 宏龙科技(杭州)有限公司 Unsupervised machine reading understanding method based on large-scale problem self-learning
CN114461802A (en) * 2022-02-09 2022-05-10 湘潭大学 Self-training method of machine reading understanding model for question refusing to answer
CN115329054A (en) * 2022-06-01 2022-11-11 赵慧雅 Open domain question-answering system for complexity problem
CN115238101A (en) * 2022-09-23 2022-10-25 中国电子科技集团公司第十研究所 Multi-engine intelligent question-answering system oriented to multi-type knowledge base

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HYUNG WON CHUNG et al.: "Scaling Instruction-Finetuned Language Models", arXiv:2210.11416v5, pages 1-54 *
Bridge Health Monitoring and Vibration Control Laboratory, Tongji University: "Frontiers of Natural Language Processing: The Past and Present of Large Language Models", pages 1-14, Retrieved from the Internet <URL: https://shmc.tongji.edu.cn/8c/8e/c32042a298126/page.htm> *
汀丶人工智能 (Ting AI): "AI LLM Models: Reward Model Training, PPO Reinforcement Learning Training, RLHF", pages 5-7, Retrieved from the Internet <URL: https://juejin.cn/post/7256713304839192631> *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556920A (en) * 2023-10-23 2024-02-13 星环信息科技(上海)股份有限公司 Large model illusion treatment method, device, equipment and storage medium
CN117556920B (en) * 2023-10-23 2024-05-31 星环信息科技(上海)股份有限公司 Large model illusion treatment method, device, equipment and storage medium
CN117332247A (en) * 2023-12-01 2024-01-02 苏州大学 Big data transaction and quality assessment method and system using big language model as medium
CN117332247B (en) * 2023-12-01 2024-02-23 苏州大学 Big data transaction and quality assessment method and system using big language model as medium

Similar Documents

Publication Publication Date Title
Schodde et al. Adaptive robot language tutoring based on Bayesian knowledge tracing and predictive decision-making
Weston Dialog-based language learning
CN116663679A (en) Language model training method, device, equipment and storage medium
CN109389870B (en) Data self-adaptive adjusting method and device applied to electronic teaching
CN111753076B (en) Dialogue method, dialogue device, electronic equipment and readable storage medium
CN110991195B (en) Machine translation model training method, device and storage medium
Massaro Multimodal learning
CN112740132A (en) Scoring prediction for short answer questions
CN111782787B (en) Problem generation model training method and problem generation method
US20140295400A1 (en) Systems and Methods for Assessing Conversation Aptitude
Fang et al. Artificial intelligence-based assessment in education
Paladines et al. An Intelligent Tutoring System for Procedural Training with Natural Language Interaction.
US20230108579A1 (en) Dynamic entity representations for sequence generation
Wang Learning teaching in teaching: Online reinforcement learning for intelligent tutoring
Nodira et al. Teaching Currently Using Interactive Methods in Problem" Probability Theory and Mathematical Statistics"
CN111723185B (en) Question generation method
KR20140051607A (en) Apparatus providing analysis information based on level of a student and method thereof
Ruan Special‐Purpose English Teaching Reform and Model Design in the Era of Artificial Intelligence
CN111783434A (en) Method and system for improving anti-noise capability of reply generation model
Khasianov et al. Three agent platform approach for digital education environment
US10453354B2 (en) Automatically generated flash cards
CN112199476A (en) Automated decision making to select a leg after partial correct answers in a conversational intelligence tutor system
CN112528221A (en) Knowledge and capability binary tracking method based on continuous matrix decomposition
CN117993366B (en) Evaluation item dynamic generation method and system, electronic equipment and readable storage medium
CN112948650B (en) Learning effect display method and device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination