CN116663679A - Language model training method, device, equipment and storage medium - Google Patents

Language model training method, device, equipment and storage medium

Info

Publication number
CN116663679A
Authority
CN
China
Prior art keywords
model
answer
language
language model
question
Prior art date
Legal status
Pending
Application number
CN202310914142.8A
Other languages
Chinese (zh)
Inventor
王鹏远
庞竟成
陈雄辉
俞扬
Current Assignee
Nanqi Xiance Nanjing High Tech Co ltd
Original Assignee
Nanqi Xiance Nanjing High Tech Co ltd
Priority date
Filing date
Publication date
Application filed by Nanqi Xiance Nanjing High Tech Co ltd filed Critical Nanqi Xiance Nanjing High Tech Co ltd
Priority to CN202310914142.8A priority Critical patent/CN116663679A/en
Publication of CN116663679A publication Critical patent/CN116663679A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The embodiment of the invention discloses a language model training method, device, equipment and storage medium. The method includes: inputting a preset unlabeled question sample into an answer generation sub-model in an initial language model to obtain a model output answer; evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result; and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.

Description

Language model training method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of natural language processing (NLP) and reinforcement learning, and in particular to a language model training method, device, equipment and storage medium.
Background
Language models exhibit impressive performance on many natural language processing tasks; however, training a language model usually requires manually labeled samples. The prior art can enhance the reasoning ability of an initial language model with unlabeled data, for example by using the Self-Consistency method to sample diverse reasoning paths, selecting the most consistent answer by marginalizing over the sampled paths, and using that answer to fine-tune the initial language model. However, a language model trained in this way is mainly suited to reasoning tasks with a chain of thought, so it is applicable only to a single type of scenario.
Disclosure of Invention
The embodiment of the invention provides a language model training method, device, equipment and storage medium, which ensure that the trained language model has good applicability to various types of language questions, enrich the scenarios to which the language model applies, and improve the generality of the language model.
In a first aspect, an embodiment of the present invention provides a language model training method, including:
inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
In a second aspect, an embodiment of the present invention provides a language model training apparatus, including:
the answer output module is used for inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
the answer evaluation module is used for evaluating the answers output by the model based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and the model updating module is used for updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
In a third aspect, an embodiment of the present invention provides a computer apparatus, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the language model training method of any embodiment.
In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the language model training method of any embodiment.
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; the model output answer is evaluated based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result; and the answer generation sub-model is updated based on the answer evaluation result to obtain a target language model. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
Drawings
FIG. 1 is a flowchart of a language model training method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of yet another language model training method provided by an embodiment of the present invention;
FIG. 3 is a flowchart of yet another language model training method provided by an embodiment of the present invention;
FIG. 4 is a flow chart of a method for training a language model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a language model training apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Fig. 1 is a flowchart of a language model training method provided by an embodiment of the present invention. The embodiment is applicable to scenarios in which unsupervised training is performed on a language model. The method may be performed by a language model training device, and the device may be implemented in software and/or hardware.
As shown in fig. 1, the language model training method includes the steps of:
s110, inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer.
The preset unlabeled question sample may be a preset language question sample without a label; "unlabeled" means that the language question has no corresponding answer label. A language model is an abstract mathematical model of language built from objective linguistic facts, and it describes a correspondence relation. In particular, a language model may be a generative model, an analytic model, or a recognition model. A language model can be used to solve language problems such as text translation and text summarization; it can also analyze the properties of the individual elements of an input set to clarify the relationships among them; and it can determine, based on certain recognition rules, whether an input is merely a jumble of words or a well-formed sentence in the language.
The initial language model may be an original language model that has not been pre-trained. Specifically, the initial language model includes an answer generation sub-model and an answer evaluation sub-model. The answer generation sub-model is used to solve the input language question and output an answer corresponding to the language question. The answer evaluation sub-model is used to evaluate the answer output by the answer generation sub-model to obtain an evaluation result. The answer generation sub-model can then in turn be trained based on the evaluation result output by the answer evaluation sub-model, so that it outputs better answers, thereby realizing unsupervised training of the initial language model. The model output answer may be the answer, output by the answer generation sub-model, that corresponds to the preset unlabeled question sample.
In an alternative embodiment, the answer generation sub-model and the answer evaluation sub-model may be language models with identical structure and identical parameters. The embodiment of the invention is based on the following observation: for a language model, evaluating a generated answer is easier than generating one. For example, writing an engaging story can be challenging, while judging a generated text is relatively easy. Based on this observation, the embodiment of the invention provides a language model self-improvement method: by instantiating the two sub-models from the same language model, the initial language model plays the roles of both student and teacher. As the student, the initial language model generates answers to unlabeled questions; as the teacher, it scores the generated answers. The answer generation sub-model in the initial language model is then updated by reinforcement learning to maximize the evaluation score. Meanwhile, the answer evaluation sub-model is kept unchanged throughout training, so the answer evaluation criterion is fixed, which makes training more stable and improves the accuracy of the answers output by the trained language model.
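By way of example and not limitation, the student/teacher loop described above can be sketched as follows; this is a minimal sketch of the control flow only, and the names generate_fn, evaluate_fn and ppo_step are assumed placeholders for the answer generation sub-model, the frozen answer evaluation sub-model and the reinforcement-learning update described later, not names disclosed by the embodiment.
    # Minimal sketch of the self-improvement loop; generate_fn, evaluate_fn and
    # ppo_step are hypothetical stand-ins for the two sub-models and the RL update.
    def self_improvement_epoch(generate_fn, evaluate_fn, ppo_step, unlabeled_questions):
        for question in unlabeled_questions:
            # Student role: the answer generation sub-model answers the unlabeled question.
            answer = generate_fn(question)
            # Teacher role: the answer evaluation sub-model (kept frozen) scores the
            # answer; this score is used as the reward.
            reward = evaluate_fn(question, answer)
            # Only the answer generation sub-model is updated, so the evaluation
            # criterion stays fixed throughout training.
            ppo_step(question, answer, reward)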
S120, evaluating the answers output by the model based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample, and obtaining an answer evaluation result.
The language question type may be the question type of the preset unlabeled question sample. Language questions come in a variety of types; for example, they may include open questions and closed questions. Open questions are language questions without a fixed answer, such as text translation and text summarization; closed questions are questions with a fixed answer, such as multiple-choice questions and fill-in-the-blank questions. Because a language question does not necessarily have a corresponding standard answer, the model output answer must be evaluated according to the type of the language question. The answer evaluation result may be a result indicating whether the model output answer is reasonable.
Specifically, since an open question has no fixed standard answer, when the language question type is an open question, the quality of the model output answer can be evaluated based on the answer evaluation sub-model. The evaluation covers aspects such as conciseness, accuracy, fluency, writing structure and writing style of the model output answer. For example, the answer can be assigned a score reflecting its quality, e.g. on a scale from zero to ten, where a higher score indicates a higher-quality answer.
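By way of example and not limitation, a possible quality-evaluation step for open questions is sketched below; the prompt wording, the regular-expression parsing and the function names are illustrative assumptions rather than the exact prompt of the embodiment, and evaluate_fn stands for a call to the answer evaluation sub-model.
    import re

    def quality_reward(evaluate_fn, question, answer):
        # Ask the answer evaluation sub-model for a 0-10 quality score covering
        # conciseness, accuracy, fluency, structure and style.
        prompt = (
            "Rate the quality of the answer to the question on a scale from 0 to 10, "
            "considering conciseness, accuracy, fluency, writing structure and style. "
            "Reply with a single number.\n"
            f"Question: {question}\nAnswer: {answer}\nScore:"
        )
        reply = evaluate_fn(prompt)
        match = re.search(r"\d+(\.\d+)?", reply)
        score = float(match.group()) if match else 0.0
        return min(max(score, 0.0), 10.0)  # clamp to the zero-to-ten range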
Closed questions, by contrast, have fixed standard answers, so when the language question type is a closed question, the accuracy of the model output answer can be evaluated based on the answer evaluation sub-model to obtain an accuracy evaluation result, and the accuracy evaluation result is used as the answer evaluation result.
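Similarly, for closed questions a possible accuracy-evaluation step is sketched below; the yes/no prompt and the binary reward mapping are assumptions made for illustration.
    def accuracy_reward(evaluate_fn, question, answer):
        # Ask the answer evaluation sub-model for a correctness judgment and map
        # it to a binary accuracy evaluation result.
        prompt = (
            "Is the following answer to the question correct? Reply Yes or No.\n"
            f"Question: {question}\nAnswer: {answer}\nJudgment:"
        )
        reply = evaluate_fn(prompt).strip().lower()
        return 1.0 if reply.startswith("yes") else 0.0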
And S130, updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
The target language model may be a language model obtained through training. After the answer evaluation result is obtained, the answer generation sub-model can be updated based on the answer evaluation result to obtain the target language model. For example, parameters of the answer generation sub-model may be adjusted according to the answer evaluation result, and when the loss function of the answer generation sub-model converges, the model training process is completed, so as to obtain the target language model.
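By way of illustration, convergence of the loss function may be detected, for example, by comparing moving averages of recent loss values; the window size and tolerance below are assumptions, not values disclosed by the embodiment.
    def has_converged(loss_history, window=50, tol=1e-3):
        # Treat the loss as converged when the average over the most recent
        # window changes by less than tol relative to the previous window.
        if len(loss_history) < 2 * window:
            return False
        recent = sum(loss_history[-window:]) / window
        previous = sum(loss_history[-2 * window:-window]) / window
        return abs(previous - recent) < tol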
According to the technical scheme provided by the embodiment of the invention, answers to language questions of different types are evaluated in a type-specific manner and the language model is trained according to the evaluation results, so that the trained model outputs answers of good accuracy and quality for different types of language questions, which enriches the scenarios to which the language model applies and improves its generality.
In an alternative embodiment, the answer generation sub-model is trained based on the Proximal Policy Optimization (PPO) algorithm to obtain the target language model. This algorithm achieves a good training effect in the language model training of the embodiment of the invention.
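For reference, the clipped surrogate objective of standard PPO may be written as follows; this is the textbook form of the PPO loss rather than an implementation detail disclosed by the embodiment, with per-token log-probabilities of the sampled answer and reward-derived advantages assumed as inputs.
    import torch

    def ppo_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
        # logprobs_new / logprobs_old: log-probabilities of the sampled answer under
        # the current and the sampling-time answer generation sub-model.
        # advantages: advantage estimates derived from the answer evaluation score.
        ratio = torch.exp(logprobs_new - logprobs_old)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()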
In addition, the technical scheme of the embodiment of the invention can adopt entropy regularization to prevent premature convergence during the sampling stage of model training, and can use the Kullback-Leibler (KL) divergence (relative entropy) to prevent the initial language model from drifting too far from its initial pre-training.
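One possible way to combine these two regularizers with the policy loss is sketched below; the coefficients and the exact formulation are assumptions for illustration, the reference logits being those of the model before reinforcement-learning updates.
    import torch.nn.functional as F

    def regularized_loss(policy_logits, ref_logits, base_loss,
                         entropy_coef=0.01, kl_coef=0.1):
        logp = F.log_softmax(policy_logits, dim=-1)
        p = logp.exp()
        entropy = -(p * logp).sum(dim=-1).mean()          # discourages premature convergence
        ref_logp = F.log_softmax(ref_logits, dim=-1)
        kl = (p * (logp - ref_logp)).sum(dim=-1).mean()   # keeps the policy near the pre-trained model
        return base_loss - entropy_coef * entropy + kl_coef * kl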
In an alternative embodiment, training may be based on the Flan-T5-Large model to obtain the target language model.
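By way of example, the Flan-T5-Large checkpoint can be loaded with the HuggingFace Transformers library as sketched below; the hub identifier "google/flan-t5-large" is assumed here, since the embodiment only names the model.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

    # Quick check that the sequence-to-sequence model answers a language question.
    inputs = tokenizer("Summarize: Language models can judge answers more easily than write them.",
                       return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))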
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; the model output answer is evaluated based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result; and the answer generation sub-model is updated based on the answer evaluation result to obtain the target language model. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
Fig. 2 is a flowchart of another language model training method provided by an embodiment of the present invention. The embodiment is applicable to scenarios in which unsupervised training is performed on a language model and, building on the above embodiment, further illustrates how to evaluate the model output answer based on the answer evaluation sub-model in the initial language model to obtain an answer evaluation result, and how to update the answer generation sub-model based on the answer evaluation result to obtain a target language model. The method may be performed by a language model training apparatus, which may be implemented in software and/or hardware and integrated into a computer device with application development functionality.
As shown in fig. 2, the language model training method includes the steps of:
s210, inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer.
The preset unlabeled question sample may be a preset language question sample without a label. A language model is an abstract mathematical model of language built from objective linguistic facts, and it describes a correspondence relation. In particular, a language model may be a generative model, an analytic model, or a recognition model. A language model can be used to solve language problems such as text translation and text summarization; it can also analyze the properties of the individual elements of an input set to clarify the relationships among them; and it can determine, based on certain recognition rules, whether an input is merely a jumble of words or a well-formed sentence in the language.
The initial language model may be an original language model that has not been pre-trained. Specifically, the initial language model includes an answer generation sub-model and an answer evaluation sub-model. The answer generation sub-model is used to solve the input language question and output an answer corresponding to the language question. The answer evaluation sub-model is used to evaluate the answer output by the answer generation sub-model to obtain an evaluation result. The answer generation sub-model can then in turn be trained based on the evaluation result output by the answer evaluation sub-model, so that it outputs better answers, thereby realizing unsupervised training of the initial language model. The model output answer may be the answer, output by the answer generation sub-model, that corresponds to the preset unlabeled question sample.
And S220, scoring the quality of the answers output by the model based on the answer evaluation sub-model to obtain a quality evaluation score, and taking the quality evaluation score as an answer evaluation result.
Language questions come in a variety of types; for example, they may include open questions and closed questions. Open questions are language questions without a fixed answer, such as text translation and text summarization; closed questions are questions with a fixed answer, such as multiple-choice questions and fill-in-the-blank questions. Because a language question does not necessarily have a corresponding standard answer, the model output answer must be evaluated according to the type of the language question. Specifically, when the language question is an open question, the quality of the model output answer may be evaluated based on the answer evaluation sub-model. The evaluation covers aspects such as conciseness, accuracy, fluency, writing structure and writing style of the model output answer. For example, the answer may be assigned a score reflecting its quality, e.g. on a scale from zero to ten, where a higher score indicates a higher-quality answer.
The quality evaluation score may be the score assigned to the quality of the model output answer. The answer evaluation result may be a result indicating whether the model output answer is reasonable. Specifically, when the language question is an open question, the quality evaluation score may be used as the answer evaluation result, where a higher quality evaluation score indicates that the model output answer is more reasonable.
In an alternative embodiment, the model output answer and the corresponding language question may be taken as a question-answer pair, and the question-answer pair is evaluated based on the answer evaluation sub-model to obtain the answer evaluation result. Illustratively, the answer evaluation result may be expressed by the following formula:
R = Φ(M(prompt, q, o))
where R represents the answer evaluation result, (q, o) represents a question-answer pair, and Φ(M(prompt, q, o)) represents the result of evaluating the rationality of the question-answer pair by applying the answer evaluation sub-model M to the evaluation prompt, the question q and the answer o. When the language question is an open question, the score of the quality of the model output answer may be used as the answer evaluation result.
S230, adjusting parameters of the answer generation sub-model according to the answer evaluation result.
After the answer evaluation result is obtained, the parameters of the answer generation sub-model can be adjusted according to the answer evaluation result, so that the sub-model outputs more reasonable answers, thereby realizing the training of the language model.
And S240, when the loss function of the answer generation sub-model converges, completing the model training process to obtain the target language model.
The target language model may be the language model obtained through training. Specifically, after the parameters of the answer generation sub-model are adjusted according to the answer evaluation result, once the loss function of the answer generation sub-model converges, the answer generation sub-model at that point may be used as the target language model. When the loss function of the answer generation sub-model converges, the error between the answers output by the answer generation sub-model and the standard answers is within a reasonable range, so the answer generation sub-model can serve as the target language model and the training process of the language model is complete. By training the answer generation sub-model based on the quality evaluation score when the language question is an open question, the rationality and effectiveness of the trained target language model in answering open questions can be improved.
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; when the language question type is an open language question, the quality of the model output answer is scored based on the answer evaluation sub-model to obtain a quality evaluation score, and the quality evaluation score is taken as the answer evaluation result; parameters of the answer generation sub-model are adjusted according to the answer evaluation result; and when the loss function of the answer generation sub-model converges, the model training process is completed and the target language model is obtained. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
Fig. 3 is a flowchart of another language model training method provided by an embodiment of the present invention. The embodiment is applicable to scenarios in which unsupervised training is performed on a language model and, on the basis of the foregoing embodiments, further describes how to evaluate the model output answer based on the answer evaluation sub-model in the initial language model to obtain an answer evaluation result. The method may be performed by a language model training apparatus, which may be implemented in software and/or hardware and integrated into a computer device with application development functionality.
As shown in fig. 3, the language model training method includes the steps of:
s310, inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer.
The preset unlabeled question sample may be a preset language question sample without a label. A language model is an abstract mathematical model of language built from objective linguistic facts, and it describes a correspondence relation. In particular, a language model may be a generative model, an analytic model, or a recognition model. A language model can be used to solve language problems such as text translation and text summarization; it can also analyze the properties of the individual elements of an input set to clarify the relationships among them; and it can determine, based on certain recognition rules, whether an input is merely a jumble of words or a well-formed sentence in the language.
The initial language model may be an original language model that has not been pre-trained. Specifically, the initial language model includes an answer generation sub-model and an answer evaluation sub-model. The answer generation sub-model is used to solve the input language question and output an answer corresponding to the language question. The answer evaluation sub-model is used to evaluate the answer output by the answer generation sub-model to obtain an evaluation result. The answer generation sub-model can then in turn be trained based on the evaluation result output by the answer evaluation sub-model, so that it outputs better answers, thereby realizing unsupervised training of the initial language model. The model output answer may be the answer, output by the answer generation sub-model, that corresponds to the preset unlabeled question sample.
S320, evaluating the accuracy of the model output answer based on the answer evaluation sub-model to obtain an accuracy evaluation result, and taking the accuracy evaluation result as the answer evaluation result.
Language questions come in a variety of types; for example, they may include open questions and closed questions. Open questions are language questions without a fixed answer, such as text translation and text summarization; closed questions are questions with a fixed answer, such as multiple-choice questions and fill-in-the-blank questions. Because the answer to a closed question is fixed and unique, when the language question is a closed question the accuracy of the model output answer can be evaluated based on the answer evaluation sub-model to obtain an accuracy evaluation result, and the accuracy evaluation result is used as the answer evaluation result.
In an alternative embodiment, the model output answer and the corresponding language question may be taken as a question-answer pair, and the question-answer pair is evaluated based on the answer evaluation sub-model to obtain the answer evaluation result. Illustratively, the answer evaluation result may be expressed by the following formula:
R = Φ(M(prompt, q, o))
where R represents the answer evaluation result, (q, o) represents a question-answer pair, and Φ(M(prompt, q, o)) represents the result of evaluating the rationality of the question-answer pair by applying the answer evaluation sub-model M to the evaluation prompt, the question q and the answer o. When the language question is a closed question, the accuracy evaluation result of the model output answer may be used as the answer evaluation result.
S330, adjusting parameters of the answer generation sub-model according to the answer evaluation result.
After the answer evaluation result is obtained, the parameters of the answer generation sub-model can be adjusted according to the answer evaluation result, so that the sub-model outputs more reasonable answers, thereby realizing the training of the language model.
And S340, when the loss function of the answer generation sub-model converges, completing the model training process to obtain the target language model.
The target language model may be the language model obtained through training. Specifically, after the parameters of the answer generation sub-model are adjusted according to the answer evaluation result, once the loss function of the answer generation sub-model converges, the answer generation sub-model at that point may be used as the target language model. When the loss function of the answer generation sub-model converges, the error between the answers output by the answer generation sub-model and the standard answers is within a reasonable range, so the answer generation sub-model can serve as the target language model and the training process of the language model is complete. By training the answer generation sub-model based on the accuracy evaluation result when the language question is a closed question, the rationality and effectiveness of the trained target language model in answering closed questions can be improved.
Illustratively, fig. 4 is a flowchart of a method for training a language model according to an embodiment of the present invention, where "initial language model" denotes the initial language model, which includes the answer generation sub-model and the answer evaluation sub-model, and "question-answer" denotes the answer generated for the unlabeled data. As shown in fig. 4, the method for training the language model includes: inputting a preset unlabeled data question into the initial language model; the answer generation sub-model generates the corresponding question answer, and the answer evaluation sub-model then evaluates the question and answer based on an evaluation prompt to determine the corresponding reward, where the evaluation prompt is a quality evaluation prompt when the unlabeled data question is an open question and an accuracy evaluation prompt when it is a closed question; the answer generation sub-model is then updated according to the reward to improve its performance, thereby completing the training process of the language model.
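The selection of the evaluation prompt by question type in fig. 4 can be sketched, purely for illustration, as a small routing function that reuses the quality_reward and accuracy_reward sketches given above; the question_type labels are assumptions.
    def compute_reward(evaluate_fn, question, answer, question_type):
        # Quality evaluation prompt for open questions, accuracy evaluation prompt
        # for closed questions; the resulting reward drives the update of the
        # answer generation sub-model (see the PPO sketch above).
        if question_type == "open":
            return quality_reward(evaluate_fn, question, answer)
        return accuracy_reward(evaluate_fn, question, answer)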
The technical scheme provided by the embodiment of the invention enables the initial language model to evaluate itself, so that it gains a deeper understanding of its own performance and identifies directions for improvement. This effectively improves the performance of the initial language model on various text generation tasks and offers scalability potential across different model sizes and amounts of training data.
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; when the language question type is a closed language question, the accuracy of the model output answer is evaluated based on the answer evaluation sub-model to obtain an accuracy evaluation result, and the accuracy evaluation result is taken as the answer evaluation result; parameters of the answer generation sub-model are adjusted according to the answer evaluation result; and when the loss function of the answer generation sub-model converges, the model training process is completed and the target language model is obtained. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
Fig. 5 is a schematic structural diagram of a language model training device provided by an embodiment of the present invention. The embodiment is applicable to scenarios in which unsupervised training is performed on a language model. The device may be implemented in software and/or hardware and integrated into a computer device with an application development function.
As shown in fig. 5, the language model training apparatus includes: an answer output module 410, an answer evaluation module 420, and a model update module 430.
The answer output module 410 is configured to input a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer; the answer evaluation module 420 is configured to evaluate an answer output by the model based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample, so as to obtain an answer evaluation result; and a model updating module 430, configured to update the answer generation sub-model based on the answer evaluation result, so as to obtain a target language model.
According to the technical scheme provided by the embodiment of the invention, a preset unlabeled question sample is input into the answer generation sub-model in the initial language model to obtain a model output answer; the model output answer is evaluated based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result; and the answer generation sub-model is updated based on the answer evaluation result to obtain a target language model. The technical scheme of the embodiment of the invention solves the problem that language models trained by existing unsupervised training methods are applicable only to a single scenario; it ensures that the trained language model has good applicability to various types of language questions, enriches the scenarios to which the language model applies, and improves the generality of the language model.
In an alternative embodiment, the answer evaluation module 420 is specifically configured to: under the condition that the language question type is an open language question, score the quality of the answer output by the model based on the answer evaluation sub-model to obtain a quality evaluation score; and take the quality evaluation score as the answer evaluation result.
In an alternative embodiment, the answer evaluation module 420 may be further configured to: under the condition that the language question type is a closed language question, evaluating the accuracy of the model output answer based on the answer evaluation sub-model to obtain an accuracy evaluation result; and taking the accuracy evaluation result as the answer evaluation result.
In an alternative embodiment, the answer evaluation module 420 may be further configured to: taking the model output answer and the corresponding language questions as a group of question-answer pairs; and evaluating the question-answer pair based on the answer evaluation sub-model to obtain the answer evaluation result.
In an alternative embodiment, the model update module 430 is specifically configured to: adjusting parameters of the answer generation sub-model according to the answer evaluation result; and when the loss function of the answer generation sub-model is converged, completing a model training process to obtain the target language model.
In an alternative embodiment, the target language model is a model trained based on the Flan-T5-Large model.
In an alternative embodiment, the model update module 430 is further configured to: training the answer generation sub-model based on a near-end strategy optimization algorithm to obtain the target language model.
The language model training device provided by the embodiment of the invention can execute the language model training method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention. FIG. 6 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 6 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention. Computer device 12 may be any terminal device with computing capabilities and may be configured in a language model training device.
As shown in FIG. 6, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 may be one or more of several types of bus structures including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in fig. 6, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing a language model training method provided by the present embodiment, the method including:
inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a language model training method as provided by any embodiment of the present invention, comprising:
inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It will be appreciated by those of ordinary skill in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device, or distributed over a network of computing devices, or they may alternatively be implemented in program code executable by a computer device, such that they are stored in a memory device and executed by the computing device, or they may be separately fabricated as individual integrated circuit modules, or multiple modules or steps within them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (10)

1. A method for training a language model, comprising:
inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
evaluating the model output answer based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
2. The method according to claim 1, wherein the evaluating the model output answer based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result includes:
under the condition that the language question type is an open language question, scoring the quality of the answer output by the model based on the answer evaluation sub-model to obtain a quality evaluation score;
and taking the quality evaluation score as the answer evaluation result.
3. The method according to claim 1, wherein the evaluating the model output answer based on the answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result includes:
under the condition that the language question type is a closed language question, evaluating the accuracy of the model output answer based on the answer evaluation sub-model to obtain an accuracy evaluation result;
and taking the accuracy evaluation result as the answer evaluation result.
4. A method according to claim 2 or 3, wherein evaluating the model output answer to obtain an answer evaluation result comprises:
taking the model output answer and the corresponding language questions as a group of question-answer pairs;
and evaluating the question-answer pair based on the answer evaluation sub-model to obtain the answer evaluation result.
5. The method of claim 1, wherein updating the answer generation sub-model based on the answer evaluation result to obtain a target language model comprises:
adjusting parameters of the answer generation sub-model according to the answer evaluation result;
and when the loss function of the answer generation sub-model is converged, completing a model training process to obtain the target language model.
6. The method of claim 1, wherein the target language model is a model trained based on a Flan-T5-Large model.
7. The method according to claim 1, wherein the method further comprises:
training the answer generation sub-model based on a near-end strategy optimization algorithm to obtain the target language model.
8. A language model training apparatus, comprising:
the answer output module is used for inputting a preset unlabeled question sample into an answer generation sub-model in the initial language model to obtain a model output answer;
the answer evaluation module is used for evaluating the answers output by the model based on an answer evaluation sub-model in the initial language model and the language question type of the preset unlabeled question sample to obtain an answer evaluation result;
and the model updating module is used for updating the answer generation sub-model based on the answer evaluation result to obtain a target language model.
9. A computer device, the computer device comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the language model training method of any one of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the language model training method of any one of claims 1-7.
CN202310914142.8A 2023-07-25 2023-07-25 Language model training method, device, equipment and storage medium Pending CN116663679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310914142.8A CN116663679A (en) 2023-07-25 2023-07-25 Language model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310914142.8A CN116663679A (en) 2023-07-25 2023-07-25 Language model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116663679A true CN116663679A (en) 2023-08-29

Family

ID=87717363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310914142.8A Pending CN116663679A (en) 2023-07-25 2023-07-25 Language model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116663679A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332247A (en) * 2023-12-01 2024-01-02 苏州大学 Big data transaction and quality assessment method and system using big language model as medium
CN117556920A (en) * 2023-10-23 2024-02-13 星环信息科技(上海)股份有限公司 Large model illusion treatment method, device, equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020000779A1 (en) * 2018-06-28 2020-01-02 平安科技(深圳)有限公司 Method and apparatus for obtaining quality evaluation model, and computer device and storage medium
US20200019642A1 (en) * 2018-07-12 2020-01-16 International Business Machines Corporation Question Answering Using Trained Generative Adversarial Network Based Modeling of Text
CN111079938A (en) * 2019-11-28 2020-04-28 百度在线网络技术(北京)有限公司 Question-answer reading understanding model obtaining method and device, electronic equipment and storage medium
CN113553837A (en) * 2020-04-23 2021-10-26 北京金山数字娱乐科技有限公司 Reading understanding model training method and device and text analysis method and device
CN113836895A (en) * 2021-02-08 2021-12-24 宏龙科技(杭州)有限公司 Unsupervised machine reading understanding method based on large-scale problem self-learning
CN114461802A (en) * 2022-02-09 2022-05-10 湘潭大学 Self-training method of machine reading understanding model for question refusing to answer
CN115329054A (en) * 2022-06-01 2022-11-11 赵慧雅 Open domain question-answering system for complexity problem
CN115238101A (en) * 2022-09-23 2022-10-25 中国电子科技集团公司第十研究所 Multi-engine intelligent question-answering system oriented to multi-type knowledge base

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HYUNG WON CHUNG et al.: "Scaling Instruction-Finetuned Language Models", arXiv:2210.11416v5, pages 1-54 *
Bridge Health Monitoring and Vibration Control Laboratory, Tongji University: "Frontiers of Natural Language Processing: The Past and Present of Large Language Models", pages 1-14, Retrieved from the Internet <URL: https://shmc.tongji.edu.cn/8c/8e/c32042a298126/page.htm> *
汀丶人工智能 (Ting AI): "AI LLM Models: Reward Model Training, PPO Reinforcement Learning Training, RLHF", pages 5-7, Retrieved from the Internet <URL: https://juejin.cn/post/7256713304839192631> *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556920A (en) * 2023-10-23 2024-02-13 星环信息科技(上海)股份有限公司 Large model illusion treatment method, device, equipment and storage medium
CN117556920B (en) * 2023-10-23 2024-05-31 星环信息科技(上海)股份有限公司 Large model illusion treatment method, device, equipment and storage medium
CN117332247A (en) * 2023-12-01 2024-01-02 苏州大学 Big data transaction and quality assessment method and system using big language model as medium
CN117332247B (en) * 2023-12-01 2024-02-23 苏州大学 Big data transaction and quality assessment method and system using big language model as medium

Similar Documents

Publication Publication Date Title
Schodde et al. Adaptive robot language tutoring based on Bayesian knowledge tracing and predictive decision-making
Weston Dialog-based language learning
CN116663679A (en) Language model training method, device, equipment and storage medium
CN109389870B (en) Data self-adaptive adjusting method and device applied to electronic teaching
CN111753076B (en) Dialogue method, dialogue device, electronic equipment and readable storage medium
CN110991195B (en) Machine translation model training method, device and storage medium
Massaro Multimodal learning
CN112740132A (en) Scoring prediction for short answer questions
CN111782787B (en) Problem generation model training method and problem generation method
US20140295400A1 (en) Systems and Methods for Assessing Conversation Aptitude
Fang et al. Artificial intelligence-based assessment in education
Paladines et al. An Intelligent Tutoring System for Procedural Training with Natural Language Interaction.
US20230108579A1 (en) Dynamic entity representations for sequence generation
Wang Learning teaching in teaching: Online reinforcement learning for intelligent tutoring
Nodira et al. Teaching Currently Using Interactive Methods in Problem" Probability Theory and Mathematical Statistics"
CN111723185B (en) Question generation method
KR20140051607A (en) Apparatus providing analysis information based on level of a student and method thereof
Ruan Special‐Purpose English Teaching Reform and Model Design in the Era of Artificial Intelligence
CN111783434A (en) Method and system for improving anti-noise capability of reply generation model
Khasianov et al. Three agent platform approach for digital education environment
US10453354B2 (en) Automatically generated flash cards
CN112199476A (en) Automated decision making to select a leg after partial correct answers in a conversational intelligence tutor system
CN112528221A (en) Knowledge and capability binary tracking method based on continuous matrix decomposition
CN117993366B (en) Evaluation item dynamic generation method and system, electronic equipment and readable storage medium
CN112948650B (en) Learning effect display method and device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination