CN117744661A - Text generation model training method and text generation method based on prompt word engineering

Text generation model training method and text generation method based on prompt word engineering

Info

Publication number
CN117744661A
CN117744661A
Authority
CN
China
Prior art keywords
model
text data
text
document
prompt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410193318.XA
Other languages
Chinese (zh)
Other versions
CN117744661B (en)
Inventor
张轩铭
王伟萌
朱韦桥
刘承亮
张向阳
樊春雷
惠伟
马龙
刘帅龙
孙晶
麻磊
李健
蒲照欣
王喆
解辰辉
蔡宇晶
刘辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Institute of Computing Technologies of CARS, Beijing Jingwei Information Technology Co Ltd
Priority to CN202410193318.XA
Publication of CN117744661A
Application granted
Publication of CN117744661B
Legal status: Active

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiments of this application disclose a text generation model training method and a text generation method based on prompt word engineering, relating to the technical field of large language models and comprising the following steps: acquiring a text data set; determining the document category of each text data in the text data set; labeling each type of text data according to its corresponding labeling dimension; constructing a first model comprising a plurality of language models and a specific task layer; training the first model with the labeled text data; and evaluating the output of each language model and selecting the language model corresponding to each type of text data, thereby obtaining a trained text generation model. By selecting data with distinctive features for each document type and labeling, for each type, the content features, type marks, covered elements, and other characteristics that distinguish document types, the model can learn the specific language styles, vocabulary, sentence patterns, and templates of the electronic document field, and thus output precise and rigorous document text.

Description

Text generation model training method and text generation method based on prompt word engineering
Technical Field
The embodiments of this application relate to the technical field of large language models, and in particular to a text generation model training method and a text generation method based on prompt word engineering.
Background
A large language model can process large amounts of natural language text and learn knowledge and language rules from it, thereby improving its ability to understand and generate natural language. Large language models can be used for various natural language processing tasks such as text generation, reading comprehension, and commonsense reasoning.
The prior art generally combines a large language model with prompt word engineering to implement a text generation model for the document domain, so as to generate different types of document text. However, although prompt words can control the direction and style of text generation to some extent, flexibility remains limited. The text generation model tends to generate highly generic replies, struggles to accurately meet the requirements of specific scenarios, shows weak professionalism in the electronic document field, and generates text whose style differs considerably from the formal writing style of official documents.
In addition, the text generation model may lack a consistent language style or produce inconsistent answers when generating text. The tone, vocabulary, or sentence patterns may switch within the same dialogue, or two different results may be given for the same computation, ultimately resulting in replies with inconsistent style or incorrect results.
Disclosure of Invention
The embodiments of this application provide a text generation model training method and a text generation method based on prompt word engineering, which can solve the problem that the output of existing text generation models is not sufficiently accurate.
In a first aspect, an embodiment of this application provides a text generation model training method based on prompt word engineering, the method comprising:
acquiring a text data set, wherein the text data set is a set of document data with a specific format;
determining the document category of each text data in the text data set;
labeling each type of text data according to its corresponding labeling dimension;
constructing a first model and training the first model with the labeled text data, wherein the first model comprises a plurality of language models and a specific task layer, and the specific task layer is used to convert the labeled text data into data that the plurality of language models can recognize;
and evaluating the output of each language model and selecting, according to the evaluation results, the language model corresponding to each type of text data, thereby obtaining a trained text generation model.
In an alternative design, the labeling of each type of text data according to its corresponding labeling dimension includes:
labeling prompt words and specific elements according to the expression characteristics of each kind of document, to obtain a prompt phrase and text data labeled with specific elements, wherein a specific element is a preset phrase or sentence that can characterize the document type, and the prompt words include: genre, topic, and primary content.
In an alternative design, the method further comprises: adding a specific task mark to the prompt phrase, so that the specific task layer identifies the task type through the specific task mark.
In an alternative design, training the first model with the labeled text data includes:
inputting the text data labeled with specific elements and the prompt phrase into the first model for training.
In an alternative design, the method further comprises:
the specific task layer extracting key information from the prompt phrase, so that the text generation model generates text according to the key information.
In an alternative design, the method further comprises:
introducing a weighting term to adjust the degree of attention paid to the prompt words;
and adjusting and optimizing the first model based on its output.
In a second aspect, an embodiment of this application provides a text generation method based on prompt word engineering, comprising: obtaining a text generation model trained by the text generation model training method based on prompt word engineering according to any design of the first aspect, acquiring a prompt word, and inputting the prompt word into the text generation model to generate text.
In a third aspect, embodiments of the present application provide an apparatus, including:
a receiving module, configured to acquire a text data set, wherein the text data set is a set of document data with a specific format;
a processing module, configured to determine the document category of each text data in the text data set; label each type of text data according to its corresponding labeling dimension; construct a first model and train the first model with the labeled text data, wherein the first model comprises a plurality of language models and a specific task layer, and the specific task layer is used to convert the labeled text data into data that the plurality of language models can recognize; and evaluate the output of each language model and select, according to the evaluation results, the language model corresponding to each type of text data, thereby obtaining a trained text generation model.
In a fourth aspect, an embodiment of this application provides an electronic device comprising a memory and one or more processors, wherein the memory is used to store computer program code comprising computer instructions; when the computer instructions are executed by the processor, the electronic device performs some or all of the steps of the method of the first aspect or of any possible implementation of the first aspect.
In a fifth aspect, an embodiment of this application provides a computer storage medium having instructions stored therein which, when run on a computer, cause the computer to perform some or all of the steps of the method of the first aspect or of any possible implementation of the first aspect.
An embodiment of this application provides a text generation model training method based on prompt word engineering, comprising the following steps: acquiring a text data set, wherein the text data set is a set of document data with a specific format; determining the document category of each text data in the text data set; labeling each type of text data according to its corresponding labeling dimension; constructing a first model, wherein the first model comprises a plurality of language models and a specific task layer used to process the labeled text data; training the first model with the labeled text data; and evaluating the output of each language model and selecting, according to the evaluation results, the language model corresponding to each type of text data, thereby obtaining a trained text generation model. By selecting data with distinctive features for each document type and labeling, for each type, the content features, type marks, covered elements, and other characteristics that distinguish document types, the model can learn the specific language styles, vocabulary, sentence patterns, and templates of the electronic document field, and thus output precise and rigorous document text.
Drawings
For a clearer description of the technical solutions of this application, the drawings required by the embodiments are briefly described below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a text generation model training method based on prompt word engineering according to an embodiment of this application;
fig. 2 is a schematic structural diagram of a text generation model training device 200 based on prompt word engineering according to an embodiment of this application;
fig. 3 is an exemplary structural schematic diagram of an electronic device 300 according to an embodiment of this application.
Detailed Description
The technical solutions of the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The terminology used in the following embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that, although the terms first, second, etc. may be used in the following embodiments to describe certain classes of objects, the objects should not be limited by these terms. These terms are only used to distinguish specific objects within that class of objects.
A large language model is a deep learning model based on the Transformer architecture; it can process large amounts of natural language text and learn knowledge and language rules from the text, thereby improving its ability to understand and generate natural language. Using prompt word engineering, a large model can output different language styles for different scenarios. Although the prior art can control the style of the generated text to a certain extent, the flexibility of the output is still limited, the generated replies are often highly generic, and differences in style between one output and the next lead to inconsistent results.
Based on the above, this application provides a method that addresses the limited flexibility and lack of consistency of model outputs through prompt word engineering. In the data collection stage, the method collects relevant text data, including electronic documents, official documents, administrative documents, and news stories. These data are used as input to the training model so that it learns the specific language styles, vocabulary, and sentence patterns of the electronic document field. Fine-tuning is then performed with prompt word engineering techniques, and the output is guided by optimizing a reward model, further steering the model's behavior when generating electronic documents. In the fine-tuning stage, specific prompt words or prompt texts are designed to guide the model to conform to the format, structure, and conventions of electronic documents. By adjusting parameters, the output of the model is made more deterministic and consistent, inconsistent results are avoided, and the rigor of the document language is ensured.
The text generation model training method based on prompt word engineering according to the embodiments of this application is described below through several embodiments.
As shown in fig. 1, a text generation model training method 100 based on prompt word engineering (hereinafter referred to as method 100) comprises the following steps:
step S101, a text data set is acquired, wherein the text data is a set of document data with a specific format.
In this embodiment, electronic document data are collected from an electronic document system and a public national policy document library. First, text is extracted from unstructured documents through OCR to form structured data in text form, which is then organized and cleaned to ensure the accuracy and integrity of the data. After the data are collected, data preprocessing is performed, including word segmentation, stop-word removal, part-of-speech tagging, and other denoising operations, to support subsequent model training and generation.
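A minimal sketch of this collection-and-preprocessing step is shown below. The library choices (pytesseract for OCR, jieba for Chinese word segmentation and part-of-speech tagging) and the stop-word file path are illustrative assumptions, not part of the embodiment:

```python
import jieba.posseg as pseg
import pytesseract
from PIL import Image

STOPWORDS_PATH = "stopwords_zh.txt"  # hypothetical stop-word list file

def ocr_page(image_path: str) -> str:
    """Extract raw text from a scanned document page via OCR."""
    return pytesseract.image_to_string(Image.open(image_path), lang="chi_sim")

def preprocess(raw_text: str) -> list[tuple[str, str]]:
    """Clean, segment, POS-tag, and remove stop words from raw document text."""
    with open(STOPWORDS_PATH, encoding="utf-8") as f:
        stopwords = {line.strip() for line in f}
    text = " ".join(raw_text.split())          # collapse whitespace noise
    return [(pair.word, pair.flag)             # (token, POS tag) pairs
            for pair in pseg.cut(text)
            if pair.word.strip() and pair.word not in stopwords]
```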
Step S102, determining the document category of each text data in the text data set.
In this embodiment, the document categories include notices, circulars, reports, requests for instructions, replies, letters, plans, and the like. Each type of document has its own specific format, fixed phrases, templates, and other identifying features; therefore, the text data need to be classified.
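For instance, the category determination of step S102 could be sketched as a simple title-keyword classifier; the keyword table below is an assumption drawn from the categories enumerated in this description, not a mandated implementation:

```python
# Hypothetical mapping from category name to characteristic title keywords.
CATEGORY_KEYWORDS = {
    "notice":   ["通知"],
    "circular": ["通报"],
    "report":   ["报告"],
    "request":  ["请示"],
    "reply":    ["批复"],
    "letter":   ["函"],
    "plan":     ["计划"],
}

def classify_document(title: str) -> str:
    """Return the document category whose keyword appears in the title."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in title for kw in keywords):
            return category
    return "other"
```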
Step S103, labeling each type of text data according to its corresponding labeling dimension.
In this embodiment, a suitable labeling dimension needs to be selected for the features of each type of document; the labeling dimension depends on the content features, type marks, covered elements, and other elements of each document that can distinguish document types.
Step S104, constructing a first model and training the first model with the labeled text data, wherein the first model comprises a plurality of language models and a specific task layer, and the specific task layer is used to convert the labeled text data into data that the plurality of language models can recognize.
In this embodiment, existing language models are used as the basis, such as recurrent neural networks, long short-term memory networks, GPT, BERT, and LSTM-CVAE, and a specific task layer is added to them to process the prompt words entered into the model.
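A minimal sketch of such a first model, with a shared specific task layer in front of several candidate language models, might look as follows; the TaskLayer interface and the use of the transformers library are assumptions for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

class TaskLayer:
    """Converts a task mark, prompt phrase, and labeled text into one
    model-ready input sequence."""
    def build_input(self, task_mark: str, prompt_phrase: str, body: str) -> str:
        return f"{task_mark} {prompt_phrase}\n{body}"

class FirstModel:
    """Several candidate language models behind a shared specific task layer."""
    def __init__(self, model_names: list[str]):
        self.task_layer = TaskLayer()
        self.candidates = {
            name: (AutoTokenizer.from_pretrained(name),
                   AutoModelForCausalLM.from_pretrained(name))
            for name in model_names
        }

    def encode(self, name: str, task_mark: str, prompt_phrase: str, body: str):
        """Tokenize one labeled example for the named candidate model."""
        tokenizer, _ = self.candidates[name]
        text = self.task_layer.build_input(task_mark, prompt_phrase, body)
        return tokenizer(text, return_tensors="pt", truncation=True)
```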
Step S105, the output result of each language model is evaluated, and the language model corresponding to each type of text data is selected according to the evaluation result, so that a trained text generation model is obtained.
In this embodiment, the text data of each document type is used to train multiple language models, the performance of the different language models is verified, and the model with the best output is selected as the processing model for that document type. These operations are repeated until the language model corresponding to every document type has been determined, yielding a trained text generation model.
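This per-category selection could be sketched as follows; `evaluate` is a placeholder for whatever scoring the implementation uses (an automatic metric or human rating), which the description does not fix:

```python
def select_models(candidates: dict, datasets_by_category: dict,
                  evaluate) -> dict:
    """Return {category: best_model_name} based on evaluation scores."""
    chosen = {}
    for category, dataset in datasets_by_category.items():
        # score every candidate language model on this category's data
        scores = {name: evaluate(model, dataset)
                  for name, model in candidates.items()}
        # keep the model with the best output for this document type
        chosen[category] = max(scores, key=scores.get)
    return chosen
```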
In an alternative embodiment, the labeling of each type of text data according to its corresponding labeling dimension includes:
labeling prompt words and specific elements according to the expression characteristics of each kind of document, to obtain a prompt phrase and text data labeled with specific elements, wherein a specific element is a preset phrase or sentence that can characterize the document type, and the prompt words include: genre, topic, and primary content.
In this embodiment, the corresponding elements are labeled according to the document type. For example: a notice-class document mainly concerns important matters or the notification or arrangement of related matters, which is generally reflected in its title, so notice-class documents can be labeled with aspects such as the reason for the notice, the matters notified, and the requirements of the notice; circular-class documents can be labeled with commendations, criticisms, important issues, and the like; report-class documents can be labeled with the issuing authority, the event, the report items, and the like; request-class documents can be labeled with the reasons for the request, the matters requested, preferred opinions, and the like; reply-class documents can be labeled with the quoted request, the reply opinions, the reply requirements, and the like; letter-class documents can be labeled with characteristic phrases such as inquiries, replies, requests, and notifications; plan-class documents can be labeled with what is planned, how it is to be done, and the like. Of course, the labeling manner in a specific implementation may be adjusted to the actual situation; the labeling manner described above is merely an example and does not limit the embodiments of this application.
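An illustrative annotation record for a notice-class document, following the dimensions above, might look like this; the field names and values are assumptions for the sketch:

```python
# Hypothetical annotation record for one notice-class training example.
annotation = {
    "category": "notice",
    "prompt_words": {                 # the labeled prompt phrase
        "genre": "notice",
        "topic": "holiday schedule",
        "primary_content": "arrangements for the upcoming holiday",
    },
    "specific_elements": {            # element spans marked in the source text
        "notification_reason": "(span offsets)",
        "notification_matters": "(span offsets)",
        "notification_requirements": "(span offsets)",
    },
}
```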
In an alternative embodiment, the method further comprises: adding a specific task mark to the prompt phrase, so that the specific task layer identifies the task type through the specific task mark.
In this embodiment, a specific task identifier, for example "[Notice]", is added at the beginning or end of the input sequence to tell the model what type of text currently needs to be generated, so that the specific task layer identifies the task type through the identifier. Of course, specific task identifiers are not limited to text types and may include other relevant input requirements, which this application does not limit.
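A minimal sketch of adding the task identifier, placing it at the beginning of the sequence (one of the two options described above); the "key: value" prompt phrase format is an assumption:

```python
def add_task_mark(prompt_phrase: str, doc_type: str) -> str:
    """Prepend the specific task mark so the task layer can read the type."""
    return f"[{doc_type}] {prompt_phrase}"

tagged = add_task_mark("genre: notice; topic: holiday schedule", "Notice")
# -> "[Notice] genre: notice; topic: holiday schedule"
```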
In an alternative embodiment, training the first model with the labeled text data includes:
inputting the text data labeled with specific elements and the prompt phrase into the first model for training.
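A minimal fine-tuning sketch of this step, pairing each prompt phrase with its element-labeled text; the causal language modeling objective and optimizer settings are assumptions, since the description does not fix them:

```python
import torch

def train(model, tokenizer, examples, epochs: int = 3, lr: float = 5e-5):
    """examples: iterable of (prompt_phrase, labeled_text) pairs."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for prompt_phrase, labeled_text in examples:
            batch = tokenizer(prompt_phrase + "\n" + labeled_text,
                              return_tensors="pt", truncation=True)
            # standard causal-LM objective: labels are the inputs themselves
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```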
In an alternative embodiment, the method further comprises:
the specific task layer extracting key information from the prompt phrase, so that the text generation model generates text according to the key information.
In this embodiment, the specific task layer processes the prompt words, i.e., the input sequence to the model. By recognizing the prompt words, the specific task layer identifies and extracts the type, primary content, topic, and so on of the text to be generated, and then passes them to the model so that the model generates text that meets the requirements. The specific task layer converts the prompt words into a form that the multiple language models can recognize and process, i.e., specific prompt words, so that the multiple language models generate text that meets the requirements.
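The key-information extraction could be sketched as follows, assuming the "key: value" prompt phrase format used in the earlier sketches (the format itself is an assumption):

```python
import re

def extract_key_info(prompt_phrase: str) -> dict:
    """Pull the task mark and genre/topic/content fields from a prompt phrase."""
    task = re.match(r"\[(\w+)\]", prompt_phrase)
    fields = dict(re.findall(r"(genre|topic|primary_content):\s*([^;]+)",
                             prompt_phrase))
    return {"task_type": task.group(1) if task else "unknown", **fields}

info = extract_key_info("[Notice] genre: notice; topic: holiday schedule")
# -> {'task_type': 'Notice', 'genre': 'notice', 'topic': 'holiday schedule'}
```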
In an alternative embodiment, the method further comprises:
introducing a weighting term to adjust the degree of attention paid to the prompt words;
and adjusting and optimizing the first model based on its output.
In this embodiment, the pre-trained model learns general language representations through unsupervised learning, and during task-specific fine-tuning the model is adjusted and optimized according to the requirements of the specific task; for example, the elements characterizing each type of document are learned differently for different document types. In this process, the model is guided to focus on the prompt words through the loss function: the importance of the prompt words is reflected in their weights in the loss function, so that the model attends to them more closely. For example, a weighting term is introduced to adjust the degree of attention the model pays to the prompt words.
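The weighting term could be realized, for example, as per-token weights in the cross-entropy loss; the weight value of 2.0 below is an assumption:

```python
import torch
import torch.nn.functional as F

def weighted_lm_loss(logits: torch.Tensor, labels: torch.Tensor,
                     prompt_mask: torch.Tensor, prompt_weight: float = 2.0):
    """logits: (T, V); labels: (T,); prompt_mask: (T,) bool marking tokens
    that belong to prompt-word elements."""
    per_token = F.cross_entropy(logits, labels, reduction="none")  # shape (T,)
    weights = torch.ones_like(per_token)
    weights[prompt_mask] = prompt_weight  # prompt tokens count more in the loss
    return (per_token * weights).mean()
```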
In summary, in the text generation model training method based on prompt word engineering, by selecting data with distinctive features for each document type and labeling, for each type, the content features, type marks, covered elements, and other characteristics that distinguish document types, the model can learn the specific language styles, vocabulary, sentence patterns, and templates of the electronic document field, and thus output precise and rigorous document text.
In addition, an embodiment of this application further provides a text generation method based on prompt word engineering, comprising: obtaining a text generation model trained by the method shown in fig. 1, acquiring a prompt word, and inputting the prompt word into the text generation model to generate text.
In this embodiment, in the text generation stage, the user inputs the type, topic, and primary content of the text to be generated, and the text generation model recognizes the prompt words input by the user and generates document text of the corresponding type.
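A usage sketch of this generation stage, reusing the hypothetical add_task_mark helper from the earlier sketch; the generation parameters are assumptions:

```python
def generate_document(model, tokenizer, doc_type: str, topic: str,
                      content: str, max_new_tokens: int = 512) -> str:
    """Build a tagged prompt from the user's inputs and generate document text."""
    prompt = add_task_mark(
        f"genre: {doc_type}; topic: {topic}; primary_content: {content}",
        doc_type.capitalize())
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```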
Corresponding to the method shown in fig. 1, the embodiment of the application also provides a device for executing the method.
As shown in fig. 2, a text generation model training apparatus 200 based on prompt word engineering comprises:
a receiving module 201, configured to acquire a text data set, wherein the text data set is a set of document data with a specific format;
a processing module 202, configured to determine the document category of each text data in the text data set; label each type of text data according to its corresponding labeling dimension; construct a first model and train the first model with the labeled text data, wherein the first model comprises a plurality of language models and a specific task layer, and the specific task layer is used to convert the labeled text data into data that the plurality of language models can recognize; and evaluate the output of each language model and select, according to the evaluation results, the language model corresponding to each type of text data, thereby obtaining a trained text generation model.
It will be understood that the above division into modules/units is merely a division of logical functions. In an actual implementation, the functions of the modules may be integrated into hardware entities: for example, the functions of the processing module may be integrated into a processor, the functions of the receiving module may be integrated into a transceiver, and the programs and instructions implementing the functions of the modules may be kept in a memory. For example, fig. 3 provides an electronic device 300 comprising a processor 301, a transceiver 302, and a memory 303, wherein the transceiver 302 is configured to perform the transmission and reception of data and signals in the method 100, and the memory 303 may be used to store the programs/code required by the processor 301 to perform the method 100.
In a specific implementation, corresponding to the foregoing electronic device 300, an embodiment of this application further provides a computer storage medium. The computer storage medium provided in the electronic device 300 may store a program which, when executed, implements some or all of the steps of the embodiments of the method 100. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Those skilled in the art will appreciate that, for convenience and brevity, the specific working procedures of the above-described systems, apparatuses and units may refer to the corresponding procedures in the foregoing method embodiments, which are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed methods, apparatuses, and systems may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of this application, or the part of it contributing to the prior art, may be embodied in whole or in part in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a control device for a cloud game, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
While alternative embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
The foregoing embodiments are provided to illustrate the general principles of the present invention and are not meant to limit the scope of the invention.

Claims (10)

1. A text generation model training method based on prompt word engineering, the method comprising:
acquiring a text data set, wherein the text data set is a set of document data with a specific format;
determining the document category of each text data in the text data set;
labeling each type of text data according to its corresponding labeling dimension;
constructing a first model and training the first model with the labeled text data, wherein the first model comprises a plurality of language models and a specific task layer, and the specific task layer is used to convert the labeled text data into data that the plurality of language models can recognize;
and evaluating the output of each language model and selecting, according to the evaluation results, the language model corresponding to each type of text data, thereby obtaining a trained text generation model.
2. The method of claim 1, wherein labeling each type of text data according to its corresponding labeling dimension comprises:
labeling prompt words and specific elements according to the expression characteristics of each kind of document, to obtain a prompt phrase and text data labeled with specific elements, wherein a specific element is a preset phrase or sentence that can characterize the document type, and the prompt words include: genre, topic, and primary content.
3. The method of claim 2, further comprising: adding a specific task mark to the prompt phrase, so that the specific task layer identifies the task type through the specific task mark.
4. The method of claim 3, wherein training the first model with the labeled text data comprises:
inputting the text data labeled with specific elements and the prompt phrase into the first model for training.
5. The method of claim 4, further comprising:
the specific task layer extracting key information from the prompt phrase, so that the text generation model generates text according to the key information.
6. The method of claim 4, further comprising:
introducing a weighting term to adjust the degree of attention paid to the prompt words;
and adjusting and optimizing the first model based on its output.
7. A text generation method based on prompt word engineering, comprising: obtaining a text generation model trained by the text generation model training method based on prompt word engineering according to any one of claims 1-6, acquiring a prompt word, and inputting the prompt word into the text generation model to generate text.
8. A text generation model training device based on prompt word engineering, the device comprising:
a receiving module, configured to acquire a text data set, wherein the text data set is a set of document data with a specific format;
a processing module, configured to determine the document category of each text data in the text data set; label each type of text data according to its corresponding labeling dimension; construct a first model and train the first model with the labeled text data, wherein the first model comprises a plurality of language models and a specific task layer, and the specific task layer is used to convert the labeled text data into data that the plurality of language models can recognize; and evaluate the output of each language model and select, according to the evaluation results, the language model corresponding to each type of text data, thereby obtaining a trained text generation model.
9. An electronic device comprising a memory and one or more processors; wherein the memory is for storing computer program code, the computer program code comprising computer instructions; the computer instructions, when executed by the processor, cause the electronic device to perform the method of any one of claims 1 to 6.
10. A computer readable storage medium comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 6.
CN202410193318.XA 2024-02-21 2024-02-21 Text generation model training method and text generation method based on prompt word engineering Active CN117744661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410193318.XA 2024-02-21 2024-02-21 Text generation model training method and text generation method based on prompt word engineering


Publications (2)

Publication Number Publication Date
CN117744661A (en) 2024-03-22
CN117744661B (en) 2024-05-17

Family

ID=90259588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410193318.XA Active CN117744661B (en) 2024-02-21 2024-02-21 Text generation model training method and text generation method based on prompt word engineering

Country Status (1)

Country Link
CN (1) CN117744661B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859951A (en) * 2020-06-19 2020-10-30 北京百度网讯科技有限公司 Language model training method and device, electronic equipment and readable storage medium
US20230040095A1 (en) * 2021-10-28 2023-02-09 Beijing Baidu Netcom Science Technology Co., Ltd. Method for pre-training model, device, and storage medium
CN116401551A (en) * 2023-04-12 2023-07-07 北京百度网讯科技有限公司 Model training method and language model training method and device
CN116956835A (en) * 2023-09-15 2023-10-27 京华信息科技股份有限公司 Document generation method based on pre-training language model


Also Published As

Publication number Publication date
CN117744661B (en) 2024-05-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant