CN117787241A - Method and device for controlling length of generated text based on large language model

Method and device for controlling length of generated text based on large language model

Info

Publication number
CN117787241A
CN117787241A (application number CN202311824583.5A)
Authority
CN
China
Prior art keywords
model
instruction
alignment data
text
generation
Prior art date
Legal status
Pending
Application number
CN202311824583.5A
Other languages
Chinese (zh)
Inventor
杨二光
闫洲
崔向阳
王鑫
杨松
Current Assignee
Konami Sports Club Co Ltd
Original Assignee
People Co Ltd
Priority date
Filing date
Publication date
Application filed by People Co Ltd filed Critical People Co Ltd
Priority to CN202311824583.5A
Publication of CN117787241A

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiments of the present application disclose a method and device for controlling the length of generated text based on a large language model. The method comprises: acquiring labeled first model alignment data, the first model alignment data comprising a plurality of generation instructions and the target text corresponding to each generation instruction; constructing second model alignment data from each generation instruction in the first model alignment data and its corresponding target text, the second model alignment data comprising a plurality of sample instructions containing length control instructions and the target texts corresponding to those sample instructions; and training the large language model with the second model alignment data based on probability ranking to obtain a target text generation model. Probability ranking effectively improves the controllability of the length of text generated by the trained target text generation model and markedly reduces the computing resources consumed during the training stage.

Description

Method and device for controlling length of generated text based on large language model
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for controlling the length of a generated text based on a large language model.
Background
In recent years, as model and training-data scale has continued to grow, large language models (Large Language Model, LLM) such as ChatGPT and GPT-4 have demonstrated excellent capabilities in many fields, and LLM-driven intelligent content generation, as an important application direction of LLM, has received a great deal of attention. LLM-based automatic content generation mainly uses instructions in natural-language form to drive the LLM to generate corresponding text as a reply; the LLM can generate coherent, semantically rich text and therefore plays an important role in tasks such as dialogue systems and content creation. In practical applications, the length of the generated text is an important consideration: the user usually has an expected length, and when the LLM is used to generate themed articles or abstracts, or to conduct knowledge question answering and dialogue, the LLM is required to control the length of the generated text to meet the requirements of the user or the application scenario.
In the prior art, to control the length of generated text, a length control instruction is usually added to the input instruction, for example "no fewer than 800 words". However, under the constraint of such a length control instruction, the model still often generates text that does not comply with it. To address this problem, Jie et al. fine-tuned the LLM to follow length control instructions using proximal policy optimization (Proximal Policy Optimization, PPO). However, PPO is sensitive to hyperparameter settings, its training process is unstable, and the LLM is prone to degeneration; moreover, this reinforcement-learning approach requires four models to run simultaneously during the training stage and therefore consumes considerable computing resources.
Disclosure of Invention
The present invention has been made in view of the above problems, and its object is to provide a method, apparatus, computing device and storage medium for controlling the length of generated text based on a large language model that overcome, or at least partially solve, the above problems.
According to an aspect of the embodiment of the present application, there is provided a method for controlling a length of a generated text based on a large language model, including:
acquiring labeled first model alignment data, the first model alignment data comprising a plurality of generation instructions and the target texts corresponding to the generation instructions;
constructing second model alignment data according to each generation instruction in the first model alignment data and the target text corresponding to each generation instruction, the second model alignment data comprising a plurality of sample instructions containing length control instructions and the target texts corresponding to those sample instructions;
and training the large language model with the second model alignment data based on probability ranking to obtain a target text generation model.
According to another aspect of the embodiments of the present application, there is provided a generated text length control apparatus based on a large language model, including:
an acquisition module adapted to acquire labeled first model alignment data, the first model alignment data comprising a plurality of generation instructions and the target texts corresponding to the generation instructions;
a construction module adapted to construct second model alignment data from each generation instruction in the first model alignment data and its corresponding target text, the second model alignment data comprising a plurality of sample instructions containing length control instructions and the target texts corresponding to those sample instructions;
and a training module adapted to train the large language model with the second model alignment data based on probability ranking to obtain a target text generation model.
According to yet another aspect of embodiments of the present application, there is provided a computing device comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above method for controlling the length of generated text based on a large language model.
According to still another aspect of the embodiments of the present application, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform the operations corresponding to the above method for controlling the length of generated text based on a large language model.
According to the method, apparatus, computing device and storage medium for controlling the length of generated text based on a large language model provided by the embodiments of the present application, by counting the text length of the target text corresponding to each generation instruction in the labeled first model alignment data, a corresponding sample instruction containing a length control instruction can be conveniently constructed for each generation instruction, so that second model alignment data with length control instructions are efficiently constructed automatically and at scale, providing sufficient sample data for training a target text generation model with controllable output length. In addition, through probability ranking, the large language model assigns higher probability to generated text that complies with the length control instruction and lower probability to text that does not, which effectively improves the length controllability of the trained target text generation model. Compared with the prior art, in which the reinforcement-learning approach requires four models to run simultaneously during the training stage, only one model needs to run during training, significantly reducing the consumption of computing resources; compared with reinforcement learning, the training speed of the model is also improved, accelerating product iteration. Moreover, the target text generation model obtained by this scheme can follow the length control instruction well and generate coherent, semantically rich text, can meet diverse user requirements and business scenarios, and has strong generality, so it can be applied to different generation tasks and business scenarios, such as automatic summarization and text generation.
The foregoing is merely an overview of the technical solutions of the embodiments of the present application. So that the technical means of the embodiments can be more clearly understood and implemented according to the content of the specification, specific embodiments of the present application are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the examples of the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 illustrates a flow diagram of a large language model based method of generating text length control according to one embodiment of the present application;
FIG. 2 illustrates a schematic diagram of a method for generating text length control based on a large language model according to one embodiment of the present application;
FIG. 3 shows a block diagram of a large language model based generated text length control apparatus according to one embodiment of the present application;
FIG. 4 illustrates a structural schematic diagram of a computing device according to one embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 shows a flow diagram of a large language model based method for generating text length control, as shown in FIG. 1, according to one embodiment of the present application, the method comprising the steps of:
step S101, obtaining marked first model alignment data.
The first model alignment data comprises a plurality of generation instructions and target texts corresponding to the generation instructions.
Step S102, constructing second model alignment data according to each generation instruction in the first model alignment data and the target text corresponding to each generation instruction.
The second model alignment data comprises a plurality of sample instructions containing length control instructions and target texts corresponding to the plurality of sample instructions.
Step S103, training the large language model with the second model alignment data based on probability ranking to obtain a target text generation model.
The embodiment of the present application provides a method for controlling the length of generated text based on a large language model. Through probability ranking, the large language model (i.e., the LLM) is made to assign higher probability to generated text that complies with the length control instruction and lower probability to text that does not, thereby improving the length controllability of the trained target text generation model. Compared with the prior art, in which the reinforcement-learning approach requires four models to run simultaneously during the training stage, the embodiment of the present application only needs to run one model during training, significantly reducing the consumption of computing resources. The scheme is described below in terms of task definition, data preparation, model structure, and training method.
(1) Task definition:
the length-controllable text generation formalization is defined as follows: the user given an instruction input x, x may include a user intent instruction x q Length control instruction x l Two parts, using instruction x to drive LLM to generate text (i.e. reply)Text requiring generation->Semantics and x q Correlated and at the same time conform to x l Length control requirements of (c) are set.
(2) Data preparation:
this data preparation phase corresponds to step S101 and step S102 in fig. 1.
In the embodiment of the application, in order to train to obtain the target text generation model with controllable generated text length, the labeled model alignment data needs to be obtained, and then the model alignment data is processed to construct the model alignment data with the length control instruction. For convenience of distinction, the obtained labeled model alignment data is referred to as first model alignment data, and the processed model alignment data having a length control instruction is referred to as second model alignment data.
The large language model is mainly trained, in an instruction fine-tuning manner, on large-scale labeled first model alignment data of the form <generation instruction, target text>, which can be represented as {(q_k, y_k)}, so that the model can perform various tasks according to natural-language instructions. Here q_k denotes the k-th generation instruction in the first model alignment data; y_k denotes the target text corresponding to the k-th generation instruction, i.e., the real text that follows q_k; and k is the index of the data pair in the first model alignment data.
For each generation instruction in the first model alignment data, a corresponding sample instruction containing a length control instruction is constructed, and the target text corresponding to that generation instruction is taken as the target text corresponding to the sample instruction; the plurality of sample instructions and their corresponding target texts are then aggregated to form the second model alignment data.
Specifically, for each generation instruction in the first model alignment data, the text length of its corresponding target text is counted; a length control instruction containing a length control threshold is generated according to this text length; and the length control instruction and the generation instruction are then concatenated to construct the sample instruction corresponding to that generation instruction.
For text generation with controllable length, different business scenarios have different requirements on text length; for example, one social platform requires that the text be no longer than 140 characters, while another requires no more than 1000 characters, with 500 to 700 characters generally working best. In the embodiment of the present application, to meet the needs of different business scenarios, the length control requirement can be formally represented as an interval [L_min, L_max], where L_min and L_max are length control thresholds: L_min is the length control lower threshold and L_max is the length control upper threshold.
Table 1 shows several length control representations and their corresponding length control instructions. As shown in Table 1, text length can be divided into three types: short, medium and long. For the short type, L_min is 0 and L_max is 100, and the corresponding length control instruction is "the generated text is no longer than 100 characters"; for the medium type, L_min is 101 and L_max is 200, and the corresponding length control instruction is "the generated text is between 101 and 200 characters long"; for the long type, L_min is 201 and L_max is +∞, and the corresponding length control instruction is "the generated text is no shorter than 201 characters".

Type   | L_min | L_max | Length control instruction
Short  | 0     | 100   | The generated text is no longer than 100 characters
Medium | 101   | 200   | The generated text is between 101 and 200 characters long
Long   | 201   | +∞    | The generated text is no shorter than 201 characters

Table 1: Length control representations and corresponding length control instructions
With this representation, only different values of L_min and L_max need to be set, so the scheme can easily be extended to different business scenarios. For a generation instruction q_k in the first model alignment data, by counting the text length of its corresponding target text y_k, the length interval in which it falls can be determined, i.e., the length control thresholds are determined and the length control instruction containing those thresholds is obtained. The length control instruction is concatenated with the generation instruction q_k, yielding the instruction x_k required by the length-controllable text generation task; x_k is called a sample instruction, and the target text y_k corresponding to the generation instruction q_k is taken as the target text y_k corresponding to the sample instruction x_k. Each generation instruction in the first model alignment data is processed in this way to obtain the sample instruction corresponding to each generation instruction and the target text corresponding to each sample instruction. The second model alignment data are formed by aggregating the plurality of sample instructions and their corresponding target texts, which conveniently realizes automatic, large-scale construction of second model alignment data with length control instructions. The second model alignment data can be represented as {(x_k, y_k)}, where x_k denotes the k-th sample instruction containing a length control instruction in the second model alignment data; y_k denotes the target text corresponding to the k-th sample instruction, i.e., the real text that follows x_k; and k is the index of the sample in the second model alignment data. The LLM is trained with the sample instructions in the second model alignment data and their corresponding target texts to obtain a target text generation model with controllable generated-text length.
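A minimal sketch of the data construction described above is given below, assuming the first model alignment data are plain (generation instruction, target text) pairs, that length is measured in characters, and that the length control instruction is prepended to the generation instruction; the bucket boundaries follow Table 1, and all names and instruction wordings are illustrative assumptions rather than the exact implementation of this application.

```python
# Sketch: build second model alignment data (sample instruction x_k, target text y_k)
# from labeled first model alignment data (generation instruction q_k, target text y_k).

LENGTH_BUCKETS = [
    # (L_min, L_max, length control instruction), following Table 1.
    (0,   100,          "The generated text is no longer than 100 characters."),
    (101, 200,          "The generated text is between 101 and 200 characters long."),
    (201, float("inf"), "The generated text is no shorter than 201 characters."),
]

def length_control_instruction(target_text: str) -> str:
    """Return the length control instruction whose [L_min, L_max] interval
    contains the character length of the target text."""
    length = len(target_text)
    for l_min, l_max, instruction in LENGTH_BUCKETS:
        if l_min <= length <= l_max:
            return instruction
    raise ValueError(f"no length bucket covers length {length}")

def build_second_alignment_data(first_alignment_data):
    """first_alignment_data: list of (generation instruction q_k, target text y_k).
    Returns a list of (sample instruction x_k, target text y_k), where x_k is the
    length control instruction concatenated with q_k."""
    second = []
    for q_k, y_k in first_alignment_data:
        x_k = length_control_instruction(y_k) + " " + q_k  # splice the two instructions
        second.append((x_k, y_k))
    return second

first_data = [("Briefly introduce large language models.",
               "Large language models are neural networks trained on massive text corpora ...")]
print(build_second_alignment_data(first_data)[0][0])
```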
(3) Model structure:
the embodiment of the application can be realized based on any current open source large language model, such as ChatGLM, baiChuan, and the large models all adopt a model structure of GPT, which is formed by stacking a plurality of layers of transformers, each layer uses a mask self-attention mechanism, when predicting words at each position, GPT can only use the above information of the current position, so that the model can predict text sequences according to an autoregressive mode.
(4) Training method:
this training phase corresponds to step S103 in fig. 1. Aiming at model training, the embodiment of the application provides a two-stage training method.
FIG. 2 is a schematic diagram of a method for controlling the length of generated text based on a large language model according to one embodiment of the present application. As shown in FIG. 2, in the first stage, the second model alignment data are used to pre-train the large language model by instruction fine-tuning (Instruction Tuning), which helps the large language model acquire an initial length control capability and yields an intermediate large language model, which can be denoted LLM_sft.
Instruction fine-tuning is an optimization method for LLMs and a key technique for improving LLM capability and controllability. It further trains the LLM in a supervised manner on the second model alignment data, adjusting the parameters of the model so that it better performs specific tasks. The main idea of instruction fine-tuning is to provide the model with a set of targeted instructions according to the task requirements, so that the model learns to follow these instructions during training, thereby improving its performance on the specific tasks.
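Below is a minimal sketch of this first, supervised instruction fine-tuning stage: the sample instruction x_k and target text y_k are concatenated, and only the target-text tokens are supervised. The "gpt2" checkpoint is just a small stand-in so the sketch runs; the learning rate, masking convention and helper names are assumptions, not the exact recipe of this application.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")          # stand-in for ChatGLM / BaiChuan
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def sft_step(x_k: str, y_k: str) -> float:
    """One supervised fine-tuning step on a (sample instruction, target text) pair.
    Prompt positions get label -100 so only the target text contributes to the loss."""
    prompt_ids = tokenizer(x_k, add_special_tokens=False).input_ids
    target_ids = tokenizer(y_k, add_special_tokens=False).input_ids + [tokenizer.eos_token_id]
    input_ids = torch.tensor([prompt_ids + target_ids])
    labels = torch.tensor([[-100] * len(prompt_ids) + target_ids])
    loss = model(input_ids=input_ids, labels=labels).loss   # token-level cross entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(sft_step("The generated text is no longer than 100 characters. Briefly introduce LLMs.",
               "Large language models generate text from natural-language instructions."))
```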
In the second stage, the second model alignment data are used to further train the intermediate large language model based on probability ranking, obtaining the target text generation model. The core idea is to make LLM_sft assign higher probability to generated text that complies with the length control instruction and lower probability to text that does not, thereby improving the length controllability of the text generated by the large language model.
Specifically, any sample instruction in the second model alignment data is input into the intermediate large language model and a plurality of texts are generated by sampling decoding; the reward score and log probability of each text are computed; the probability ranking loss is computed from the ranking of the reward scores and the log probabilities of the texts; a model optimization objective is determined from the probability ranking loss; and the intermediate large language model is trained a second time according to the model optimization objective to obtain the target text generation model.
For example, any sample instruction x in the second model alignment data is input to LLM_sft, and m different texts ŷ_1, …, ŷ_m are generated by sampling decoding. The reward score of each text is then calculated using a rule-based reward calculation method, as shown in Equation 1:

r_i = -(ReLU(L_min - L_g) + ReLU(L_g - L_max))    (Equation 1)

where r_i denotes the reward score of the generated i-th text ŷ_i; L_g denotes the actual length of the i-th text; L_min denotes the length control lower threshold; and L_max denotes the length control upper threshold. The reward is 0 when the generated length falls inside [L_min, L_max] and becomes more negative the further the length lies outside the interval.
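A small sketch of this rule-based reward, assuming length is measured in characters and using the ReLU form of Equation 1 (the function and variable names are illustrative):

```python
def relu(v: float) -> float:
    return max(0.0, v)

def length_reward(generated_text: str, l_min: float, l_max: float) -> float:
    """Equation 1: zero reward inside [L_min, L_max], increasingly negative outside."""
    l_g = len(generated_text)                 # actual length of the generated text
    return -(relu(l_min - l_g) + relu(l_g - l_max))

# Example with the "medium" bucket (101 to 200 characters):
print(length_reward("x" * 150, 101, 200))     # 0.0   -> complies with the length instruction
print(length_reward("x" * 80, 101, 200))      # -21.0 -> 21 characters too short
```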
Then the length-normalized log probability p_i of each text ŷ_i is computed using Equation 2, and the probability ranking loss is computed using Equation 3 from the ranking of the reward scores of the texts and their log probabilities:

p_i = (1 / |ŷ_i|) · Σ_{t=1..|ŷ_i|} log P_θ(ŷ_{i,t} | x, ŷ_{i,<t})    (Equation 2)

Loss_rank = Σ_{r_i < r_j} max(0, p_i - p_j + σ)    (Equation 3)

where p_i denotes the length-normalized log probability of the generated i-th text; ŷ_i denotes the generated i-th text; |ŷ_i| denotes the actual length of the generated i-th text; P_θ(·|·) denotes the probability distribution over characters given by LLM_sft at each decoding step; θ denotes the trainable parameters of LLM_sft; ŷ_{i,t} denotes the t-th character of the generated i-th text ŷ_i; ŷ_{i,<t} denotes the 1st to (t-1)-th characters of ŷ_i; Loss_rank denotes the probability ranking loss; r_i denotes the reward score of the generated i-th text; r_j denotes the reward score of the generated j-th text; p_j denotes the length-normalized log probability of the generated j-th text; and σ is a hyperparameter controlling the margin between p_i and p_j.
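The sketch below computes the length-normalized log probabilities of Equation 2 and a pairwise margin ranking loss in the spirit of Equation 3. The exact pairwise form and the role of the margin σ are assumptions inferred from the definitions above, and the names are illustrative.

```python
import torch

def length_normalized_logprob(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Equation 2 (sketch): mean per-token log probability of one generated text.
    logits: (seq_len, vocab_size) at the generated positions; token_ids: (seq_len,)."""
    logprobs = torch.log_softmax(logits, dim=-1)
    return logprobs.gather(-1, token_ids.unsqueeze(-1)).squeeze(-1).mean()

def rank_loss(p: torch.Tensor, rewards: torch.Tensor, sigma: float = 0.0) -> torch.Tensor:
    """Equation 3 (sketch): for every pair with rewards[i] < rewards[j], push the
    log probability p[j] above p[i] by at least the margin sigma."""
    loss = p.new_zeros(())
    m = p.shape[0]
    for i in range(m):
        for j in range(m):
            if rewards[i] < rewards[j]:
                loss = loss + torch.clamp(p[i] - p[j] + sigma, min=0.0)
    return loss

# Toy example with m = 3 sampled texts; text 0 violates the length instruction
# (reward -21) yet currently has the highest probability, so the loss is positive.
p = torch.tensor([-0.5, -0.8, -0.9], requires_grad=True)
rewards = torch.tensor([-21.0, 0.0, 0.0])
print(rank_loss(p, rewards, sigma=0.1))        # tensor(0.9000, grad_fn=...)
```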
To keep the training objective consistent with the inference objective, the embodiment of the present application additionally uses a cross-entropy loss on the target text (i.e., the real text) y. Specifically, the cross-entropy loss with respect to the target text corresponding to the sample instruction is computed, and the probability ranking loss and the cross-entropy loss are then combined to determine the model optimization objective. For example, when combining them, the probability ranking loss and the cross-entropy loss can be weighted with a preset balance coefficient to obtain the model optimization objective.
The cross-entropy loss is computed as shown in Equation 4:

Loss_ce = -Σ_{t=1..|y|} log P_θ(y_t | x, y_<t)    (Equation 4)

where Loss_ce denotes the cross-entropy loss; |y| denotes the actual length of the real text y; y_t denotes the t-th character of the real text y; and y_<t denotes the 1st to (t-1)-th characters of the real text y.

The model optimization objective is computed as shown in Equation 5:

Loss = Loss_ce + α · Loss_rank    (Equation 5)

where α denotes a preset balance coefficient that balances the influence of the probability ranking loss and the cross-entropy loss on the model.
The intermediate large language model is then trained a second time according to the model optimization objective: back propagation is performed on the model optimization objective, and the weight parameters of the intermediate large language model are updated with the result. This process is iterated until the iteration end condition is met, yielding the target text generation model. The iteration end condition may include: the number of iterations reaches an iteration-count threshold; and/or the output value of the model optimization objective is less than a loss threshold. Whether the iteration end condition is met can therefore be judged by checking whether the iteration count reaches the iteration-count threshold, or by checking whether the output value of the model optimization objective is smaller than the loss threshold. Once the iteration end condition is met, the iterative process stops and the target text generation model is obtained.
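To tie the pieces together, here is a small sketch of the objective used in this second training step, combining the cross-entropy term of Equation 4 with the ranking term of Equation 3 via the balance coefficient α of Equation 5. It reuses the rank_loss function from the previous sketch, and the values of α and σ are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(target_logits: torch.Tensor, target_ids: torch.Tensor) -> torch.Tensor:
    """Equation 4 (sketch): negative log-likelihood of the real target text y,
    summed over its tokens. target_logits: (|y|, vocab_size); target_ids: (|y|,)."""
    return F.cross_entropy(target_logits, target_ids, reduction="sum")

def total_loss(target_logits, target_ids, p, rewards, alpha: float = 1.0, sigma: float = 0.1):
    """Equation 5 (sketch): Loss = Loss_ce + alpha * Loss_rank. The weights of the
    intermediate model are then updated by back-propagating this value."""
    return cross_entropy_loss(target_logits, target_ids) + alpha * rank_loss(p, rewards, sigma)
```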
After these two stages of training, the target text generation model, i.e., the final trained text generation model with controllable generated-text length, is obtained, so that when the target text generation model is applied to text generation, the user can conveniently control the length of the generated text through a length control instruction.
According to the method for controlling the length of generated text based on a large language model provided by the embodiment of the present application, by counting the text length of the target text corresponding to each generation instruction in the labeled first model alignment data, a corresponding sample instruction containing a length control instruction can be conveniently constructed for each generation instruction, so that second model alignment data with length control instructions are efficiently constructed automatically and at scale, providing sufficient sample data for training a target text generation model with controllable output length. In addition, through probability ranking, the large language model assigns higher probability to generated text that complies with the length control instruction and lower probability to text that does not, which effectively improves the length controllability of the trained target text generation model. Compared with the prior art, in which the reinforcement-learning approach requires four models to run simultaneously during the training stage, only one model needs to run during training, significantly reducing the consumption of computing resources; compared with reinforcement learning, the training speed of the model is also improved, accelerating product iteration. Moreover, the target text generation model obtained by this scheme can follow the length control instruction well and generate coherent, semantically rich text, can meet diverse user requirements and business scenarios, and has strong generality, so it can be applied to different generation tasks and business scenarios, such as automatic summarization and text generation.
FIG. 3 shows a block diagram of a large language model based generated text length control apparatus according to one embodiment of the present application, as shown in FIG. 3, comprising: acquisition module 310, construction module 320, and training module 330.
The acquisition module 310 is adapted to acquire labeled first model alignment data, the first model alignment data comprising a plurality of generation instructions and the target texts corresponding to the generation instructions.
The construction module 320 is adapted to construct second model alignment data from each generation instruction in the first model alignment data and its corresponding target text, the second model alignment data comprising a plurality of sample instructions containing length control instructions and the target texts corresponding to those sample instructions.
The training module 330 is adapted to train the large language model with the second model alignment data based on probability ranking to obtain a target text generation model.
Optionally, the construction module 320 is further adapted to: for each generation instruction in the first model alignment data, construct a corresponding sample instruction containing a length control instruction, and take the target text corresponding to the generation instruction as the target text corresponding to the sample instruction; and aggregate the plurality of sample instructions and their corresponding target texts to form the second model alignment data.
Optionally, the construction module 320 is further adapted to: count the text length of the target text corresponding to the generation instruction; generate a length control instruction containing a length control threshold according to the text length; and concatenate the length control instruction and the generation instruction to construct the sample instruction corresponding to the generation instruction.
Optionally, the training module 330 is further adapted to: pre-train the large language model with the second model alignment data by instruction fine-tuning to obtain an intermediate large language model; and further train the intermediate large language model with the second model alignment data based on probability ranking to obtain the target text generation model.
Optionally, the training module 330 is further adapted to: input any sample instruction in the second model alignment data into the intermediate large language model, generate a plurality of texts by sampling decoding, and calculate the reward score and log probability of each text; calculate the probability ranking loss according to the ranking of the reward scores of the texts and their log probabilities; determine a model optimization objective according to the probability ranking loss; and train the intermediate large language model a second time according to the model optimization objective to obtain the target text generation model.
Optionally, the training module 330 is further adapted to: calculate the cross-entropy loss with respect to the target text corresponding to the sample instruction; and combine the probability ranking loss and the cross-entropy loss to determine the model optimization objective.
Optionally, the training module 330 is further adapted to: weight the probability ranking loss and the cross-entropy loss with a preset balance coefficient to obtain the model optimization objective.
The above descriptions of the modules refer to the corresponding descriptions in the method embodiments, and are not repeated herein.
According to the apparatus for controlling the length of generated text based on a large language model provided by the embodiment of the present application, by counting the text length of the target text corresponding to each generation instruction in the labeled first model alignment data, a corresponding sample instruction containing a length control instruction can be conveniently constructed for each generation instruction, so that second model alignment data with length control instructions are efficiently constructed automatically and at scale, providing sufficient sample data for training a target text generation model with controllable output length. In addition, through probability ranking, the large language model assigns higher probability to generated text that complies with the length control instruction and lower probability to text that does not, which effectively improves the length controllability of the trained target text generation model. Compared with the prior art, in which the reinforcement-learning approach requires four models to run simultaneously during the training stage, only one model needs to run during training, significantly reducing the consumption of computing resources; compared with reinforcement learning, the training speed of the model is also improved, accelerating product iteration. Moreover, the target text generation model obtained by this scheme can follow the length control instruction well and generate coherent, semantically rich text, can meet diverse user requirements and business scenarios, and has strong generality, so it can be applied to different generation tasks and business scenarios, such as automatic summarization and text generation.
The embodiments of the present application also provide a non-volatile computer storage medium storing at least one executable instruction, the executable instruction causing a processor to perform the method for controlling the length of generated text based on a large language model in any of the above method embodiments.
FIG. 4 illustrates a schematic structural diagram of a computing device according to one embodiment of the present application; the specific embodiments of the present application do not limit the specific implementation of the computing device.
As shown in fig. 4, the computing device may include: a processor 402, a communication interface (Communications Interface) 404, a memory 406, and a communication bus 408.
Wherein:
processor 402, communication interface 404, and memory 406 communicate with each other via communication bus 408.
A communication interface 404 for communicating with network elements of other devices, such as clients or other servers.
Processor 402 is configured to execute program 410, and may specifically perform relevant steps in the foregoing embodiment of the method for generating text length control based on a large language model.
In particular, program 410 may include program code including computer-operating instructions.
The processor 402 may be a central processing unit CPU, or a specific integrated circuit ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present application. The one or more processors included by the computing device may be the same type of processor, such as one or more CPUs; but may also be different types of processors such as one or more CPUs and one or more ASICs.
Memory 406 for storing programs 410. Memory 406 may comprise high-speed RAM memory or may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
Program 410 may be specifically operative to cause processor 402 to perform a large language model based generated text length control method in any of the method embodiments described above. Specific implementation of each step in the procedure 410 may refer to corresponding descriptions in the corresponding steps and units in the above embodiment of text length control based on large language model generation, which is not repeated here. It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus and modules described above may refer to corresponding procedure descriptions in the foregoing method embodiments, which are not repeated herein.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a construction of such a system is apparent from the description above. In addition, embodiments of the present application are not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the embodiments of the present application as described herein, and the above description of specific languages is provided for disclosure of enablement and best mode of the embodiments of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the application, various features of embodiments of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed embodiments of the application claim more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application embodiment.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of embodiments of the present application and form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the present embodiments may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components according to embodiments of the present application may be implemented in practice using a microprocessor or Digital Signal Processor (DSP). Embodiments of the present application may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the embodiments of the present application may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the embodiments of the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The embodiments of the application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.

Claims (10)

1. A method for controlling the length of a generated text based on a large language model, comprising:
acquiring marked first model alignment data; the first model alignment data comprises a plurality of generation instructions and target texts corresponding to the generation instructions;
constructing second model alignment data according to each generation instruction in the first model alignment data and a target text corresponding to each generation instruction; the second model alignment data comprises a plurality of sample instructions containing length control instructions and target texts corresponding to the plurality of sample instructions;
and training the large language model based on a probability ordering mode by utilizing the second model alignment data to obtain a target text generation model.
2. The method of claim 1, wherein constructing second model alignment data from each generation instruction in the first model alignment data and the target text corresponding to each generation instruction further comprises:
aiming at each generation instruction in the first model alignment data, constructing a corresponding sample instruction containing a length control instruction for the generation instruction, and taking a target text corresponding to the generation instruction as a target text corresponding to the sample instruction;
and summarizing the plurality of sample instructions and target texts corresponding to the plurality of sample instructions to form the second model alignment data.
3. The method of claim 2, wherein constructing a corresponding sample instruction for the generated instruction that includes a length control instruction further comprises:
counting the text length of a target text corresponding to the generation instruction;
generating a length control instruction containing a length control threshold according to the text length;
and splicing the length control instruction and the generation instruction to construct a sample instruction corresponding to the generation instruction.
4. A method according to any one of claims 1-3, wherein using the second model alignment data to perform large language model training based on a probability ordering manner, the obtaining a target text generation model further comprises:
performing large language model pre-training by using the second model alignment data in an instruction fine tuning mode to obtain an intermediate large language model;
and performing secondary training on the intermediate large language model based on a probability ordering mode by using the second model alignment data to obtain the target text generation model.
5. The method of claim 4, wherein using the second model alignment data to secondarily train the intermediate large language model based on a probability ordering manner, the obtaining the target text generation model further comprises:
inputting any sample instruction in the second model alignment data into the middle large language model, generating a plurality of texts through sampling and decoding, and calculating the rewarding score and the logarithmic probability of each text;
calculating probability ranking loss according to the ranking of the reward scores of each text and the logarithmic probability;
determining a model optimization target according to the probability sorting loss;
and training the intermediate large language model for the second time according to the model optimization target to obtain the target text generation model.
6. The method of claim 5, wherein determining a model optimization objective based on the probability ordering penalty further comprises:
calculating cross entropy loss between each text and the target text corresponding to any sample instruction;
and determining the model optimization target by combining the probability ordering loss and the cross entropy loss.
7. The method of claim 6, wherein the combining the probability ordering loss and the cross entropy loss, determining the model optimization objective further comprises:
and weighting the probability sorting loss and the cross entropy loss by using a preset balance coefficient to obtain the model optimization target.
8. A large language model-based generated text length control apparatus, comprising:
the acquisition module is suitable for acquiring the marked first model alignment data; the first model alignment data comprises a plurality of generation instructions and target texts corresponding to the generation instructions;
the construction module is suitable for constructing second model alignment data according to each generation instruction in the first model alignment data and the target text corresponding to each generation instruction; the second model alignment data comprises a plurality of sample instructions containing length control instructions and target texts corresponding to the plurality of sample instructions;
and the training module is suitable for training the large language model based on a probability ordering mode by using the second model alignment data to obtain a target text generation model.
9. A computing device, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the large language model-based generated text length control method according to any one of claims 1 to 7.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the large language model based generated text length control method of any one of claims 1-7.
CN202311824583.5A 2023-12-27 2023-12-27 Method and device for controlling length of generated text based on large language model Pending CN117787241A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311824583.5A CN117787241A (en) 2023-12-27 2023-12-27 Method and device for controlling length of generated text based on large language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311824583.5A CN117787241A (en) 2023-12-27 2023-12-27 Method and device for controlling length of generated text based on large language model

Publications (1)

Publication Number Publication Date
CN117787241A true CN117787241A (en) 2024-03-29

Family

ID=90392268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311824583.5A Pending CN117787241A (en) 2023-12-27 2023-12-27 Method and device for controlling length of generated text based on large language model

Country Status (1)

Country Link
CN (1) CN117787241A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738437A (en) * 2020-07-17 2020-10-02 支付宝(杭州)信息技术有限公司 Training method, text generation device and electronic equipment
WO2022151966A1 (en) * 2021-01-15 2022-07-21 北京有竹居网络技术有限公司 Processing method and apparatus for language model, text generation method and apparatus, and medium
CN114943211A (en) * 2022-07-25 2022-08-26 北京澜舟科技有限公司 Text generation method and system based on prefix and computer readable storage medium
CN115965033A (en) * 2023-03-16 2023-04-14 安徽大学 Generation type text summarization method and device based on sequence level prefix prompt
CN117077653A (en) * 2023-05-19 2023-11-17 华为技术有限公司 Controllable generation method and device thereof
CN116662552A (en) * 2023-06-29 2023-08-29 中国工商银行股份有限公司 Financial text data classification method, device, terminal equipment and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
I-Chun Chern et al.: "Improving Factuality of Abstractive Summarization via Contrastive Reward Learning", https://arxiv.org/pdf/2307.04507.pdf, 10 July 2023, pages 1-6 *
Renlong Jie et al.: "Prompt-Based Length Controlled Generation with Reinforcement Learning", https://arxiv.org/pdf/2308.12030, 30 September 2023, pages 1-27 *
Yixin Liu et al.: "SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization", https://arxiv.org/pdf/2106.01890.pdf, 3 June 2021, pages 1-8 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118038870A (en) * 2024-04-12 2024-05-14 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Instruction perception training method and device for large voice model
CN118038870B (en) * 2024-04-12 2024-06-11 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Instruction perception training method and device for large voice model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination