CN116432608A - Text generation method and device based on artificial intelligence, computer equipment and medium

Info

Publication number: CN116432608A
Application number: CN202310373410.XA
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: value, preset, text, text generation, iteration number
Legal status: Pending (assumed; not a legal conclusion)
Inventors: 潘荣峰 (Pan Rongfeng), 王健宗 (Wang Jianzong)
Current and original assignee: Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202310373410.XA
Publication of CN116432608A

Classifications

    • G06F40/151 Physics; Computing; Electric digital data processing; Handling natural language data; Text processing; Use of codes for handling textual entities; Transformation
    • G06F40/30 Physics; Computing; Electric digital data processing; Handling natural language data; Semantic analysis
    • G06N3/04 Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology
    • G06N3/08 Physics; Computing; Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Learning methods
    • Y02D10/00 Climate change mitigation technologies in information and communication technologies (ICT), i.e. information and communication technologies aiming at the reduction of their own energy use; Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to the field of artificial intelligence technologies, and in particular to a text generation method, apparatus, computer device, and medium based on artificial intelligence. In the method, reference information, a text sample, and its label are input into a text generation model and iterated a target number of times to obtain a generated sample; a loss function value is calculated from the generated sample; when the loss function value meets a preset condition, the target iteration number is corrected to a reference iteration number; iteration is rerun according to the reference iteration number to obtain a reference sample; the pre-trained text generation model is trained with the reference sample and the label; and a text to be converted is input into the trained text generation model to obtain a generated text. When the relation between the iteration number and the accuracy of the text generation model is unknown, the iteration number is corrected from known information and the corrected iteration number is used to train the text generation model, improving the accuracy of the trained model.

Description

Text generation method and device based on artificial intelligence, computer equipment and medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular to a text generation method, apparatus, computer device, and medium based on artificial intelligence.
Background
The text generation model is a language model that handles text-to-text conversion tasks, i.e., converting an input text into an output text. During conversion, the length of the output text is typically fixed to a preset text length.
However, a fixed text length may make the output text too long or too short to accurately express the semantics of the input text, lowering the accuracy of the text generation model. Existing solutions add statistical methods to the conversion process, such as the maximal marginal relevance (MMR) algorithm or similarity calculations, to alleviate the problem. These methods add and delete terms based on statistics of the correlation between terms in the output text; although they can control the length of the output text, deleting terms can make the finally generated text incoherent, which also lowers the accuracy of the text generation model. How to improve text generation accuracy therefore remains a problem to be solved.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a text generation method, apparatus, computer device and medium based on artificial intelligence, so as to solve the problem of low text generation accuracy.
In a first aspect, an embodiment of the present invention provides an artificial intelligence based text generation method, where the text generation method includes:
inputting reference information, a text sample and its label into a pre-trained text generation model to obtain a generated term, increasing the iteration number by one, and updating the reference information with the generated term;
returning to the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term and increasing the iteration number by one, and repeating the iteration until the iteration number equals a preset target iteration number, to obtain a generated sample, where the generated sample comprises M generated terms and M is the target iteration number;
calculating a loss function value from the generated sample, the label and a preset loss function, and, when the loss function value is greater than a preset threshold, correcting the target iteration number to obtain a reference iteration number;
when the reference iteration number meets a preset condition, returning to the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term, and repeating the iteration until the iteration number equals the reference iteration number, to obtain a reference sample;
training the pre-trained text generation model with the reference sample and the label to obtain a trained text generation model, and inputting an acquired text to be converted into the trained text generation model to obtain a generated text.
In a second aspect, an embodiment of the present invention provides an artificial intelligence based text generating apparatus, including:
the term generation module is used for inputting reference information, a text sample and its label into a pre-trained text generation model to obtain a generated term, increasing the iteration number by one, and updating the reference information with the generated term;
the term iteration module is used for returning to the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term, and repeating the iteration until the iteration number equals a preset target iteration number, to obtain a generated sample, where the generated sample comprises M generated terms and M is the target iteration number;
the number correction module is used for calculating a loss function value from the generated sample, the label and a preset loss function, and correcting the target iteration number when the loss function value is greater than a preset threshold, to obtain a reference iteration number;
the sample generation module is used for returning, when the reference iteration number meets a preset condition, to the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term and increasing the iteration number by one, and repeating until the iteration number equals the reference iteration number, to obtain a reference sample;
the text generation module is used for training the pre-trained text generation model with the reference sample and the label to obtain a trained text generation model, and inputting an acquired text to be converted into the trained text generation model to obtain a generated text.
In a third aspect, an embodiment of the present invention provides a computer device, the computer device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, the processor implementing the text generation method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the text generation method according to the first aspect.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects:
Reference information, a text sample and its label are input into a pre-trained text generation model to obtain a generated term; the iteration number is increased by one and the reference information is updated with the generated term; this step is repeated until the iteration number equals a preset target iteration number, yielding a generated sample. A loss function value is then calculated from the generated sample, the label and a preset loss function. When the loss function value is greater than a preset threshold, the target iteration number is corrected to obtain a reference iteration number. When the reference iteration number meets a preset condition, the generation step is rerun until the iteration number equals the reference iteration number, yielding a reference sample. The pre-trained text generation model is trained with the reference sample and the label, and an acquired text to be converted is input into the trained model to obtain a generated text. By establishing an association between the iteration number and the loss function value, the iteration number can be corrected from known information even when the relation between the iteration number and the accuracy of the text generation model is unknown, and training with the corrected iteration number improves the accuracy of the trained text generation model in generating text.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic view of an application environment of an artificial intelligence based text generation method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of an artificial intelligence based text generation method according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a text generation method based on artificial intelligence according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an artificial intelligence based text generating device according to a third embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The embodiments of the present invention can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
It should be understood that the sequence numbers of the steps in the following embodiments do not mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments of the present invention.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
The text generation method based on artificial intelligence provided by the embodiments of the present invention can be applied to the application environment shown in fig. 1, in which a client communicates with a server. The client includes, but is not limited to, palmtop computers, desktop computers, notebook computers, ultra-mobile personal computers (UMPC), netbooks, cloud terminal devices, personal digital assistants (PDA), and other computer devices. The server may be implemented as a stand-alone server or as a server cluster formed by a plurality of servers.
Referring to fig. 2, a flow chart of an artificial intelligence based text generation method according to the first embodiment of the present invention is shown. The text generation method may be applied to the client in fig. 1. A pre-trained text generation model that implements a basic text generation function is deployed in the computer device corresponding to the client. The computer device is connected to a server so as to obtain the stored reference information, text samples and their labels from the server and train the pre-trained text generation model in the client; it also obtains the text to be converted from the server, where the text to be converted may refer to a text sent by a user to the server with a request for text generation processing. As shown in fig. 2, the text generation method may include the following steps:
step S201, inputting the reference information, the text sample and the label thereof into a pre-trained text generation model to obtain a generated term, increasing the iteration times by one, and updating the reference information by using the generated term.
The text sample may refer to a sample text on which text generation is to be performed; the label may refer to the target text into which the text sample should be converted; the reference information may refer to the embedded information of the text generation model, which includes information on already generated terms; a generated term may refer to a term in the generated text; and the iteration number may refer to the number of times terms have been generated iteratively.
The pre-trained text generation model may include a pre-trained encoder that may be used to extract text features of the text samples and a pre-trained decoder that may be used to reconstruct generated terms based on the text features and the embedded information.
Specifically, the text sample is input into the pre-trained encoder for feature extraction to obtain sample features. Text samples are generally input in batches; in this embodiment, for convenience of description, a single text sample is used as the input. The reference information and the sample features are then input into the pre-trained decoder for feature reconstruction to obtain a generated term, which is the term at one position in the generated text. The generated term is used to update the reference information; in this embodiment, updating may mean concatenation, i.e., the generated term is appended to the end of the reference information. The iteration number is increased by one, and this counting may be realized with an accumulator.
Optionally, the reference information includes M zero elements;
updating the reference information using the generated term includes:
determining the position of the front-most zero element in the reference information as a target position;
replacing the zero element at the target position with the generated term to obtain updated reference information.
The reference information may be represented as a 1×M vector, i.e., one row with M columns, so the reference information includes M elements, where M equals the target iteration number. In the initial state of the reference information, all elements are 0.
The zero element may refer to an element having an element value of 0, and the target location may refer to a location where the element to be updated is located.
Specifically, the position of a zero element may be represented by coordinates (row, column), with row defaulting to 1. The order of positions is determined by the column value: the smaller the column, the further forward the zero element. For example, if the first zero element is at (1, c1), the second zero element is at (1, c2), and c1 is less than c2, the first zero element is in front of the second.
The position of the front-most zero element is the coordinate with the smallest column among all position coordinates, and replacement means overwriting the element value 0 of that zero element with the generated term. For example, if the generated term is "you" and the reference information is [0, 0], the target position is that of the first element, and replacing the 0 at the target position with "you" gives the updated reference information ["you", 0].
In this embodiment, reference information of a fixed size is used as the embedded information, ensuring that the dimension of the embedded information is fixed so that the encoder can integrate features conveniently; this avoids information loss and improves the accuracy of text generation.
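A minimal sketch of this update in Python follows, assuming for illustration that a generated term is represented by a nonzero integer token id (the patent does not fix the representation):

```python
import numpy as np

def update_reference_info(reference_info: np.ndarray, generated_term: int) -> np.ndarray:
    """Replace the front-most zero element (the one with the smallest
    column index) with the newly generated term."""
    zero_positions = np.flatnonzero(reference_info == 0)
    if zero_positions.size == 0:
        raise ValueError("reference information already holds M generated terms")
    updated = reference_info.copy()
    updated[zero_positions[0]] = generated_term
    return updated

# With M = 4 and one position already filled: [7, 0, 0, 0] -> [7, 12, 0, 0]
print(update_reference_info(np.array([7, 0, 0, 0]), 12))
```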
In this way, the reference information, the text sample and its label are input into the pre-trained text generation model to obtain a generated term, the iteration number is increased by one, and the reference information is updated with the generated term.
Step S202, returning to the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term and increasing the iteration number by one, and repeating until the iteration number equals the preset target iteration number, to obtain a generated sample.
The generated sample comprises M generated terms, where M is the target iteration number. "Returning to execute" means that the updated reference information, the text sample and its label are input into the pre-trained text generation model again to obtain a new generated term.
The initial value of the iteration number is 0; after each execution of the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term, the iteration number increases by one. Iteration refers to this repeated term generation process.
Specifically, when iterating with different target iteration numbers, the finally output generated texts differ in length. Controlling the length of the generated text also changes the generation process itself: because the reference information is introduced when the decoder performs feature reconstruction, different target iteration numbers lead to different reconstructed terms. The target iteration number therefore affects the accuracy of text generation.
Thus, by returning to the term generation step and repeating the iteration until the iteration number equals the preset target iteration number, a generated sample whose length equals the target iteration number is obtained.
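The loop of steps S201-S202 can be sketched as follows; here `model` stands in for the pre-trained encoder-decoder and `update_reference_info` is the helper from the sketch above, both hypothetical simplifications:

```python
import numpy as np

def generate_sample(model, text_sample, label, target_iterations: int) -> list:
    """Iterate the pre-trained model, one generated term per pass, until the
    iteration number equals the preset target iteration number."""
    reference_info = np.zeros(target_iterations)       # M zero elements initially
    generated_terms = []
    iteration = 0                                      # iteration number starts at 0
    while iteration != target_iterations:
        term = model(reference_info, text_sample, label)  # one reconstructed term
        generated_terms.append(term)
        reference_info = update_reference_info(reference_info, term)
        iteration += 1                                 # the accumulator adds one
    return generated_terms                             # the generated sample: M terms
```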
Step S203, calculating a loss function value according to the generated sample, the label and a preset loss function, and correcting the target iteration number to obtain a reference iteration number when the loss function value is greater than a preset threshold.
The loss function is used to measure the similarity between the generated sample and the label; it may be a mean square error loss function, a Euclidean distance loss function, or the like. The preset threshold is used to judge whether the loss function value obtained with the current target iteration number is small enough for the pre-trained text generation model to be retrained based on the corresponding generated sample.
Specifically, when the loss function value obtained with the current target iteration number is small enough, the generated sample whose length equals the current target iteration number is close to the label and is a good generated sample. Retraining the pre-trained text generation model with this good generated sample and the loss computed against the label acts as fine-tuning and further improves the generation accuracy of the trained model.
In one embodiment, to improve the accuracy of the loss calculation, a trained semantic feature encoder may be used to extract features of the generated text and of the label; a similarity is computed between the extracted features and the similarity result is used as the loss function value. This avoids penalizing texts whose lengths differ but whose semantics are the same, further improving the accuracy of the loss calculation.
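A sketch of one plausible concrete form, assuming the loss is taken as 1 minus the cosine similarity of the semantic features so that "greater than the threshold" still signals a poor sample (the patent only states that a similarity result serves as the loss value):

```python
import numpy as np

def semantic_loss(encode, generated_text: str, label_text: str) -> float:
    """1 - cosine similarity between semantic feature vectors, so texts that
    differ in length but share semantics incur a small loss. `encode` stands
    for the trained semantic feature encoder (text -> feature vector)."""
    g, t = encode(generated_text), encode(label_text)
    cos = float(np.dot(g, t) / (np.linalg.norm(g) * np.linalg.norm(t)))
    return 1.0 - cos
```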
Optionally, the target iteration number includes N preset values, the loss function value includes N function values corresponding to the preset values, and N is an integer greater than zero;
correspondingly, correcting the target iteration number when the loss function value is greater than the preset threshold includes:
when all the function values are greater than the preset threshold, correcting the preset value corresponding to the minimum of the N function values;
after the loss function values are calculated, the method further includes:
performing a Gaussian kernel function calculation on the N preset values, and determining the calculation result as a covariance matrix;
constructing a joint Gaussian distribution from the covariance matrix and a preset mean vector.
The preset values may refer to different candidate values of the target iteration number; in this embodiment the number of preset values is N, where N is an integer greater than zero.
The Gaussian kernel function is used to calculate the similarity between different preset values. The covariance matrix is one parameter of a Gaussian distribution, and likewise the mean vector is another. The joint Gaussian distribution is the distribution relationship established from the N known preset values and their N corresponding function values; the loss function value corresponding to each preset value obeys this joint Gaussian distribution, which is later used to determine the reference function value corresponding to the reference iteration number.
Specifically, the Gaussian kernel function can be expressed as

$$K_{ij} = K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\lambda^2}\right)$$

where $x_i$ is the i-th preset value, $x_j$ is the j-th preset value, $\|x_i - x_j\|^2$ is the squared Euclidean distance between them, $\lambda$ is a hyperparameter, and $i$ and $j$ take integer values in $[1, N]$. When $i$ equals $j$, the corresponding K value is defined as 1; when $\|x_i - x_j\|^2$ tends to infinity, the corresponding K value is defined as 0.

For example, assuming N is 3, the covariance matrix can be expressed as

$$K = \begin{bmatrix} K_{11} & K_{12} & K_{13} \\ K_{21} & K_{22} & K_{23} \\ K_{31} & K_{32} & K_{33} \end{bmatrix}$$

Since the N preset values are known, each K value can be calculated separately, so the covariance matrix can be determined, i.e., it is known by default. The preset mean vector can be expressed as $[0, 0, 0]^T$; in practice the elements of the mean vector may take any value, but the value 0 is chosen for ease of calculation, and the number of elements in the mean vector equals the number of preset values, i.e., the mean vector contains N elements. The joint Gaussian distribution can then be expressed as

$$\begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},\; K\right)$$

where $F_n$ is the function value corresponding to the n-th preset value, n takes integer values in $[1, N]$, and the covariance matrix is a positive semi-definite symmetric matrix, i.e., $K_{ij} = K_{ji}$.
In this embodiment, the preset values and their corresponding function values are modeled with a Gaussian kernel to obtain a joint Gaussian distribution, so that a model is obtained even when only a few correspondences between preset values and function values are known. This makes it convenient to later determine the reference function value corresponding to the reference iteration number and improves the efficiency of the search for the optimal iteration number.
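A minimal sketch of the kernel and covariance computation; the exact bandwidth term (2λ²) is an assumption consistent with the properties stated above (K = 1 when i = j, K tending to 0 as the distance grows):

```python
import numpy as np

def rbf_kernel(x_i: float, x_j: float, lam: float = 1.0) -> float:
    """Gaussian kernel with hyperparameter lambda: equals 1 when x_i == x_j
    and tends to 0 as the Euclidean distance tends to infinity."""
    return float(np.exp(-((x_i - x_j) ** 2) / (2.0 * lam ** 2)))

def covariance_matrix(preset_values, lam: float = 1.0) -> np.ndarray:
    """N x N matrix with K[i, j] = K(x_i, x_j); symmetric (K_ij = K_ji) and
    positive semi-definite by construction."""
    x = np.asarray(preset_values, dtype=float)
    diff = x[:, None] - x[None, :]
    return np.exp(-(diff ** 2) / (2.0 * lam ** 2))

# e.g. three candidate target iteration numbers
K = covariance_matrix([16, 32, 64], lam=16.0)
```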
Optionally, after constructing the joint Gaussian distribution, the method further includes:
performing the Gaussian kernel function calculation between the reference iteration number and each of the N preset values to obtain a covariance vector;
calculating the function value distribution corresponding to the reference iteration number from the covariance vector, the covariance matrix and the N preset values, based on the marginalization property of the joint Gaussian distribution;
sampling the function value distribution, and determining the sampling result as the reference function value.
The covariance vector is the vector formed by the similarities between the reference iteration number and the N preset values, and the marginalization property means that adding new data does not affect the distribution of the existing data, where the data here are iteration numbers.
Specifically, assume the reference function value and the function values of the N known preset values jointly obey an (N+1)-dimensional joint Gaussian distribution. Continuing the example above with N = 3, the (N+1)-dimensional joint Gaussian distribution can be expressed as

$$\begin{bmatrix} F_1 \\ F_2 \\ F_3 \\ F_4 \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0},\; \begin{bmatrix} K & K_4 \\ K_4^T & K_{44} \end{bmatrix}\right)$$

where $F_4$ refers to the reference function value and $K_4 = [K_{14}, K_{24}, K_{34}]^T$ is the covariance vector. Since both the reference iteration number and the N preset values are known, the covariance vector can be calculated from the Gaussian kernel function, and the (N+1)-dimensional joint Gaussian distribution is known by default.

Further, according to the marginalization property, the distribution of $F_4$, i.e., the function value distribution of the reference function value, can be calculated as

$$F_4 \mid F \sim \mathcal{N}\left(K_4^T K^{-1} F,\; K_{44} - K_4^T K^{-1} K_4\right)$$

where $F$ refers to the vector $[F_1, F_2, F_3]^T$, $K_{44}$ is constantly 1, and $K$ refers to the covariance matrix. The obtained function value distribution is normalized and probability-sampled, and the function value corresponding to the sampling result is determined as the reference function value.
In this embodiment, the function value distribution of the reference iteration number is calculated using the marginalization property of the joint Gaussian distribution, and the reference function value is determined from it. The reference function value corresponding to the reference iteration number can thus be solved even though the exact relation between iteration number and function value is unknown, which facilitates the subsequent determination of the optimal iteration number, reduces the preparatory work for correcting the iteration number, and saves computing resources.
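A sketch of this conditioning step, reusing `rbf_kernel` and `covariance_matrix` from the sketch above; the numeric preset values and function values are illustrative only:

```python
import numpy as np

def reference_value_distribution(x_ref, preset_values, function_values,
                                 lam: float = 1.0, jitter: float = 1e-8):
    """Mean and variance of the reference function value F_ref given the N
    known (preset value, function value) pairs:
        mu      = K_4^T K^{-1} F
        sigma^2 = K_44 - K_4^T K^{-1} K_4, with K_44 = 1."""
    x = np.asarray(preset_values, dtype=float)
    F = np.asarray(function_values, dtype=float)
    K = covariance_matrix(x, lam) + jitter * np.eye(len(x))  # keep K invertible
    k4 = np.exp(-((x_ref - x) ** 2) / (2.0 * lam ** 2))      # covariance vector
    mu = float(k4 @ np.linalg.solve(K, F))
    var = float(1.0 - k4 @ np.linalg.solve(K, k4))
    return mu, max(var, 0.0)

# Sampling this distribution yields the reference function value.
mu, var = reference_value_distribution(48, [16, 32, 64], [0.9, 0.6, 0.7], lam=16.0)
reference_function_value = float(np.random.default_rng(0).normal(mu, np.sqrt(var)))
```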
Optionally, the process of detecting that the reference iteration number meets the preset condition includes:
determining the minimum of the N function values as a temporary function value, and determining the preset value corresponding to the temporary function value as the temporary iteration number;
calculating the mean and standard deviation of the N function values, calculating a first difference between the reference function value and the mean, and determining the ratio of the first difference to the standard deviation as a first normalized value;
calculating a second difference between the temporary function value and the mean, determining the ratio of the second difference to the standard deviation as a second normalized value, and calculating the result of subtracting the first normalized value from the second normalized value;
if the subtraction result is greater than a preset subtraction threshold, determining that the reference iteration number meets the preset condition.
The temporary iteration number is the best iteration number among the currently known information, and the temporary function value is the function value corresponding to it. The mean and standard deviation are used for normalization when function values are compared; the purpose of normalization is to put function values whose value range is unknown on a unified scale, making comparisons between them meaningful.
Specifically, because the value range of the function values is unknown, the comparison scale is also unknown. For example, suppose the difference between the reference function value and another function value is 1. If the value range is [0, 1000], the two function values are very similar; retraining the pre-trained text generation model based on the reference iteration number is then unnecessary, because the training effect would be similar to that obtained with the temporary iteration number, and a better iteration number should be sought instead. If, however, the value range is [0, 2], the two function values are dissimilar; in that case, if the subtraction result is greater than the preset subtraction threshold, training with the reference iteration number improves greatly over training with the temporary one. For these reasons, normalization is required when comparing the reference function value with the temporary function value. The preset subtraction threshold may be determined as the ratio of the difference between the maximum and minimum of the N preset values to N.
In this embodiment, the function values are normalized before comparison, avoiding misjudgments of comparison results caused by unknown scales and thereby improving the accuracy of the search for the optimal iteration number.
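A minimal sketch of this first-embodiment check; the subtraction threshold could, as suggested above, be set to the range of the preset values divided by N:

```python
import numpy as np

def meets_condition_normalized(function_values, reference_function_value,
                               subtraction_threshold: float) -> bool:
    """Z-normalize the reference value and the current best (temporary)
    value with the mean/std of the N known function values and require the
    improvement to exceed the preset subtraction threshold."""
    f = np.asarray(function_values, dtype=float)
    mean, std = f.mean(), f.std()
    z_reference = (reference_function_value - mean) / std   # first normalized value
    z_temporary = (f.min() - mean) / std                    # second normalized value
    return (z_temporary - z_reference) > subtraction_threshold
```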
Optionally, correcting the preset value corresponding to the minimum of the N function values when all the function values are greater than the preset threshold includes:
determining the preset value corresponding to the minimum of the N function values as a value to be corrected, and calculating the ratio of a preset coefficient to N as a correction amount;
counting the number of the N preset values greater than the value to be corrected to obtain a first statistic, and counting the number of preset values smaller than the value to be corrected to obtain a second statistic;
normalizing the first statistic and the second statistic to obtain a first probability and a second probability, probability-sampling the positive and negative signs according to the first probability and the second probability, and updating the correction amount according to the sampling result to obtain a correction value;
adding the value to be corrected and the correction value, and taking the sum as the reference iteration number.
The value to be corrected is the base value to be adjusted, and the correction amount is the magnitude of the adjustment applied to it. The first statistic is the number of the N preset values greater than the value to be corrected, and the second statistic is the number of preset values smaller than it; together the two statistics cover all preset values other than the value to be corrected.
The first probability is the sampling probability of the positive sign and the second probability is the sampling probability of the negative sign; the sampling result is one of the two signs. Updating the correction amount means attaching the sampled sign to it, and the correction value is the signed value used to correct the value to be corrected.
In this embodiment, the correction value for the value to be corrected is determined from the currently known information to obtain the reference iteration number, further improving the efficiency of the search for the optimal iteration number.
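A sketch of this correction step; rounding the result to an integer iteration number is an added assumption, since a usable iteration count must be whole:

```python
import numpy as np

def correct_to_reference_iteration(preset_values, function_values,
                                   coefficient: float = 1.0, rng=None) -> int:
    """Correct the preset value with the smallest loss by a step of
    coefficient / N whose sign is sampled from how the other preset values
    are distributed around it."""
    rng = rng or np.random.default_rng()
    x = np.asarray(preset_values, dtype=float)
    to_correct = x[int(np.argmin(function_values))]        # value to be corrected
    correction_amount = coefficient / len(x)               # ratio of coefficient to N
    first_stat = int(np.sum(x > to_correct))               # preset values above
    second_stat = int(np.sum(x < to_correct))              # preset values below
    p_positive = first_stat / max(first_stat + second_stat, 1)  # first probability
    sign = 1.0 if rng.random() < p_positive else -1.0      # sampled sign
    return round(to_correct + sign * correction_amount)    # reference iteration number
```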
Optionally, after the reference iteration number is obtained, the method further includes:
when the reference iteration number does not meet the preset condition, calculating a reference loss value from the reference sample, the label and the loss function;
taking the reference iteration number and the reference loss value as an additional preset value and its corresponding function value, updating the value of N, and executing again the step of correcting the preset value corresponding to the minimum of the N function values.
Each such update increases the number of preset values N to N+1, i.e., the updated count is obtained by assigning N = N + 1.
Specifically, after the preset values are updated, the preset value corresponding to the minimum of the N function values is corrected to obtain a new reference iteration number; at this point the joint Gaussian distribution is recalculated with the updated preset values to obtain the new reference function value corresponding to the new reference iteration number.
In this way, a reference iteration number and reference function value that do not meet the preset condition are added to the known information, which optimizes the search process for the optimal reference iteration number and improves the modeling accuracy, yielding more accurate reference function values and thereby improving the efficiency of the search for the optimal reference iteration number.
Step S204, when the reference iteration number meets the preset condition, returning to the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term and increasing the iteration number by one, and repeating until the iteration number equals the reference iteration number, to obtain a reference sample.
The reference sample comprises M′ generated terms, where M′ is the reference iteration number, i.e., the number of generated terms the text generation model produces is determined by the reference iteration number.
When the reference iteration number meets the preset condition, text generation is performed with a reference iteration number that makes the generation accuracy of the text generation model higher, so retraining the pre-trained text generation model on the resulting reference sample improves the fitting accuracy of the model.
Step S205, training the pre-trained text generation model according to the reference sample and the label to obtain a trained text generation model, and inputting the acquired text to be converted into the trained text generation model to obtain a generated text.
The text to be converted may refer to the original text on which text generation is to be performed, and the generated text may refer to the output obtained after the text to be converted is processed by the trained text generation model.
Specifically, the trained text generation model can be applied to various application scenes such as translation, man-machine interaction, text standardization and the like.
In this way, the pre-trained text generation model is trained with the reference sample and the label to obtain the trained text generation model, and the acquired text to be converted is input into the trained model to obtain the generated text. The trained model produces generated text with high accuracy, improving the accuracy of text generation tasks in the application scenario.
In this embodiment, an association between the iteration number and the loss function value is established, a reference iteration number meeting the preset condition is determined from the known target iteration numbers and loss function values, and the reference sample obtained by re-iterating with the reference iteration number is used for model training. The iteration number can thus be corrected from known information even when the relation between iteration number and model accuracy is unknown, and the text generation model trained accordingly generates text with improved accuracy.
Referring to fig. 3, a flow chart of an artificial intelligence based text generation method according to the second embodiment of the present invention is shown. The preset condition met by the reference iteration number may mean either that, after normalization, the reference function value corresponding to the reference iteration number is smaller than the normalized value of the smallest known function value, or that, when the normalized reference function value is compared with the other known function values, the number of function values greater than the reference function value exceeds a preset number threshold.
In the first case, the process of detecting that the function value meets the preset condition is as described in the first embodiment and is not repeated here.
In the second case, the process of detecting that the function value meets the preset condition includes the following steps:
Step S301, subtracting a preset parameter from each of the N function values corresponding to the preset values to obtain N reference parameters;
Step S302, counting the number of the N reference parameters that are greater than or equal to the reference function value to obtain a statistical result;
Step S303, when the ratio of the statistical result to N is detected to be greater than the preset number threshold, determining that the reference iteration number meets the preset condition.
Here N refers to the number of preset values, the preset parameter is an adjustment parameter, and the reference parameters are compared with the reference function value to determine whether the reference function value is smaller than the majority of the function values.
Specifically, the value range of the preset parameter $\varepsilon$ in this embodiment is $[0, \min(f_n)]$, where $f_n$ is the function value corresponding to the n-th preset value and $\min(f_n)$ is the minimum of the N function values. When $\varepsilon$ is close to 0, the reference function value merely needs to be smaller than the known function values; when $\varepsilon$ is close to $\min(f_n)$, the reference function value must be smaller than them by nearly that amount. In this embodiment, the preset parameter is set to 0.

The statistical result can be expressed as $\mathrm{sum}(f_n - \varepsilon \ge F)$, where $F$ is the reference function value and sum is the counting function. The preset condition can then be expressed as

$$\frac{\mathrm{sum}(f_n - \varepsilon \ge F)}{N} > \text{margin}$$

where margin is the preset number threshold, whose value range is $[0, 1]$. In this embodiment margin is set to 0.8, and a practitioner may adjust the preset number threshold according to the actual situation. Normalizing with N as the denominator, the condition effectively requires that the count $\mathrm{sum}(f_n - \varepsilon \ge F)$ exceed margin × N; when it does, the reference function value meets the preset condition.
In this embodiment, the preset condition is that the number of function values greater than the reference function value exceed the preset number threshold. Because the reference function value is obtained from a model of the known information and is not absolutely accurate, this condition determines the reference iteration number effectively while avoiding a condition so restrictive that a suitable reference iteration number becomes hard to find; it thus improves the feasibility, and in turn the efficiency, of the search for the reference iteration number.
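A minimal sketch of this second-embodiment check, with ε = 0 and margin = 0.8 as in the text:

```python
import numpy as np

def meets_condition_count(function_values, reference_function_value,
                          epsilon: float = 0.0, margin: float = 0.8) -> bool:
    """Steps S301-S303: subtract epsilon from each known function value and
    require that the fraction of adjusted values greater than or equal to
    the reference value exceed the margin."""
    f = np.asarray(function_values, dtype=float)
    statistical_result = int(np.sum((f - epsilon) >= reference_function_value))
    return statistical_result / len(f) > margin
```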
Corresponding to the text generation method based on artificial intelligence in the above embodiments, fig. 4 shows a block diagram of an artificial intelligence based text generation device according to the third embodiment of the present invention. The text generation device is applied to a client; a pre-trained text generation model implementing a basic text generation function is deployed in the computer device corresponding to the client. The computer device is connected to a server so as to obtain the stored reference information, text samples and their labels from the server and train the pre-trained text generation model in the client; it also obtains the text to be converted from the server, where the text to be converted may refer to a text sent by a user to the server with a request for text generation processing. For convenience of explanation, only the portions relevant to the embodiments of the present invention are shown.
Referring to fig. 4, the text generating apparatus includes:
the term generation module 41 is configured to input reference information, a text sample and its label into a pre-trained text generation model to obtain a generated term, increase the iteration number by one, and update the reference information with the generated term;
the term iteration module 42 is configured to return to the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term and increasing the iteration number by one, and iterate until the iteration number equals the preset target iteration number, to obtain a generated sample, where the generated sample includes M generated terms and M is the target iteration number;
the number correction module 43 is configured to calculate a loss function value from the generated sample, the label and a preset loss function, and correct the target iteration number when the loss function value is greater than a preset threshold, to obtain a reference iteration number;
the sample generation module 44 is configured to return, when the reference iteration number meets a preset condition, to the step of inputting the reference information, the text sample and its label into the pre-trained text generation model to obtain a generated term, and repeat the iteration until the iteration number equals the reference iteration number, to obtain a reference sample;
the text generation module 45 is configured to train the pre-trained text generation model with the reference sample and the label to obtain a trained text generation model, and input the acquired text to be converted into the trained text generation model to obtain a generated text.
Optionally, the reference information includes M zero elements;
the term generation module 41 includes:
a position determining unit for determining the position of the forefront zero element in the reference information as a target position;
and the element replacement unit is used for replacing the zero element of the target position with the generated term to obtain updated reference information.
Optionally, the target iteration number includes N preset values, the loss function value includes N function values corresponding to the preset values, and N is an integer greater than zero;
accordingly, the number correction module 43 includes:
the preset value correction unit is used for correcting the preset value corresponding to the minimum value in the N function values corresponding to the preset value when all the function values are larger than the preset threshold value;
the text generation device further includes:
the function calculation module is used for carrying out Gaussian kernel function calculation on N preset values and determining a calculation result as a covariance matrix;
the distribution construction module is used for constructing a joint Gaussian distribution according to the covariance matrix and a preset mean value vector, and the joint Gaussian distribution is used for determining a reference function value corresponding to the reference iteration times.
Optionally, the text generating device further includes:
the vector calculation module is used for carrying out Gaussian kernel function calculation on the reference iteration times and N preset values respectively to obtain covariance vectors;
the distribution calculation module is used for calculating and obtaining function value distribution corresponding to the reference iteration times according to covariance vectors, covariance matrixes and N preset values and based on the marginalization characteristics of the joint Gaussian distribution;
and the distribution sampling module is used for sampling the function value distribution and determining a sampling result as a reference function value.
Optionally, the text generating device further includes:
the temporary determining module is used for determining the minimum value of the N function values corresponding to the preset values as a temporary function value and determining the preset value corresponding to the temporary function value as temporary iteration times;
the first normalization module is used for calculating the mean value and standard deviation of the N function values corresponding to the preset values, calculating a first difference value between the reference function value and the mean value, and determining the ratio of the first difference value to the standard deviation as a first normalization value;
the second normalization module is used for calculating a second difference value of the temporary function value and the mean value, determining the ratio of the second difference value to the standard deviation as a second normalization value, and calculating a subtraction result of the second normalization value and the first normalization value;
The condition detection module is used for determining that the reference iteration number meets the preset condition if the subtraction result is larger than a preset subtraction threshold value.
Optionally, the preset value correction unit includes:
the correction amount determining subunit is used for determining the preset value corresponding to the minimum of the N function values as the value to be corrected, calculating the ratio of the preset coefficient to N, and taking the ratio as the correction amount;
the number counting subunit is used for counting the number of preset values larger than the value to be corrected in the N preset values to obtain a first counted value, and counting the number of preset values smaller than the value to be corrected to obtain a second counted value;
the symbol sampling subunit is used for normalizing the first statistic value and the second statistic value to obtain a first probability and a second probability, carrying out probability sampling on the positive sign and the negative sign according to the first probability and the second probability, and updating the correction quantity according to the sampling result to obtain a correction value;
and the correction subunit is used for adding the value to be corrected and the correction value, and taking the addition result as the reference iteration times.
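A sketch of this sign-sampled correction, again reusing the arrays above, might read as follows; the preset coefficient is a hypothetical value, and N > 1 is assumed so that the two counts are not both zero:

```python
coef = 50.0  # hypothetical preset coefficient

to_correct = presets[np.argmin(losses)]    # value to be corrected
correction = coef / N                      # base correction amount

above = int((presets > to_correct).sum())  # first statistic value
below = int((presets < to_correct).sum())  # second statistic value

# Normalizing the two counts gives the probabilities of the plus
# and minus signs.
total = above + below
p_plus, p_minus = above / total, below / total

# Sample a sign and apply it to the correction amount.
sign = np.random.choice([1.0, -1.0], p=[p_plus, p_minus])
ref_iters = to_correct + sign * correction
```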
Optionally, the text generating device further includes:
the loss calculation module is used for calculating a reference loss value from the reference sample, the label and the loss function when the reference iteration number does not satisfy the preset condition;
and the preset value update module is used for taking the reference iteration number and the reference loss value as an additional preset value and its corresponding function value, updating the value of N accordingly, and executing again the step of correcting the preset value corresponding to the minimum of the N function values.
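Taken together, these modules describe an outer refinement loop; a sketch under the same assumptions is given below, where the four callables are hypothetical placeholders for the correction, prediction, condition-detection and loss-evaluation steps sketched above:

```python
import numpy as np

def refine_iteration_number(presets, losses, propose, predict, accept,
                            evaluate_loss, max_rounds=10):
    # Hypothetical outer loop over the modules described above.
    for _ in range(max_rounds):
        ref_iters = propose(presets, losses)          # sign-sampled correction
        ref_value = predict(ref_iters, presets, losses)
        if accept(ref_value, losses):                 # preset condition met
            return ref_iters
        # Condition not met: record the true loss at ref_iters as a new
        # preset value / function value pair, so N grows before retrying.
        presets = np.append(presets, ref_iters)
        losses = np.append(losses, evaluate_loss(ref_iters))
    # Fall back to the best observed preset value.
    return presets[np.argmin(losses)]
```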
It should be noted that, because the information interaction and execution processes between the above modules, units and sub-units are based on the same concept as the method embodiments of the present invention, their specific functions and technical effects may be found in the method embodiment section and are not repeated here.
Fig. 5 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. As shown in Fig. 5, the computer device of this embodiment includes: at least one processor (only one is shown in Fig. 5), a memory, and a computer program stored in the memory and executable on the at least one processor, wherein the processor, when executing the computer program, implements the steps of any of the text generation method embodiments described above.
The computer device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that Fig. 5 is merely an example of a computer device and does not limit it; a computer device may include more or fewer components than shown, combine certain components, or use different components, and may further include, for example, a network interface, a display screen, an input device, and the like.
The processor may be a CPU, or another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or any conventional processor.
The memory includes a readable storage medium, an internal memory, and the like, where the internal memory provides an environment for the execution of the operating system and the computer-readable instructions in the readable storage medium. The readable storage medium may be a hard disk of the computer device; in other embodiments, it may be an external storage device of the computer device, for example a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card (Flash Card) provided on the computer device. Further, the memory may include both an internal storage unit and an external storage device of the computer device. The memory is used to store the operating system, application programs, a boot loader (BootLoader), data and other programs, such as the program code of the computer program, and may also be used to temporarily store data that has been or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is illustrated; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and do not limit the protection scope of the present invention. For the specific working processes of the units and modules in the above apparatus, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like.

The computer-readable medium may include at least: any entity or device capable of carrying the computer program code, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunications signals.
The present invention may also be implemented as a computer program product which, when run on a computer device, causes the computer device to execute all or part of the steps of the method embodiments described above.
Each of the foregoing embodiments is described with its own emphasis; for parts that are not detailed or illustrated in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them; although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included within the protection scope of the present invention.

Claims (10)

1. A text generation method based on artificial intelligence, characterized in that the text generation method comprises:
inputting reference information, a text sample and a label thereof into a pre-trained text generation model to obtain a generated term, increasing the iteration number by one, and updating the reference information with the generated term;
returning to and executing the step of inputting the reference information, the text sample and the label thereof into the pre-trained text generation model to obtain a generated term and increasing the iteration number by one, and repeating the iteration until the iteration number equals a preset target iteration number, to obtain a generated sample, the generated sample comprising M generated terms, where M is the target iteration number;
calculating a loss function value from the generated sample, the label and a preset loss function, and, when the loss function value is greater than a preset threshold, correcting the target iteration number to obtain a reference iteration number;
when the reference iteration number satisfies a preset condition, returning to the step of inputting the reference information, the text sample and the label thereof into the pre-trained text generation model to obtain a generated term, and repeating the iteration until the iteration number equals the reference iteration number, to obtain a reference sample;
and training the pre-trained text generation model according to the reference sample and the label to obtain a trained text generation model, and inputting an acquired text to be converted into the trained text generation model to obtain a generated text.
2. The text generation method according to claim 1, wherein the reference information contains M zero elements;
the updating of the reference information using the generated term comprises:
determining the position of the foremost zero element in the reference information as a target position;
and replacing the zero element at the target position with the generated term to obtain the updated reference information.
3. The text generation method according to claim 1 or 2, wherein the target iteration number includes N preset values, the loss function value includes N function values corresponding to the preset values, and N is an integer greater than zero;
correspondingly, the correcting of the target iteration number when the loss function value is greater than a preset threshold comprises:
when all of the function values are greater than the preset threshold, correcting the preset value whose corresponding function value is the minimum among the N function values;
and after the loss function value is calculated, the method further comprises:
performing a Gaussian kernel function calculation on the N preset values, and determining the calculation result as a covariance matrix;
and constructing a joint Gaussian distribution from the covariance matrix and a preset mean vector, wherein the joint Gaussian distribution is used to determine a reference function value corresponding to the reference iteration number.
4. The text generation method according to claim 3, further comprising, after the constructing of the joint Gaussian distribution:
performing a Gaussian kernel function calculation between the reference iteration number and each of the N preset values to obtain a covariance vector;
calculating the function value distribution corresponding to the reference iteration number from the covariance vector, the covariance matrix and the N preset values, based on the marginalization property of the joint Gaussian distribution;
and sampling the function value distribution, and determining the sampling result as the reference function value.
5. The text generation method according to claim 4, wherein the process of detecting that the reference iteration number satisfies a preset condition comprises:
determining the minimum of the N function values corresponding to the preset values as a temporary function value, and determining the preset value corresponding to the temporary function value as a temporary iteration number;
calculating the mean and standard deviation of the N function values corresponding to the preset values, calculating a first difference between the reference function value and the mean, and determining the ratio of the first difference to the standard deviation as a first normalized value;
calculating a second difference between the temporary function value and the mean, determining the ratio of the second difference to the standard deviation as a second normalized value, and calculating the result of subtracting the first normalized value from the second normalized value;
and if the subtraction result is greater than a preset subtraction threshold, determining that the reference iteration number satisfies the preset condition.
6. The text generation method according to claim 3, wherein the correcting, when all of the function values are greater than the preset threshold, of the preset value whose corresponding function value is the minimum among the N function values comprises:
determining the preset value whose corresponding function value is the minimum among the N function values as a value to be corrected, calculating the ratio of a preset coefficient to N, and taking the ratio as a correction amount;
counting, among the N preset values, the number of preset values greater than the value to be corrected to obtain a first statistic value, and counting the number of preset values smaller than the value to be corrected to obtain a second statistic value;
normalizing the first statistic value and the second statistic value to obtain a first probability and a second probability, performing probability sampling over the plus sign and the minus sign according to the first probability and the second probability, and updating the correction amount according to the sampling result to obtain a correction value;
and adding the value to be corrected and the correction value, and taking the addition result as the reference iteration number.
7. The text generation method according to claim 6, further comprising, after the obtaining of the reference iteration number:
when the reference iteration number does not satisfy the preset condition, calculating a reference loss value from the reference sample, the label and the loss function;
and taking the reference iteration number and the reference loss value as an additional preset value and its corresponding function value, updating the value of N, and executing again the step of correcting the preset value corresponding to the minimum of the N function values.
8. An artificial intelligence based text generation apparatus, the text generation apparatus comprising:
the term generation module is used for inputting reference information, a text sample and a label thereof into a pre-trained text generation model to obtain a generated term, increasing the iteration number by one, and updating the reference information with the generated term;
the term iteration module is used for returning to and executing the step of inputting the reference information, the text sample and the label thereof into the pre-trained text generation model to obtain a generated term, and repeating the iteration until the iteration number equals a preset target iteration number, to obtain a generated sample, the generated sample comprising M generated terms, where M is the target iteration number;
the iteration number correction module is used for calculating a loss function value from the generated sample, the label and a preset loss function, and correcting the target iteration number when the loss function value is greater than a preset threshold, to obtain a reference iteration number;
the sample generation module is used for returning, when the reference iteration number satisfies the preset condition, to the step of inputting the reference information, the text sample and the label thereof into the pre-trained text generation model to obtain a generated term and increasing the iteration number by one, and repeating the iteration until the iteration number equals the reference iteration number, to obtain a reference sample;
and the text generation module is used for training the pre-trained text generation model according to the reference sample and the label to obtain a trained text generation model, and inputting an acquired text to be converted into the trained text generation model to obtain a generated text.
9. A computer device, comprising a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor implements the text generation method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the text generation method according to any one of claims 1 to 7.
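For orientation only, the generation loop of claims 1 and 2 admits the following compact illustrative reading; model and its call signature are hypothetical placeholders rather than an interface defined by this application:

```python
def generate_sample(model, text_sample, label, target_iters):
    # The reference information starts as M zero elements (claim 2).
    reference = [0] * target_iters
    for step in range(target_iters):
        # One iteration: the model yields a generated term, and the
        # foremost zero element of the reference information is replaced.
        term = model(reference, text_sample, label)
        reference[step] = term
    # After M iterations the reference information is the generated sample.
    return reference
```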
CN202310373410.XA 2023-04-06 2023-04-06 Text generation method and device based on artificial intelligence, computer equipment and medium Pending CN116432608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310373410.XA CN116432608A (en) 2023-04-06 2023-04-06 Text generation method and device based on artificial intelligence, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310373410.XA CN116432608A (en) 2023-04-06 2023-04-06 Text generation method and device based on artificial intelligence, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN116432608A true CN116432608A (en) 2023-07-14

Family

ID=87079190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310373410.XA Pending CN116432608A (en) 2023-04-06 2023-04-06 Text generation method and device based on artificial intelligence, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN116432608A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117951605A (en) * 2024-03-26 2024-04-30 苏州元脑智能科技有限公司 Quantization method and device for diffusion model, computer equipment and storage medium
CN117951605B (en) * 2024-03-26 2024-06-07 苏州元脑智能科技有限公司 Quantization method and device for diffusion model, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination