CN113536736A - Sequence generation method and device based on BERT - Google Patents

Sequence generation method and device based on BERT

Info

Publication number
CN113536736A
CN113536736A (application CN202010307048.2A)
Authority
CN
China
Prior art keywords
sequence
model
bert
output
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010307048.2A
Other languages
Chinese (zh)
Inventor
张志锐
骆卫华
陈博兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010307048.2A
Publication of CN113536736A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a BERT-based sequence generation method and device, relates to the technical field of natural language processing, and mainly aims to enable a BERT model to handle sequence generation tasks. The main technical scheme of the invention is as follows: acquiring a sequence generation model constructed based on a BERT model; setting iteration parameters of the sequence generation model; inputting first sequence data into the sequence generation model; and generating, by the sequence generation model, second sequence data according to the first sequence data and the iteration parameters.

Description

Sequence generation method and device based on BERT
Technical Field
The invention relates to the technical field of natural language processing, in particular to a sequence generation method and device based on BERT.
Background
The BERT model is a language model proposed by Google and built on a bidirectional Transformer. It combines a pre-training model with downstream task models, i.e., the same BERT model is still used when performing downstream tasks. BERT stands for Bidirectional Encoder Representations from Transformers, where "bidirectional" means that when a word is processed, the words both before and after it are taken into account to obtain its contextual semantics. BERT pre-trains deep bidirectional representations by jointly conditioning on context in all layers, so the pre-trained BERT representation can be fine-tuned with just one additional output layer to adapt quickly to downstream tasks.
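For illustration only (this sketch is not part of the patent's disclosure): the masked-word prediction behaviour described above can be observed with a few lines of code, assuming the Hugging Face transformers package and the public bert-base-uncased checkpoint, neither of which is named by the patent.

```python
# Minimal illustration of BERT's bidirectional masked-word prediction.
# Assumes the Hugging Face "transformers" package and the public
# "bert-base-uncased" checkpoint are available.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT uses the words on BOTH sides of [MASK] when predicting the missing word.
for candidate in fill_mask("The doctor explained what causes a [MASK] ."):
    print(candidate["token_str"], round(candidate["score"], 3))
```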
However, given the above structure of the BERT model, the natural language processing tasks the model can currently handle are mainly text classification tasks, such as sentiment classification, and sequence labeling tasks, such as word segmentation, named entity recognition, and part-of-speech tagging; it cannot handle sequence generation tasks such as sentence simplification and machine translation. Existing sequence generation models decode unidirectionally from left to right when processing a sequence generation task, so the training objective and generation procedure of the BERT model differ markedly from those of existing sequence generation models, and the BERT model cannot be applied directly to sequence generation tasks.
Disclosure of Invention
In view of the above problems, the present invention provides a BERT-based sequence generation method and apparatus, with the main aim of enabling a BERT model to handle sequence generation tasks.
In order to achieve the purpose, the invention mainly provides the following technical scheme:
in one aspect, the present invention provides a method for generating a sequence based on BERT, which specifically includes:
acquiring a sequence generation model constructed based on a BERT model;
setting iteration parameters of the sequence generation model;
inputting first sequence data into the sequence generation model;
and generating second sequence data according to the first sequence data and the iteration parameters by the sequence generation model.
In another aspect, the present invention provides a BERT-based sequence generating apparatus, which specifically includes:
the acquisition unit is used for acquiring a sequence generation model constructed based on a BERT model;
the setting unit is used for setting the iteration parameters of the sequence generation model obtained by the obtaining unit;
an input unit configured to input first sequence data to the sequence generation model obtained by the obtaining unit;
and the generating unit is used for generating second sequence data according to the first sequence data input by the input unit and the iteration parameters obtained by the setting unit by the sequence generating model.
In another aspect, the present invention provides a BERT-based sequence generation model, where the sequence generation model is formed by cascading a first BERT model and a second BERT model, where an input of the first BERT model is an input of the sequence generation model, and an output of the first BERT model is a length of an output sequence of the sequence generation model and an output sequence vector; the input of the second BERT model is the output sequence vector output by the first BERT model and a preset mask parameter, the preset mask parameter is used for performing mask operation on the specified vector in the output sequence vector, and the output of the second BERT model is a prediction result of the specified vector subjected to the mask operation.
In another aspect, the present invention provides a processor for executing a program, wherein the program executes the BERT-based sequence generation method described above.
With the above technical solution, the BERT-based sequence generation method and apparatus provided by the invention apply the BERT model to sequence generation tasks in natural language processing by acquiring a sequence generation model constructed based on the BERT model, thereby improving the processing effect on such tasks. To handle a sequence generation task, the method constructs a sequence generation model based on the BERT model and specifies how that model is applied. Compared with the existing BERT model, which is only applied to classification tasks, the method controls the number of model iterations by setting the model's iteration parameters, so that the output of the original BERT model is used to determine the sequence data to be generated step by step, thereby achieving the purpose of applying the BERT model to sequence generation tasks.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a flowchart of a BERT-based sequence generation method proposed by an embodiment of the present invention;
FIG. 2 is a block diagram illustrating a sequence generation model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the iterative process used when the sequence generation model proposed by an embodiment of the present invention processes a sequence generation task;
fig. 4 is a block diagram showing a BERT-based sequence generating apparatus according to an embodiment of the present invention;
fig. 5 is a block diagram showing another BERT-based sequence generating apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Natural language processing studies models of language ability and language use. Such models are realized as computational algorithmic frameworks, refined through training, evaluated, and finally used in practical systems. Scenarios to which natural language processing models apply include information retrieval, machine translation, document classification, information extraction, text mining, and the like. The training of language models therefore plays an important role in natural language processing tasks. Pre-trained representation methods have recently made significant progress on natural language processing tasks, but existing pre-trained representation methods are all unidirectional language models: each word can be trained only with the information of the words before it, and this constraint severely limits the representational capability of pre-trained representations. The BERT model removes this constraint; it is a masked language model that replaces words in the input sequence with random [MASK] tokens and aims to predict the masked words from context (relative to a traditional left-to-right language model, the BERT model can simultaneously use the context on both sides of a masked word for prediction). However, the use of the BERT model has been limited to classification tasks, and it cannot handle sequence generation tasks.
Based on this understanding of the BERT model, the BERT-based sequence generation method provided by the embodiment of the invention improves on the BERT model so that the improved model can effectively handle sequence generation tasks. The specific steps of the method are shown in fig. 1 and include:
step 101, obtaining a sequence generation model constructed based on a BERT model.
Since the sequence generation model in the embodiment of the present invention is constructed based on the BERT model, it does not predict the output sequence unidirectionally when generating a sequence; instead, the output sequence is predicted from bidirectional context. That is, the generation mode of the sequence generation model is not the traditional left-to-right prediction: the output sequence is predicted from both the left and the right context, so the model's predictions are better.
In addition, because the BERT model can adapt quickly to natural language processing tasks in different application scenarios by fine-tuning its parameters, the sequence generation model in this step can likewise adapt to sequence generation tasks in different application scenarios through fine-tuning. That is, when the training samples for a given application scenario are insufficient, the sequence generation model can still adapt to that scenario by fine-tuning its parameters, which makes it suitable for new application scenarios that lack training samples. Therefore, once the sequence generation model in this step has been trained, it can serve as a pre-trained model that is quickly applied to other sequence generation scenarios by fine-tuning its parameters.
And 102, setting iteration parameters of the sequence generation model.
The existing BERT model is mainly applied to classifying an input sequence, and its output is mainly the corresponding category representation. When the sequence generation model in the embodiment of the present invention processes an input sequence, its output is instead a prediction of the length of the output sequence and of the vector value of each vector in the output sequence; that is, the category representation output by BERT is repurposed to predict individual vectors of the sequence to be generated. Since the length of the output sequence is generally greater than the number of classification categories, in this embodiment the output sequence is determined step by step through multiple iterative computations, and the number of iterations determines how many sequence vectors must be fixed in each iteration.
Step 103, inputting the first sequence data into the sequence generation model.
And 104, generating second sequence data according to the first sequence data and the iteration parameters by using a sequence generation model.
In its specific processing, the sequence generation model mainly applies the word-vector processing of the BERT model: it first determines the length of the output sequence from the first sequence data, then, according to the set iteration parameters, fixes the output-sequence vectors determined after each pass, determines the complete output sequence through multiple iterations, and outputs the complete output sequence as the second sequence data.
As explained in the above embodiment, the BERT-based sequence generation method provided by the invention obtains, by improving the BERT model, a bidirectional language model that can be used for sequence generation tasks; the word vectors of the output sequence are obtained from the left and right context within the sequence, so the output second sequence data is a better natural language representation and easier to understand. The embodiment of the invention can be applied to various sequence generation scenarios, such as sentence translation, sentence paraphrasing or simplification, and question-answering dialogue.
Further, regarding the sequence generation model described above with respect to fig. 1, in the embodiment of the present invention the sequence generation model is created based on the BERT model; specifically, it may be constructed by combining multiple BERT models. One implementation of its structure is shown in fig. 2, where the sequence generation model is formed by cascading a first BERT model and a second BERT model: the input of the first BERT model is the input of the sequence generation model, and the output of the first BERT model is the length of the output sequence of the sequence generation model and an output sequence vector; the input of the second BERT model is the output sequence vector output by the first BERT model and a preset mask parameter, the preset mask parameter is used to mask specified vectors in the output sequence vector, and the output of the second BERT model is the prediction result for the specified vectors that were masked. Corresponding to fig. 2, assume sequence X is the input sequence of the sequence generation model (the first sequence data) and sequence Y is the output sequence generated by the sequence generation model (the second sequence data). Then the input of the first BERT model is sequence X and its outputs are Y_l and the sequence vector T corresponding to sequence Y; the inputs of the second BERT model are the sequence vector T and Y_o, and its output is Y_m, where Y_l is the length of the sequence vector T output by the first BERT model, Y_o is the preset mask parameter input to the second BERT model, which determines the number of vectors of the vector sequence T to be masked, and Y_m is the vector values obtained by predicting the masked vectors.
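For illustration only: a minimal sketch of how such a cascade of two BERT encoders could be wired together, assuming PyTorch and the Hugging Face BertModel; the class name, the linear "length" and "token" heads, and the parameter max_target_len are illustrative assumptions, not structures specified by the patent.

```python
# Illustrative sketch of the fig. 2 cascade: a first BERT reads X and predicts
# the output length Y_l, a second BERT consumes the (partially masked) sequence
# vectors T and predicts the masked vector values Y_m.
import torch.nn as nn
from transformers import BertModel

class CascadedBertGenerator(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", max_target_len=128):
        super().__init__()
        self.first_bert = BertModel.from_pretrained(encoder_name)   # reads X
        self.second_bert = BertModel.from_pretrained(encoder_name)  # mask-predicts T
        hidden = self.first_bert.config.hidden_size
        self.length_head = nn.Linear(hidden, max_target_len)        # predicts Y_l
        self.token_head = nn.Linear(hidden, self.first_bert.config.vocab_size)  # predicts Y_m

    def forward(self, input_ids, attention_mask, target_embeds, target_mask):
        # First BERT: encode X; the hidden state at its first (preset identifier)
        # position is used here to predict the output-sequence length Y_l.
        enc = self.first_bert(input_ids=input_ids, attention_mask=attention_mask)
        length_logits = self.length_head(enc.last_hidden_state[:, 0])

        # Second BERT: consume the partially masked output sequence vectors T and
        # predict a vector value at every position; only the masked ones are used.
        dec = self.second_bert(inputs_embeds=target_embeds, attention_mask=target_mask)
        token_logits = self.token_head(dec.last_hidden_state)
        return length_logits, token_logits
```

How the first encoder's initial prediction of the sequence vectors T is handed to the second encoder is left out of this sketch for brevity; the patent only states that the first BERT model outputs T along with the length.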
This structural description of the sequence generation model shows that the first BERT model is used to predict the length of the output sequence from the input sequence and to make an initial prediction of the vector values at each position of the output sequence, and that prediction result is used as the input of the second BERT model; the second BERT model is used to iteratively predict each position vector in the output sequence and determine its vector value. As for Y_o, it is only set during the training phase of the sequence generation model, i.e., the value of Y_o is provided by the training samples. For this purpose, a training sample of the sequence generation model proposed in the embodiment of the present invention needs to include: an input sequence, a preset mask parameter, and vector values, where the preset mask parameter is the number of specified vectors in the output sequence of the first BERT model that are masked, and the vector values are the vector values corresponding to the specified vectors that were masked. Corresponding to the illustration in fig. 2, a training sample is therefore composed as (X, Y_o, Y_m).
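For illustration only: a toy sketch of assembling one training sample (X, Y_o, Y_m) as just described; the random choice of masked positions and the token-level representation are assumptions made for the example, not details fixed by the patent.

```python
# Toy construction of a training sample (X, Y_o, Y_m): mask Y_o randomly chosen
# positions of the target sequence and record their original values as Y_m.
import random

def make_training_sample(source_tokens, target_tokens, num_masked):
    positions = sorted(random.sample(range(len(target_tokens)), num_masked))
    y_m = [target_tokens[p] for p in positions]                    # masked values
    corrupted = ["[MASK]" if i in positions else tok
                 for i, tok in enumerate(target_tokens)]           # input to the second BERT
    return source_tokens, num_masked, y_m, corrupted

x, y_o, y_m, corrupted = make_training_sample(["你", "好"], ["good", "morning", "!"], 2)
print(x, y_o, y_m, corrupted)
```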
When the sequence generation model has finished training and is applied to a sequence generation task, the sequence vector T output by the first BERT model is entirely masked, i.e., Y_o is set to the length Y_l of T, and the vector values of all position vectors in T are determined step by step through multiple iterations.
Further, during training of the sequence generation model in the embodiment of the present invention, the preset mask parameter must be set, which requires the output sequence length of the model to be determined; the existing BERT model needs no such prediction. For this reason, the embodiment of the present invention adds a preset identifier at the beginning of the input sequence of the first BERT model, so that the first BERT model can determine the length of the output sequence from the preset identifier. This means adding the preset identifier before the input X of a training sample and adding it before the first sequence data. Specifically, the preset identifier is a fixed identifier that does not change with the model's input sequence; by recognizing the preset identifier, the sequence generation model can determine the length of the output sequence vector according to the input sequence vector and the environment in which the model is applied. For example, for the same input sequence (e.g., Chinese "你好"): in a translation environment, its output sequence is "hello", of length 1; in a dialogue environment, its output sequence may be "good morning", of length 3. It can be seen that determining the output sequence length through the preset identifier depends on an environment parameter for which the model must be trained.
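For illustration only: a minimal sketch of reading the preset identifier's hidden state to predict the output length, assuming the tokenizer's automatically prepended [CLS] token plays the role of that identifier and an untrained linear head stands in for the trained, task-specific length predictor.

```python
# Sketch: predict the output-sequence length from the hidden state of the
# preset identifier ([CLS] here). The linear head and the 64-token cap are
# illustrative assumptions; a real model would train them per environment.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
length_head = nn.Linear(encoder.config.hidden_size, 64)

inputs = tokenizer("你好", return_tensors="pt")            # [CLS] is prepended automatically
with torch.no_grad():
    cls_state = encoder(**inputs).last_hidden_state[:, 0]  # hidden state at the preset identifier
    predicted_len = int(length_head(cls_state).argmax(dim=-1))
print(predicted_len)  # after training: e.g. 1 in a translation setting, 3 in a dialogue setting
```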
Further, when the sequence generation model processes a sequence generation task, it mainly applies the mask-prediction algorithm of the BERT model to predict the second sequence data, i.e., individual vectors in the input sequence are masked and the vector values of the masked vectors are predicted. Specifically, the processing procedure of the sequence generation model includes:
first, the length of the second sequence data is determined according to the preset mark at the beginning of the first sequence data.
For this reason, it is necessary to perform recognition judgment on a vector at the beginning position in the first sequence data, process the first sequence data when the vector is determined to be a preset identifier, and determine the length of the second sequence data.
Then, the vector sequence subjected to the mask processing is regarded as second sequence data according to the length of the second sequence data.
In this embodiment, the [ MASK ] tag will be used instead of all vectors in the second sequence data, i.e., sequences consisting of the same number of [ MASK ] tags will be generated according to the length determined above.
Thirdly, determining the number n of vectors to be confirmed after each iteration according to the iteration parameters and the length.
The iteration parameter is specifically the number of iterations, the specific determination method is not limited, and may be an average number, or a percentage, for example, the length is 10, and the number of iterations is 3, then the number of vectors to be determined for each iteration is determined to be at least 3 according to the average number, that is, 3 vectors are determined for the first iteration, 3 vectors are determined for the second iteration, and 4 vectors are determined for the third iteration, and 5 vectors are determined for the first iteration, 3 vectors are determined for the second iteration, and 2 vectors are determined for the third iteration.
And finally, according to the number n of the vectors, determining the n vectors with the highest prediction probability as corresponding vectors in the second sequence data in each iteration process.
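For illustration only: a small sketch of the even and percentage splits mentioned above; the exact split rules beyond the document's own numeric examples are assumptions.

```python
# Split an output length across a fixed number of iterations, either evenly
# (matching the 10-vectors / 3-iterations example: 3, 3, 4) or by percentages
# (matching the 5, 3, 2 example).
def even_schedule(length, iterations):
    base, remainder = divmod(length, iterations)
    return [base] * (iterations - 1) + [base + remainder]   # last pass absorbs the remainder

def percentage_schedule(length, fractions=(0.5, 0.3, 0.2)):
    counts = [round(length * f) for f in fractions]
    counts[-1] = length - sum(counts[:-1])                   # make the counts sum to length
    return counts

print(even_schedule(10, 3))      # [3, 3, 4]
print(percentage_schedule(10))   # [5, 3, 2]
```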
Based on the above description of how the sequence generation model processes a sequence generation task, the procedure of iteratively determining the second sequence data is described below with reference to fig. 3.
After receiving the first sequence data, the sequence generation model recognizes the preset identifier at the beginning of the first sequence data, namely the identifier [CLS] before X in fig. 2, and determines the length of the second sequence data. Suppose the length is 5: the input of the second BERT model in the sequence generation model is then a vector sequence of length 5 in which every vector is replaced by the [MASK] identifier, namely the lower input sequence in fig. 3. After the first iteration, a vector value and a corresponding probability are obtained for each position, i.e., the probability that the position vector takes that value. Assuming the iteration parameter of the sequence generation model is 3, the number n of vectors to be fixed after the first iteration is computed to be 2. If the predicted sequence is "what houses a headache !" with corresponding probabilities "0.7, 0.5, 0.1, 0.8, 0.5", the two vectors with the highest probability, i.e., the first and the fourth, are kept, giving the vector sequence "what [MASK] [MASK] headache [MASK]", which is used as the input sequence of the second iteration. The computation is repeated, and after 3 iterations all 5 vectors are determined, yielding the second sequence data: "what causes a headache ?". The second and third iterations are processed in the same manner as the first, so their description is omitted here.
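For illustration only: a simplified sketch of the fig. 3 iteration, where predict_fn stands in for the second BERT model and the schedule [2, 2, 1] mirrors the length-5, three-iteration example above; none of these names comes from the patent.

```python
# Iterative mask-predict decoding: every pass re-predicts the still-masked
# positions and keeps only the n most confident predictions of that pass.
def iterative_decode(predict_fn, length, schedule, mask_token="[MASK]"):
    tokens = [mask_token] * length
    for n in schedule:
        predictions, probabilities = predict_fn(tokens)
        masked = [i for i, t in enumerate(tokens) if t == mask_token]
        keep = sorted(masked, key=lambda i: probabilities[i], reverse=True)[:n]
        for i in keep:
            tokens[i] = predictions[i]
    return tokens

# Toy stand-in for the second BERT model, reproducing the probabilities from
# the example above (a real model would re-score the sequence on every pass).
def toy_second_bert(tokens):
    return ["what", "causes", "a", "headache", "?"], [0.7, 0.5, 0.1, 0.8, 0.5]

print(iterative_decode(toy_second_bert, 5, [2, 2, 1]))
# ['what', 'causes', 'a', 'headache', '?']
```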
Further, regarding step 102 in the embodiment shown in fig. 1, this embodiment does not limit the specific way the iteration parameter is set: it may be set manually or determined by computation, i.e., set according to the predicted length of the second sequence data, so that the number of iterations changes with the length. For a longer sequence, the vector value of each position vector can then be determined over more iterations, making the natural language semantics expressed by the second sequence data more accurate and easier to understand.
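For illustration only: a sketch of deriving the iteration count from the predicted length; the ratio of roughly one iteration per four output vectors is purely an illustrative assumption.

```python
# Derive the iteration parameter from the predicted output length so that
# longer sequences get more refinement passes; the 4-per-pass ratio is assumed.
def iterations_for(length, tokens_per_iteration=4, minimum=1):
    return max(minimum, -(-length // tokens_per_iteration))   # ceiling division

for length in (3, 10, 25):
    print(length, iterations_for(length))    # 1, 3, 7
```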
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention provides a BERT-based sequence generation apparatus, whose main purpose is to handle sequence generation tasks with a BERT model. For ease of reading, details from the foregoing method embodiments are not repeated in this apparatus embodiment, but it should be clear that the apparatus in this embodiment can correspondingly implement all of the content of the foregoing method embodiments. As shown in fig. 4, the apparatus specifically includes:
an obtaining unit 21, configured to obtain a sequence generation model constructed based on a BERT model;
a setting unit 22, configured to set an iteration parameter of the sequence generation model obtained by the obtaining unit 21;
an input unit 23 configured to input first sequence data to the sequence generation model obtained by the obtaining unit 21;
and a generating unit 24, configured to generate second sequence data according to the first sequence data input by the input unit 23 and the iteration parameter obtained by the setting unit 22 by using the sequence generation model.
Further, as shown in fig. 5, the apparatus further includes:
a creating unit 25, configured to create a sequence generation model, where the sequence generation model includes a first BERT model and a second BERT model;
a training unit 26, configured to train the sequence generation model constructed by the creating unit 25 with a preset training sample.
Further, the creating unit 25 is specifically configured to cascade a first BERT model and a second BERT model to form a sequence generating model, where an input of the first BERT model is an input of the sequence generating model, and an output of the first BERT model is a length of an output sequence of the sequence generating model and an output sequence vector; the input of the second BERT model is the output sequence vector output by the first BERT model and a preset mask parameter, the preset mask parameter is used for performing mask operation on the specified vector in the output sequence vector, and the output of the second BERT model is a prediction result of the specified vector subjected to the mask operation.
Further, the preset training samples include: the method comprises the steps of inputting a sequence, presetting a mask parameter and a vector value, wherein the preset mask parameter is the number of vectors for performing mask operation on specified vectors in a first BERT model output sequence, and the vector value is a vector value corresponding to the specified vectors subjected to the mask operation.
Further, as shown in fig. 5, the training unit 26 is further configured to add a preset identifier at the beginning of the input sequence, where the first BERT model determines the length of the output sequence according to the preset identifier.
Further, the input unit 23 is further configured to add a preset identifier at the beginning of the first sequence data, and input the first sequence data containing the preset identifier into the sequence generation model.
Further, the generating unit 24 is specifically configured to:
determining the length of second sequence data according to a preset identifier at the beginning in the first sequence data by using a sequence generation model;
taking the vector sequence subjected to the mask processing as second sequence data according to the length;
determining the number n of vectors to be confirmed after each iteration according to the iteration parameters and the length;
and according to the number n of the vectors, determining the n vectors with the highest prediction probability as corresponding vectors in the second sequence data in each iteration process.
In addition, an embodiment of the present invention further provides a processor, where the processor is configured to execute a program, where the program executes the BERT-based sequence generation method provided in any one of the above embodiments when running.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are for distinguishing the embodiments, and do not represent merits of the embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose preferred embodiments of the invention.
In addition, the memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (16)

1. A method of BERT-based sequence generation, the method comprising:
acquiring a sequence generation model constructed based on a BERT model;
setting iteration parameters of the sequence generation model;
inputting first sequence data into the sequence generation model;
and generating second sequence data according to the first sequence data and the iteration parameters by the sequence generation model.
2. The method of claim 1, further comprising:
creating a sequence generation model, wherein the sequence generation model comprises a first BERT model and a second BERT model;
and training the sequence generation model by using a preset training sample.
3. The method of claim 2, wherein creating the sequence generative model comprises:
cascading a first BERT model and a second BERT model to form a sequence generation model, wherein the input of the first BERT model is the input of the sequence generation model, and the output of the first BERT model is the length of an output sequence of the sequence generation model and an output sequence vector; the input of the second BERT model is the output sequence vector output by the first BERT model and a preset mask parameter, the preset mask parameter is used for performing mask operation on the specified vector in the output sequence vector, and the output of the second BERT model is a prediction result of the specified vector subjected to the mask operation.
4. The method of claim 2, wherein the preset training samples comprise: the method comprises the steps of inputting a sequence, presetting a mask parameter and a vector value, wherein the preset mask parameter is the number of vectors for performing mask operation on specified vectors in a first BERT model output sequence, and the vector value is a vector value corresponding to the specified vectors subjected to the mask operation.
5. The method of claim 4, further comprising:
and adding a preset identifier at the beginning of the input sequence, wherein the first BERT model determines the length of the output sequence according to the preset identifier.
6. The method of claim 1, wherein inputting first sequence data into the sequence generation model comprises:
and adding a preset identifier at the beginning of the first sequence data, and inputting the first sequence data containing the preset identifier into the sequence generation model.
7. The method of claim 6, wherein generating second sequence data from the first sequence data and an iteration parameter by the sequence generation model comprises:
determining the length of second sequence data according to a preset identifier at the beginning in the first sequence data by using a sequence generation model;
taking the vector sequence subjected to the mask processing as second sequence data according to the length;
determining the number n of vectors to be confirmed after each iteration according to the iteration parameters and the length;
and according to the number n of the vectors, determining the n vectors with the highest prediction probability as corresponding vectors in the second sequence data in each iteration process.
8. An apparatus for BERT-based sequence generation, the apparatus comprising:
the acquisition unit is used for acquiring a sequence generation model constructed based on a BERT model;
the setting unit is used for setting the iteration parameters of the sequence generation model obtained by the obtaining unit;
an input unit configured to input first sequence data to the sequence generation model obtained by the obtaining unit;
and the generating unit is used for generating second sequence data according to the first sequence data input by the input unit and the iteration parameters obtained by the setting unit by the sequence generating model.
9. The apparatus of claim 8, further comprising:
the device comprises a creating unit, a generating unit and a generating unit, wherein the creating unit is used for creating a sequence generating model, and the sequence generating model comprises a first BERT model and a second BERT model;
and the training unit is used for training the sequence generation model constructed by the creating unit by using a preset training sample.
10. The apparatus according to claim 9, wherein the creating unit is specifically configured to concatenate a first BERT model with a second BERT model to form a sequence generator model, where an input of the first BERT model is an input of the sequence generator model, and an output of the first BERT model is a length of an output sequence of the sequence generator model and an output sequence vector; the input of the second BERT model is the output sequence vector output by the first BERT model and a preset mask parameter, the preset mask parameter is used for performing mask operation on the specified vector in the output sequence vector, and the output of the second BERT model is a prediction result of the specified vector subjected to the mask operation.
11. The apparatus of claim 9, wherein the preset training samples comprise: the method comprises the steps of inputting a sequence, presetting a mask parameter and a vector value, wherein the preset mask parameter is the number of vectors for performing mask operation on specified vectors in a first BERT model output sequence, and the vector value is a vector value corresponding to the specified vectors subjected to the mask operation.
12. The apparatus of claim 11, wherein the training unit is further configured to add a preset flag at the beginning of the input sequence, and wherein the first BERT model determines the length of the output sequence according to the preset flag.
13. The apparatus of claim 8, wherein the input unit is further configured to add a preset identifier at the beginning of the first sequence data, and input the first sequence data containing the preset identifier into the sequence generation model.
14. The apparatus according to claim 13, wherein the generating unit is specifically configured to:
determining the length of second sequence data according to a preset identifier at the beginning in the first sequence data by using a sequence generation model;
taking the vector sequence subjected to the mask processing as second sequence data according to the length;
determining the number n of vectors to be confirmed after each iteration according to the iteration parameters and the length;
and according to the number n of the vectors, determining the n vectors with the highest prediction probability as corresponding vectors in the second sequence data in each iteration process.
15. A BERT-based sequence generation model is formed by cascading a first BERT model and a second BERT model, wherein the input of the first BERT model is the input of the sequence generation model, and the output of the first BERT model is the length of an output sequence of the sequence generation model and an output sequence vector; the input of the second BERT model is the output sequence vector output by the first BERT model and a preset mask parameter, the preset mask parameter is used for performing mask operation on the specified vector in the output sequence vector, and the output of the second BERT model is a prediction result of the specified vector subjected to the mask operation.
16. A processor, characterized in that the processor is configured to run a program, wherein the program is configured to perform the BERT based sequence generation method of any of claims 1 to 7 when running.
CN202010307048.2A 2020-04-17 2020-04-17 Sequence generation method and device based on BERT Pending CN113536736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010307048.2A CN113536736A (en) 2020-04-17 2020-04-17 Sequence generation method and device based on BERT

Publications (1)

Publication Number Publication Date
CN113536736A (en) 2021-10-22

Family

ID=78123363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010307048.2A Pending CN113536736A (en) 2020-04-17 2020-04-17 Sequence generation method and device based on BERT

Country Status (1)

Country Link
CN (1) CN113536736A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236782A1 (en) * 2018-01-30 2019-08-01 International Business Machines Corporation Systems and methods for detecting an indication of malignancy in a sequence of anatomical images
CN110895659A (en) * 2018-08-22 2020-03-20 阿里巴巴集团控股有限公司 Model training method, recognition method, device and computing equipment
CN109992671A (en) * 2019-04-10 2019-07-09 出门问问信息科技有限公司 Intension recognizing method, device, equipment and storage medium
CN110196894A (en) * 2019-05-30 2019-09-03 北京百度网讯科技有限公司 The training method and prediction technique of language model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Amir Soleimani et al.: "BERT for Evidence Retrieval and Claim Verification", Advances in Information Retrieval, 8 April 2020 (2020-04-08)
Marjan Ghazvininejad et al.: "Mask-Predict: Parallel Decoding of Conditional Masked Language Models", arXiv, 4 September 2019 (2019-09-04)
李妮 et al.: "Chinese named entity recognition method based on BERT-IDCNN-CRF" (基于BERT-IDCNN-CRF的中文命名实体识别方法), Journal of Shandong University (Natural Science), no. 01, 2 January 2020 (2020-01-02)
谌志群 et al.: "Research on sentiment tendency analysis of Weibo comments based on BERT and bidirectional LSTM" (基于BERT和双向LSTM的微博评论倾向性分析研究), Information Studies: Theory & Application, 13 April 2020 (2020-04-13)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116757254A (en) * 2023-08-16 2023-09-15 阿里巴巴(中国)有限公司 Task processing method, electronic device and storage medium
CN116757254B (en) * 2023-08-16 2023-11-14 阿里巴巴(中国)有限公司 Task processing method, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN110287477B (en) Entity emotion analysis method and related device
CN111885000B (en) Network attack detection method, system and device based on graph neural network
CN112395412B (en) Text classification method, apparatus and computer readable medium
US20180190314A1 (en) Method and device for processing speech based on artificial intelligence
CN111310464A (en) Word vector acquisition model generation method and device and word vector acquisition method and device
CN112101526A (en) Knowledge distillation-based model training method and device
CN111611811A (en) Translation method, translation device, electronic equipment and computer readable storage medium
CN112764738A (en) Code automatic generation method and system based on multi-view program characteristics
CN110969018A (en) Case description element extraction method, machine learning model acquisition method and device
Kim et al. Adaptive compression of word embeddings
CN115238590A (en) Welding quality prediction method, device and computer storage medium
CN113569062A (en) Knowledge graph completion method and system
CN110969276B (en) Decision prediction method, decision prediction model obtaining method and device
CN110019784A (en) A kind of file classification method and device
CN113536736A (en) Sequence generation method and device based on BERT
CN112256841B (en) Text matching and countermeasure text recognition method, device and equipment
CN113255829A (en) Zero sample image target detection method and device based on deep learning
CN116738974B (en) Language model generation method, device and medium based on generalization causal network
CN109241262B (en) Method and device for generating reply sentence based on keyword
CN111199153B (en) Word vector generation method and related equipment
CN110851597A (en) Method and device for sentence annotation based on similar entity replacement
CN116680368A (en) Water conservancy knowledge question-answering method, device and medium based on Bayesian classifier
CN110019295B (en) Database retrieval method, device, system and storage medium
CN115357712A (en) Aspect level emotion analysis method and device, electronic equipment and storage medium
CN110019831B (en) Product attribute analysis method and device

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination