CN114547492A - Training method for a generative model, and method, apparatus, device and medium for generating copy - Google Patents

Training method for a generative model, and method, apparatus, device and medium for generating copy

Info

Publication number
CN114547492A
CN114547492A (application CN202210152882.8A)
Authority
CN
China
Prior art keywords
decoding unit
time step
prediction result
model
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210152882.8A
Other languages
Chinese (zh)
Inventor
念天磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210152882.8A
Publication of CN114547492A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The disclosure provides a training method for a generative model and a method, apparatus, device and medium for generating copy, and relates to the field of artificial intelligence, in particular to text generation. The implementation scheme includes the following steps: determining a first prediction result through a first decoding unit by combining the real target of a training sample with the encoding information output by the encoding unit; determining a reference input value for the second decoding unit from the real target and the first prediction result; and predicting the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted by backpropagation based on the final prediction result. The method trains the generative model in a two-stage decoding manner and can ensure the diversity of the copy generated by the generative model.

Description

Training method for a generative model, and method, apparatus, device and medium for generating copy
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the field of text generation, and more particularly to a training method for a generative model, a copy generation method and apparatus, an electronic device, a storage medium, and a computer program product.
Background
In internet marketing, a landing page is a webpage shown to a potential user after the user clicks an advertisement or runs a search with a search engine and is redirected to it. A landing page contains persuasive copy that appeals to internet users, which can highlight the selling points of the page, improve the user's search efficiency, help users reach the various components directly, and encourage users to convert.
Disclosure of Invention
The present disclosure provides a training method for a generative model, a copy generation method, an apparatus, an electronic device, a storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a training method for a generative model, the generative model including one encoding unit and two decoding units, the method including:
determining a first prediction result through a first decoding unit by combining the real target of a training sample with the encoding information output by the encoding unit;
determining a reference input value for the second decoding unit from the real target and the first prediction result;
and predicting the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted by backpropagation based on the final prediction result.
According to an aspect of the present disclosure, there is provided a copy generation method, including:
transmitting the text to be processed to the encoding unit of a generative model to obtain encoding information, wherein the generative model is trained according to any training method for a generative model in the present disclosure;
and using the encoding information as the input of the second decoding unit of the generative model, and determining the generated copy according to the output of the second decoding unit.
According to an aspect of the present disclosure, there is provided a training apparatus for a generative model, the generative model including one encoding unit and two decoding units, the apparatus including:
a first-stage decoding module, configured to determine a first prediction result through the first decoding unit by combining the real target of a training sample with the encoding information output by the encoding unit;
a sampling module, configured to determine a reference input value for the second decoding unit from the real target and the first prediction result;
and a second-stage decoding module, configured to predict the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted by backpropagation based on the final prediction result.
According to an aspect of the present disclosure, there is provided a copy generation apparatus, including:
an encoding module, configured to transmit the text to be processed to the encoding unit of a generative model to obtain encoding information, wherein the generative model is trained according to any training method for a generative model in the present disclosure;
and a generating module, configured to use the encoding information as the input of the second decoding unit of the generative model and determine the generated copy according to the output of the second decoding unit.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method for a generative model or the copy generation method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the training method for a generative model or the copy generation method according to any embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product including a computer program which, when executed by a processor, implements the training method for a generative model or the copy generation method of any embodiment of the present disclosure.
According to the technology of the present disclosure, the generative model is trained in a two-stage decoding manner, which ensures the diversity of the copy generated by the trained generative model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flowchart of a training method for a generative model according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a further training method for a generative model according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a further training method for a generative model according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a copy generation method according to an embodiment of the disclosure;
FIG. 5 is a schematic structural diagram of a training apparatus for a generative model according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a copy generation apparatus provided in accordance with an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing the training method for a generative model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The generative model in the embodiments of the disclosure is mainly used to generate copy for landing pages that appeals to internet users, for example landing-page titles. The generative model of the present disclosure is built on the Transformer framework and includes one encoding unit and two decoding units, where the two decoding units have the same structure and share parameters; on this basis, model training is performed in a two-stage decoding manner, and the specific process is as follows.
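For concreteness, the following is a minimal sketch of such an architecture in PyTorch: one Transformer encoder and a single Transformer decoder object that is reused for both decoding passes, so the two decoding units trivially have the same structure and shared parameters. The class name, the dimensions and the choice of PyTorch are illustrative assumptions and are not prescribed by this disclosure; attention masks and positional encodings are omitted for brevity.

import torch
import torch.nn as nn

class TwoPassGenerativeModel(nn.Module):
    # One encoding unit and one decoder reused as both decoding units (illustrative sketch).
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)  # shared by both passes
        self.out_proj = nn.Linear(d_model, vocab_size)

    def encode(self, src_ids):
        # Returns the encoding information used as the intermediate input of the decoding units.
        return self.encoder(self.embed(src_ids))

    def decode(self, tgt_ids, memory):
        # Runs the shared decoder on a target prefix and returns logits over the vocabulary.
        hidden = self.decoder(self.embed(tgt_ids), memory)
        return self.out_proj(hidden)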
Fig. 1 is a schematic flowchart of a training method for a generative model according to an embodiment of the present disclosure, applicable to the case of training a generative model in a two-stage decoding manner. The method can be executed by a training apparatus for the generative model, which is implemented in software and/or hardware and integrated on an electronic device.
Specifically, referring to fig. 1, the training method for the generative model is as follows:
s101, determining a first prediction result through a first decoding unit by combining a real target of a training sample and the coding information output by a coding unit.
In the embodiment of the present disclosure, when training the generative model, first, training sample data is input into the coding unit of the generative model to obtain corresponding coding information, where the coding information is mainly used as an intermediate input of the first decoding unit. The real target of the training sample refers to the labeling knowledge of the sample, and the real target of the training sample is directly used as the initial input of the first decoding unit during model training.
In an alternative embodiment, the first decoding unit obtains the first prediction result by predicting over a plurality of time steps. In a specific implementation, the real target corresponding to the sub-prediction result generated by the first decoding unit at any time step is used as the initial input of the first decoding unit at the next time step. For example, if the sub-prediction result generated by the generative model at the first time step is "yesterday" while the real target for that time step is "today", then "today" is used as the initial input of the first decoding unit at the next time step. In this way a prediction error can occur only at the first time step and does not accumulate into the following steps; that is, using the real target as the initial input of the first decoding unit ensures that errors do not accumulate. It should be noted that a conventional generative model mainly uses the seq2seq framework, in which the output of the previous hidden state is used as the input of the next hidden state during training, so that in the early training stage, if an extremely bad result appears in a preceding state, all subsequent states are affected by it and the final generated result is completely disordered.
Further, the encoding information is used as the intermediate input of the first decoding unit at each time step, which yields the model's predicted probability distribution at each time step. The sub-prediction result of each time step can therefore be determined according to the probability distribution obtained by the first decoding unit at that time step, and the set of sub-prediction results generated at all time steps is taken as the first prediction result.
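As an illustration of this first-stage pass, the sketch below (building on the model sketch above) feeds the ground-truth prefix to the decoder at every time step, uses the encoder output as the intermediate input, and collects the per-step probability distributions from which the sub-prediction results are chosen. The function name, the greedy choice and the assumption that tgt_ids already starts with a begin-of-sequence token are illustrative, not requirements of this disclosure.

import torch

def first_stage_decode(model, src_ids, tgt_ids):
    # Teacher-forced pass of the first decoding unit (sketch).
    memory = model.encode(src_ids)              # encoding information from the encoding unit
    step_probs = []
    for t in range(1, tgt_ids.size(1)):
        prefix = tgt_ids[:, :t]                 # real-target prefix: initial input at this step
        logits = model.decode(prefix, memory)   # memory is the intermediate input at every step
        probs = torch.softmax(logits[:, -1, :], dim=-1)
        step_probs.append(probs)                # predicted distribution for this time step
    # One sub-prediction result per step; their collection is the first prediction result.
    first_prediction = torch.stack([p.argmax(dim=-1) for p in step_probs], dim=1)
    return first_prediction, step_probs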
S102, determining a reference input value for the second decoding unit from the real target and the first prediction result.
Optionally, in the early stage of model training the real target is used as the reference input value of the second decoding unit, and in the later stage of training the first prediction result generated in step S101 is used as the reference input value of the second decoding unit. The benefit of doing so is that in the initial stage of training the second decoding unit almost always uses the real target as its reference value, so the model acquires a basic generation capability better and faster; as training progresses, the first prediction result output by the first decoding unit is used as the reference value, which prevents the model from correcting its output toward the real target too early, avoids the over-correction problem, and preserves diversity. A generative model differs from a classification model: classification is a pure one-hot problem, whereas the targets of a generation task are better not forced into a strict one-to-one alignment, since synonyms or near-synonyms should be allowed and multiple equivalent expressions can be generated. If only the real target (i.e., the ground truth) is used as the input of the second decoding unit, the model keeps being corrected toward the real target and diversity is killed off prematurely.
S103, predicting the reference input value through the second decoding unit to obtain a final prediction result, and adjusting the model parameters by backpropagation based on the final prediction result.
After the reference input value is determined in step S102, it is input into the second decoding unit for prediction to obtain the final prediction result, and the model parameters are then adjusted based on the final prediction result, mainly by updating the network parameters of the generative model through gradient backpropagation. It should be noted that the gradient flows back from the second decoding unit to the encoding unit without passing through the first decoding unit.
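One common way to realize this gradient path, sketched under the same illustrative assumptions, is to pass the reference input to the second decoding unit as detached token ids, so that the loss on the second pass backpropagates into the shared decoder and the encoder but not through the first decoding pass. The helper name and the loss formulation are assumptions made for illustration.

import torch.nn.functional as F

def train_step(model, optimizer, src_ids, tgt_ids, reference_ids):
    # One parameter update driven only by the second decoding unit's output (sketch).
    memory = model.encode(src_ids)               # gradients will flow back into the encoding unit
    ref = reference_ids.detach()                 # block any gradient through the first pass
    logits = model.decode(ref[:, :-1], memory)   # second decoding unit predicts the next tokens
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tgt_ids[:, 1:].reshape(-1))   # supervised by the real target
    optimizer.zero_grad()
    loss.backward()                              # second decoding unit -> encoding unit only
    optimizer.step()
    return loss.item()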
In the embodiment of the disclosure, the generative model is trained in a two-stage decoding manner: using the real target as the input of the first decoding unit prevents prediction errors from accumulating, and choosing the real target or the first prediction result as the input of the second decoding unit according to the training stage avoids the over-correction problem, thereby ensuring the diversity of the copy generated by the trained generative model.
Fig. 2 is a schematic flowchart of a further training method for a generative model according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, this embodiment refines the process of determining the reference input value of the second decoding unit. Referring to fig. 2, the training method for the generative model specifically includes the following steps:
s201, determining a first prediction result by combining a real target of a training sample and the coding information output by the coding unit through a first decoding unit.
S202, respectively calculating the probability of the real target and the first prediction result as reference input values according to the number of training rounds and the preset hyper-parameter.
And S203, determining a reference input value of the second decoding unit according to the probability calculation result.
In the embodiment of the present disclosure, a formula for calculating a probability that a real target is used as a reference input value of the second decoding unit is as follows: p ═ β/(β + exp (epoch/β)); wherein, beta is a preset hyper-parameter, and epoch is the number of training rounds. The probability of the first prediction being a reference input value is 1-p. As can be seen from the probability calculation formula, the probability p becomes smaller and smaller as the number of training rounds epoch increases, i.e., the probability p becomes larger and larger, and the output of the first decoding unit is used as the reference input of the second decoding unit. The benefits of doing so are: in the initial stage of model training, the second decoding unit basically and completely uses the real target as a reference value, so that the initial generation capability can be better and faster realized, and as the training time is increased, the first prediction result output by the first decoding unit is used as the reference value, so that the model is prevented from prematurely correcting the output to be the real target, the problem of excessive correction is solved, and the diversity is reserved.
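Written out, the schedule is only a few lines of plain Python. The value of beta and the granularity of the coin flip (per sample here) are assumptions made for illustration; the disclosure only fixes the probability formula itself.

import math
import random

def pick_reference(real_target, first_prediction, epoch, beta=5.0):
    # Probability of using the real target as the second decoding unit's reference input.
    p = beta / (beta + math.exp(epoch / beta))
    return real_target if random.random() < p else first_prediction

# With beta = 5, p is roughly 0.83 at epoch 0 and roughly 0.40 at epoch 10,
# so the first prediction result is used more and more often as training proceeds.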
S204, predicting the reference input value through the second decoding unit to obtain a final prediction result, and adjusting the model parameters by backpropagation based on the final prediction result.
In the embodiment of the disclosure, in the initial stage of model training the second decoding unit almost always uses the real target as the reference value, so the model acquires a basic generation capability better and faster; as training time increases, the first prediction result output by the first decoding unit is used as the reference value, which prevents the model from correcting its output toward the real target too early, avoids the over-correction problem, and preserves diversity.
Fig. 3 is a schematic flowchart of a further training method for a generative model according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, this embodiment refines the process of determining the sub-prediction result of each time step according to the probability distribution obtained by the first decoding unit at each time step. Referring to fig. 3, the method further includes the following steps:
S301, for any time step, taking the prediction with the highest probability in the probability distribution obtained by the first decoding unit at that time step as the sub-prediction result of that time step.
S302, alternatively, for any time step, selecting one of the top-N predictions ranked by probability, according to the probability distribution obtained by the first decoding unit at that time step, as the sub-prediction result of that time step.
In the embodiment of the disclosure, the output of the first decoding unit is converted into a probability distribution by softmax at each time step. Taking the prediction with the maximum probability as the sub-prediction result of a time step ensures the accuracy of model prediction, while selecting one of the higher-probability predictions as the sub-prediction result of a time step ensures the diversity of model prediction with little change in prediction accuracy.
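The two strategies can be sketched as follows, where probs is the distribution produced by the first decoding unit at one time step; the value of N and the renormalisation over the top-N candidates are illustrative assumptions.

import torch

def pick_sub_prediction(probs, top_n=0):
    # Greedy choice when top_n == 0 (S301), otherwise sample from the top-N candidates (S302).
    if top_n <= 0:
        return probs.argmax(dim=-1)                                   # accuracy first
    top_probs, top_ids = torch.topk(probs, k=top_n, dim=-1)
    top_probs = top_probs / top_probs.sum(dim=-1, keepdim=True)       # renormalise over the top-N
    choice = torch.multinomial(top_probs, num_samples=1)              # keeps some diversity
    return top_ids.gather(-1, choice).squeeze(-1)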
Further, for any time step, the first decoding unit calculates the probability distribution of its prediction at that time step based on a preset noise parameter g and a temperature parameter t. Optionally, the probability distribution of the first decoding unit at that time step is
p_i = exp((z_i + g_i) / t) / Σ_j exp((z_j + g_j) / t)
where z_i is the input of the normal softmax, t is the temperature hyperparameter, g = log(-log(u)), and u obeys Uniform(0, 1). It should be noted that noise is added before the probability distribution is calculated, so that the softmax probabilities that are output directly carry a certain amount of randomness. At the same time, the smoothness of the softmax can be controlled through the temperature t: the higher the temperature, the smoother the generated probability distribution; the lower the temperature, the sharper the distribution and the closer it is to one-hot. In practical training, the temperature can be lowered slowly so that the distribution gradually approaches the true discrete distribution.
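A sketch of this noisy, temperature-controlled softmax is given below. It uses the standard Gumbel perturbation g = -log(-log(u)); that sign convention, the epsilon for numerical stability and the example cooling schedule are assumptions made for illustration rather than details fixed by the text above.

import torch

def noisy_softmax(z, t=1.0, eps=1e-20):
    # Softmax over logits z with Gumbel-style noise g and temperature t (sketch).
    u = torch.rand_like(z)                          # u ~ Uniform(0, 1)
    g = -torch.log(-torch.log(u + eps) + eps)       # noise term added to every logit
    return torch.softmax((z + g) / t, dim=-1)       # larger t: smoother; smaller t: closer to one-hot

# Example annealing: t = max(0.5, 0.95 ** epoch), lowering the temperature slowly
# so that the distribution gradually approaches the true discrete distribution.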
Fig. 4 is a schematic flowchart of a copy generation method according to an embodiment of the present disclosure, applicable to the case of generating copy with a trained generative model. The method can be executed by a copy generation apparatus, which is implemented in software and/or hardware and integrated on an electronic device.
Referring to fig. 4, the copy generation method is specifically as follows:
S401, transmitting the text to be processed to the encoding unit of a generative model to obtain encoding information, wherein the generative model is trained according to any of the training methods for a generative model in the present disclosure.
S402, using the encoding information as the input of the second decoding unit of the generative model, and determining the generated copy according to the output of the second decoding unit.
A generative model can be trained through the above embodiments; for the specific training process, refer to the above embodiments, which is not repeated here. On this basis, the generative model can be used directly to make predictions on the text to be processed. Optionally, the text to be processed is the text content of a landing page, and the prediction generates a title for the landing page according to that text content. In a specific implementation, the text to be processed is first fed into the encoding unit of the generative model to obtain encoding information, the encoding information is used as the input of the second decoding unit of the generative model, and the generated copy is determined according to the output of the second decoding unit. It should be noted that in the prediction stage only the second decoding unit of the trained generative model is needed, i.e., two-stage decoding is not required when using the model.
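A sketch of this prediction stage, under the same illustrative assumptions as the earlier snippets: the landing-page text is encoded once and only the (shared) decoder is run autoregressively to produce the copy. The special-token ids, the maximum length and the greedy decoding are assumptions, not requirements of this disclosure.

import torch

@torch.no_grad()
def generate_copy(model, src_ids, bos_id, eos_id, max_len=32):
    # Generate copy (e.g. a landing-page title) with the encoding unit and the second decoding unit only.
    memory = model.encode(src_ids)                               # encoding information
    out = torch.full((src_ids.size(0), 1), bos_id,
                     dtype=torch.long, device=src_ids.device)
    for _ in range(max_len):
        logits = model.decode(out, memory)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy; sampling is also possible
        out = torch.cat([out, next_id], dim=1)
        if (next_id == eos_id).all():
            break
    return out[:, 1:]                                            # drop the start token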
In the embodiment of the disclosure, the trained generative model makes it possible to generate diversified copy from the text to be processed.
Fig. 5 is a schematic structural diagram of a training apparatus for a generative model according to an embodiment of the present disclosure, where the generative model includes one encoding unit and two decoding units; this embodiment is applicable to the case where the generative model is trained in a two-stage decoding manner. As shown in fig. 5, the apparatus specifically includes:
a first-stage decoding module 501, configured to determine a first prediction result through the first decoding unit by combining the real target of a training sample with the encoding information output by the encoding unit;
a sampling module 502, configured to determine a reference input value for the second decoding unit from the real target and the first prediction result;
and a second-stage decoding module 503, configured to predict the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted by backpropagation based on the final prediction result.
On the basis of the foregoing embodiment, optionally, the first-stage decoding module includes:
an initial input determining submodule, configured to use the real target corresponding to the sub-prediction result generated by the first decoding unit at any time step as the initial input of the first decoding unit at the next time step;
an intermediate input determining submodule, configured to use the encoding information as the intermediate input of the first decoding unit at each time step;
and a first-stage decoding submodule, configured to determine the sub-prediction result of each time step according to the probability distribution obtained by the first decoding unit at that time step, and to take the set of sub-prediction results generated at all time steps as the first prediction result.
On the basis of the foregoing embodiment, optionally, the first-stage decoding submodule is further configured to:
for any time step, take the prediction with the maximum probability in the probability distribution obtained by the first decoding unit at that time step as the sub-prediction result of that time step; or,
for any time step, select one of the top-N predictions ranked by probability, according to the probability distribution obtained by the first decoding unit at that time step, as the sub-prediction result of that time step.
On the basis of the above embodiment, optionally, the apparatus further includes:
a probability distribution calculating module, configured to calculate, for any time step, the probability distribution of the prediction of the first decoding unit at that time step based on a preset noise parameter and a temperature parameter.
On the basis of the foregoing embodiment, optionally, the sampling module includes:
a probability calculation submodule, configured to calculate the probabilities of using the real target and the first prediction result as the reference input value, respectively, according to the number of training epochs and a preset hyperparameter;
and a determining submodule, configured to determine the reference input value of the second decoding unit according to the probability calculation result.
The training apparatus for a generative model provided by the embodiment of the disclosure can execute the training method for a generative model provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the executed method. For matters not explicitly described in this embodiment, reference may be made to the description of any method embodiment of the disclosure.
Fig. 6 is a schematic structural diagram of a copy generation apparatus according to an embodiment of the present disclosure, where the generative model includes one encoding unit and two decoding units; this embodiment is applicable to the case where copy is generated by a trained generative model. As shown in fig. 6, the apparatus specifically includes:
an encoding module 601, configured to transmit the text to be processed to the encoding unit of a generative model to obtain encoding information, wherein the generative model is trained according to any training method for a generative model in the present disclosure;
and a generating module 602, configured to use the encoding information as the input of the second decoding unit of the generative model and determine the generated copy according to the output of the second decoding unit.
The copy generation apparatus provided by the embodiment of the disclosure can execute the copy generation method provided by any embodiment of the disclosure, and has the corresponding functional modules and beneficial effects of the executed method. For matters not explicitly described in this embodiment, reference may be made to the description of any method embodiment of the disclosure.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of the users involved comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as the training method for the generative model. For example, in some embodiments, the training method for the generative model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the training method for the generative model described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the training method for the generative model.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A training method for a generative model, the generative model comprising an encoding unit and two decoding units, the method comprising:
determining a first prediction result through a first decoding unit by combining the real target of a training sample with the encoding information output by the encoding unit;
determining a reference input value for a second decoding unit from the real target and the first prediction result;
and predicting the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted by backpropagation based on the final prediction result.
2. The method of claim 1, wherein determining, through the first decoding unit, the first prediction result by combining the real target of the training sample with the encoding information output by the encoding unit comprises:
using the real target corresponding to the sub-prediction result generated by the first decoding unit at any time step as the initial input of the first decoding unit at the next time step;
using the encoding information as the intermediate input of the first decoding unit at each time step;
and determining the sub-prediction result of each time step according to the probability distribution obtained by the first decoding unit at that time step, and taking the set of sub-prediction results generated at all time steps as the first prediction result.
3. The method of claim 2, wherein determining the sub-prediction result of each time step according to the probability distribution obtained by the first decoding unit at each time step comprises:
for any time step, taking the prediction with the maximum probability in the probability distribution obtained by the first decoding unit at that time step as the sub-prediction result of that time step; or,
for any time step, selecting one of the top-N predictions ranked by probability, according to the probability distribution obtained by the first decoding unit at that time step, as the sub-prediction result of that time step.
4. The method of claim 2, further comprising:
for any time step, calculating, by the first decoding unit, the probability distribution of its prediction at that time step based on a preset noise parameter and a temperature parameter.
5. The method of claim 1, wherein determining the reference input value for the second decoding unit from the real target and the first prediction result comprises:
calculating the probabilities of using the real target and the first prediction result as the reference input value, respectively, according to the number of training epochs and a preset hyperparameter;
and determining the reference input value of the second decoding unit according to the probability calculation result.
6. A copy generation method, comprising:
transmitting the text to be processed to the encoding unit of a generative model to obtain encoding information, wherein the generative model is trained according to the method of any one of claims 1 to 5;
and using the encoding information as the input of the second decoding unit of the generative model, and determining the generated copy according to the output of the second decoding unit.
7. A training apparatus for a generative model, the generative model comprising an encoding unit and two decoding units, the apparatus comprising:
a first-stage decoding module, configured to determine a first prediction result through a first decoding unit by combining the real target of a training sample with the encoding information output by the encoding unit;
a sampling module, configured to determine a reference input value for a second decoding unit from the real target and the first prediction result;
and a second-stage decoding module, configured to predict the reference input value through the second decoding unit to obtain a final prediction result, so that the model parameters are adjusted by backpropagation based on the final prediction result.
8. The apparatus of claim 7, wherein the first-stage decoding module comprises:
an initial input determining submodule, configured to use the real target corresponding to the sub-prediction result generated by the first decoding unit at any time step as the initial input of the first decoding unit at the next time step;
an intermediate input determining submodule, configured to use the encoding information as the intermediate input of the first decoding unit at each time step;
and a first-stage decoding submodule, configured to determine the sub-prediction result of each time step according to the probability distribution obtained by the first decoding unit at that time step, and to take the set of sub-prediction results generated at all time steps as the first prediction result.
9. The apparatus of claim 8, wherein the first-stage decoding submodule is further configured to:
for any time step, take the prediction with the maximum probability in the probability distribution obtained by the first decoding unit at that time step as the sub-prediction result of that time step; or,
for any time step, select one of the top-N predictions ranked by probability, according to the probability distribution obtained by the first decoding unit at that time step, as the sub-prediction result of that time step.
10. The apparatus of claim 8, further comprising:
a probability distribution calculating module, configured to calculate, for any time step, the probability distribution of the prediction of the first decoding unit at that time step based on a preset noise parameter and a temperature parameter.
11. The apparatus of claim 7, wherein the sampling module comprises:
a probability calculation submodule, configured to calculate the probabilities of using the real target and the first prediction result as the reference input value, respectively, according to the number of training epochs and a preset hyperparameter;
and a determining submodule, configured to determine the reference input value of the second decoding unit according to the probability calculation result.
12. A copy generation apparatus, comprising:
an encoding module, configured to transmit the text to be processed to the encoding unit of a generative model to obtain encoding information, wherein the generative model is trained according to the method of any one of claims 1 to 5;
and a generating module, configured to use the encoding information as the input of the second decoding unit of the generative model and determine the generated copy according to the output of the second decoding unit.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5 or 6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5 or 6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5 or 6.
CN202210152882.8A 2022-02-18 2022-02-18 Training method for generating model, method, device, equipment and medium for generating file Pending CN114547492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210152882.8A CN114547492A (en) 2022-02-18 2022-02-18 Training method for generating model, method, device, equipment and medium for generating file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210152882.8A CN114547492A (en) 2022-02-18 2022-02-18 Training method for generating model, method, device, equipment and medium for generating file

Publications (1)

Publication Number Publication Date
CN114547492A true CN114547492A (en) 2022-05-27

Family

ID=81675963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210152882.8A Pending CN114547492A (en) 2022-02-18 2022-02-18 Training method for generating model, method, device, equipment and medium for generating file

Country Status (1)

Country Link
CN (1) CN114547492A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115188014A (en) * 2022-06-22 2022-10-14 北京百度网讯科技有限公司 Landing page processing method, model training method and device and electronic equipment
CN115188014B (en) * 2022-06-22 2023-11-14 北京百度网讯科技有限公司 Floor page processing method, model training method, device and electronic equipment
CN115512391A (en) * 2022-09-29 2022-12-23 珠海视熙科技有限公司 Target detection model training method, device and equipment for data adaptive resampling

Similar Documents

Publication Publication Date Title
CN113239705A (en) Pre-training method and device of semantic representation model, electronic equipment and storage medium
CN112466288A (en) Voice recognition method and device, electronic equipment and storage medium
US11899699B2 (en) Keyword generating method, apparatus, device and storage medium
CN114547492A (en) Training method for generating model, method, device, equipment and medium for generating file
CN113656581A (en) Text classification and model training method, device, equipment and storage medium
CN112541124A (en) Method, apparatus, device, medium and program product for generating a multitask model
CN113239157B (en) Method, device, equipment and storage medium for training conversation model
CN114937478B (en) Method for training a model, method and apparatus for generating molecules
CN112925900A (en) Search information processing method, device, equipment and storage medium
CN115456167A (en) Lightweight model training method, image processing device and electronic equipment
CN114428907A (en) Information searching method and device, electronic equipment and storage medium
CN113468857A (en) Method and device for training style conversion model, electronic equipment and storage medium
CN113205189A (en) Prediction model training method, prediction method and prediction device
CN113656689B (en) Model generation method and network information pushing method
CN113642654B (en) Image feature fusion method and device, electronic equipment and storage medium
CN115203564A (en) Information flow recommendation method and device and computer program product
CN114037060A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN114254028A (en) Event attribute extraction method and device, electronic equipment and storage medium
CN114282551A (en) Translation method, translation device, electronic equipment and storage medium
CN113361621A (en) Method and apparatus for training a model
CN113807397A (en) Training method, device, equipment and storage medium of semantic representation model
CN113361712B (en) Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment
CN113255332B (en) Training and text error correction method and device for text error correction model
CN114492456B (en) Text generation method, model training method, device, electronic equipment and medium
CN113360770B (en) Content recommendation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination