CN109885667A - Text generation method, apparatus, computer device and medium - Google Patents

Text generation method, apparatus, computer device and medium

Info

Publication number
CN109885667A
Authority
CN
China
Prior art keywords
model
text
discriminator
generator
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910067379.0A
Other languages
Chinese (zh)
Inventor
毕野 (Bi Ye)
黄博 (Huang Bo)
吴振宇 (Wu Zhenyu)
王建明 (Wang Jianming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910067379.0A priority Critical patent/CN109885667A/en
Publication of CN109885667A publication Critical patent/CN109885667A/en
Priority to PCT/CN2019/116941 priority patent/WO2020151310A1/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/35 Clustering; Classification
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of model construction and discloses a text generation method, apparatus, device and medium. The method comprises: obtaining text positive samples from a real text data set; establishing an initial generator model, pre-training it with the text positive samples to obtain a generator model, and generating text negative samples with the generator model; establishing an initial discriminator model and pre-training it with the text positive samples and the text negative samples to obtain a discriminator model; letting the generator model and the discriminator model continually compete against each other while updating the model parameters, and, when the discriminator model converges, obtaining a text generation model from the generator model at convergence; and obtaining a text to be recognized, inputting it into the text generation model, and generating a target text based on the text generation model. The text generation method of the invention can improve both the construction efficiency of the text generation model and the precision of the generated text.

Description

Text generation method, apparatus, computer device and medium
Technical field
The invention belongs to the field of model construction and, more specifically, relates to a text generation method, apparatus, computer device and medium.
Background art
With the development of science and technology, it is hoped that computers can write high-quality natural language text the way humans do, and automatic text generation is the key technology for realizing this goal.
At present, a common approach is to use a Long Short-Term Memory network (LSTM) for text generation. LSTM is a kind of Recurrent Neural Network (RNN). The usual way to train an RNN is maximum likelihood estimation: given the preceding t-1 words, the next word is produced by maximizing the log-likelihood of the t-th word. A shortcoming of RNNs, however, is that they gradually accumulate deviation: because an RNN generates words one by one in order, each next word is generated on the basis of the words already emitted, so a small deviation is introduced at each step, and the deviation grows larger and larger as the length of the sequence increases.
In addition, an RNN cannot improve itself. For some applications an RNN can be improved by adding and minimizing a loss function, but for a text generation model the input data are discrete, so no loss function is directly available, and there is no suitable way to guide the text generation model to improve itself so that its output approaches real text.
In summary, the efficiency of current text generation models is low, and a text generation model that can generate text faster and more accurately is urgently needed.
Summary of the invention
The embodiments of the present invention provide a text generation method, apparatus, computer device and storage medium to solve the problem that the efficiency of current text generation is low.
A text generation method, comprising:
obtaining a real text data set, and obtaining text positive samples from the real text data set;
establishing an initial generator model, inputting the text positive samples into the initial generator model for pre-training to obtain a generator model, and generating first text negative samples according to the generator model;
establishing an initial discriminator model, and inputting the text positive samples and the first text negative samples into the initial discriminator model for pre-training to obtain a discriminator model;
generating a test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, computing the gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
generating second text negative samples according to the updated generator model, inputting the second text negative samples and the text positive samples into the discriminator model, and updating the discriminator model by minimizing cross-entropy;
alternately updating the generator model and the discriminator model, and, if the output of the discriminator model converges, obtaining a text generation model from the generator model at convergence; and
obtaining a text to be recognized, inputting the text to be recognized into the text generation model, and generating a target text based on the text generation model.
A text generation apparatus, comprising:
a text positive sample obtaining module, configured to obtain a real text data set and obtain text positive samples from the real text data set;
a generator model obtaining module, configured to establish an initial generator model, input the text positive samples into the initial generator model for pre-training to obtain a generator model, and generate first text negative samples according to the generator model;
a discriminator model obtaining module, configured to establish an initial discriminator model, and input the text positive samples and the first text negative samples into the initial discriminator model for pre-training to obtain a discriminator model;
a generator model updating module, configured to generate a test text based on the generator model, input the test text into the discriminator model to obtain a reward value of the test text, compute the gradient of the generator model according to the reward value, and update the generator model according to the gradient;
a discriminator model updating module, configured to generate second text negative samples according to the updated generator model, input the second text negative samples and the text positive samples into the discriminator model, and update the discriminator model by minimizing cross-entropy;
a text generation model obtaining module, configured to alternately update the generator model and the discriminator model and, if the output of the discriminator model converges, obtain a text generation model from the generator model at convergence; and
a target text generation module, configured to obtain a text to be recognized, input the text to be recognized into the text generation model, and generate a target text based on the text generation model.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the above text generation method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the above text generation method when executed by a processor.
In the above text generation method, apparatus, computer device and storage medium, a real text data set is obtained and text positive samples are taken from it; an initial generator model is established and pre-trained with the text positive samples to obtain a generator model, and first text negative samples are generated with the generator model; an initial discriminator model is established, and the text positive samples and the first text negative samples are input into it for pre-training to obtain a discriminator model; a test text is generated with the generator model and input into the discriminator model to obtain its reward value, the gradient of the generator model is computed from the reward value, and the generator model is updated with the gradient; second text negative samples are generated with the updated generator model, the second text negative samples and the text positive samples are input into the discriminator model, and the discriminator model is updated by minimizing cross-entropy; the generator model and the discriminator model are updated alternately, and, if the output of the discriminator model converges, a text generation model is obtained from the generator model at convergence; finally, a text to be recognized is obtained and input into the text generation model, which generates the target text. By building a generator model and a discriminator model and letting them compete against each other continuously, the models keep improving themselves; a text generation model can thus be built quickly, the generated text is highly accurate, and both the construction efficiency of the text generation model and the precision of the generated text are improved.
Detailed description of the invention
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative labor.
Fig. 1 is a schematic diagram of an application environment of the text generation method in an embodiment of the present invention;
Fig. 2 is a flowchart of the text generation method in an embodiment of the present invention;
Fig. 3 is another flowchart of the text generation method in an embodiment of the present invention;
Fig. 4 is another flowchart of the text generation method in an embodiment of the present invention;
Fig. 5 is another flowchart of the text generation method in an embodiment of the present invention;
Fig. 6 is another flowchart of the text generation method in an embodiment of the present invention;
Fig. 7 is a functional block diagram of the text generation apparatus in an embodiment of the present invention;
Fig. 8 is a functional block diagram of the generator model obtaining module of the text generation apparatus in an embodiment of the present invention;
Fig. 9 is a functional block diagram of the discriminator model obtaining module of the text generation apparatus in an embodiment of the present invention;
Fig. 10 is a schematic diagram of a computer device in an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative labor shall fall within the protection scope of the present invention.
The text generation method provided by the present application can be applied in an application environment such as that of Fig. 1, in which a client communicates with a server over a network. The server obtains a real text data set through the client and takes text positive samples from it; establishes an initial generator model according to the input of the client, inputs the text positive samples into the initial generator model for pre-training to obtain a generator model, and generates first text negative samples according to the generator model; establishes an initial discriminator model according to the input of the client, and inputs the text positive samples and the first text negative samples into the initial discriminator model for pre-training to obtain a discriminator model; then generates a test text based on the generator model, inputs the test text into the discriminator model to obtain its reward value, computes the gradient of the generator model from the reward value, and updates the generator model according to the gradient; generates second text negative samples with the updated generator model, inputs the second text negative samples and the text positive samples into the discriminator model, and updates the discriminator model by minimizing cross-entropy; alternately updates the generator model and the discriminator model and, if the output of the discriminator model converges, obtains a text generation model from the generator model at convergence; finally obtains a text to be recognized, inputs it into the text generation model, generates a target text based on the text generation model, and returns the target text to the client. The client may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device. The server may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a text generation method is provided. Taking the method applied to the server of Fig. 1 as an example, it may specifically include the following steps:
S10: obtain a real text data set, and obtain text positive samples from the real text data set.
Here, the real text data set is the original text data set corresponding to the text that the text generation model is expected to output. For example, if the text generation model is expected to output poems, the real text data set is a data set composed of various poems. The text in this embodiment may be poems, question answers, dialogue, and so on; this embodiment is explained with poems as the final output.
Text positive samples are samples extracted from the real text data set, for example several poems extracted from the real text data set.
Specifically, a large number of poems can be collected in advance and stored in the database of the server as the real text data set. When training starts, the server obtains the real text data set from the database and extracts some poems (samples) from it as text positive samples.
In one embodiment, as shown in Fig. 3, in order to train the generator model and the discriminator model better, the real text data set can be converted into vector form; that is, step S10 may specifically include the following steps:
S11: select N text data items from the real text data set, N being a positive integer.
Specifically, the server selects N samples from the database as text positive samples, where N is a positive integer. It will be understood that the larger N is, the better the training effect. Optionally, which samples are selected as text positive samples can be determined by the input of the client; for example, the client inputs the numbers of the samples, and the server then selects the corresponding samples from the database according to the numbers input by the client.
S12: convert the N text data items into vector form with a word vector model, and take the N text data items converted into vector form as the text positive samples.
Here, the word vector model is a word2vec model; word2vec includes two neural network structures, CBOW and Skip-gram. Specifically, the server can input the poems (the real text data set) into the word2vec model for training; after training is completed, the word2vec model can map each entry of a poem to a vector. For example, if a line of a poem can be expressed as {entry 1, entry 2, ..., entry n}, calling the word2vec algorithm converts entry i into a vector \(x_i\), and the line can be expressed as the vector sequence \(X_{1:T} = (x_1, x_2, \dots, x_T)\).
Specifically, the server converts the N selected text data items into vector form through the word2vec model, and takes these N vector-form text data items as the text positive samples.
In the embodiment corresponding to Fig. 3, N text data items are selected from the real text data set, the N text data items are then converted into vector form with the word vector model, and finally the N vector-form text data items are taken as the text positive samples. Converting the text data into vector form captures the correlations between entries in the text better and facilitates the subsequent training of the generator model and the discriminator model, as the sketch below illustrates.
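As an illustration of step S12, here is a minimal sketch of the word2vec conversion, assuming the gensim library; the toy corpus, vector size, and window are illustrative assumptions rather than values fixed by the embodiment.

```python
# Minimal sketch of step S12, assuming gensim; corpus and sizes are illustrative.
from gensim.models import Word2Vec

# Each poem line is pre-tokenized into entries (tokens).
poems = [
    ["床前", "明月", "光"],
    ["疑是", "地上", "霜"],
    ["举头", "望", "明月"],
    ["低头", "思", "故乡"],
]

w2v = Word2Vec(sentences=poems, vector_size=64, window=2, min_count=1, epochs=50)

# Convert one text data item into its vector-form positive sample X_{1:T}.
sample = poems[0]
X = [w2v.wv[token] for token in sample]  # each entry x_i becomes a 64-dim vector
print(len(X), X[0].shape)                # -> 3 (64,)
```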
S20: establish an initial generator model, input the text positive samples into the initial generator model for pre-training to obtain a generator model, and generate first text negative samples according to the generator model.
It should be understood that the initial generator model here and the subsequent initial discriminator model are both models built on neural networks. Optionally, since the input text data are discrete, the initial generator model can be built with a recurrent neural network (RNN); and in order to speed up neural network training and reduce the amount of computation, the initial discriminator model can be built with a convolutional neural network (CNN). Optionally, other neural networks can also be used to build the initial generator model and the initial discriminator model; no specific limitation is made here. This embodiment is explained with the initial generator model being a recurrent neural network and the initial discriminator model being a convolutional neural network.
Specifically, parameters of the RNN are selected at random to establish the initial generator model. After the initial generator model is established, the text positive samples obtained in step S10 are input into the initial generator model for pre-training, which yields the generator model; some negative samples are then generated with the generator model as the first text negative samples, so that the initial discriminator model can subsequently be pre-trained. It should be understood that 'initial generator model' and 'generator model' merely distinguish the neural network before and after pre-training. Optionally, the server can also select other sample data from the real text data set and input them into the initial generator model for pre-training.
S30: establish an initial discriminator model, input the text positive samples and the first text negative samples into the initial discriminator model for pre-training, and obtain a discriminator model.
Specifically, the server selects parameters of the CNN at random to establish the initial discriminator model. After the initial discriminator model is established, the obtained text positive samples and first text negative samples are labeled: illustratively, the text positive samples can be labeled 1 and the first text negative samples labeled 0. The labeled text positive samples and first text negative samples are then input into the initial discriminator model for pre-training, which yields the discriminator model. There are N text positive samples and N first text negative samples; N can be determined according to the actual situation, and the more samples there are, the higher the discrimination precision of the resulting discriminator model. A CNN is used to build the discriminator model because an appropriate pooling layer can be set in a CNN; using the pooling operation prevents the discriminator model from overfitting the data, speeds up the training of the discriminator model, and reduces the amount of computation.
S40: generate a test text based on the generator model, input the test text into the discriminator model to obtain a reward value of the test text, compute the gradient of the generator model according to the reward value, and update the generator model according to the gradient.
Here, the reward value of the test text is the value output by the discriminator model.
Specifically, the server generates a test text with the generator model, then inputs the test text into the discriminator model and takes the value output by the discriminator model as the reward value. In order to keep improving the generator model, the generator model here is updated with the policy gradient from reinforcement learning (RL): when the discriminator model's output value for the test text is relatively high, the probability of the corresponding action of the RNN in the generator model is increased; when the output value for the test text is relatively low, the probability of the corresponding action of the RNN in the generator model is reduced. It should be understood that 'high' and 'low' output values are relative concepts; what counts as high or low differs between training stages and can be preset from experience. For example, when training has just started and the generator model's output is still poor, a discriminator output above 0.3 may be set as a relatively high value and one below 0.2 as a relatively low value; in the later stage of training, an output above 0.4 may be set as relatively high and one below 0.3 as relatively low.
Specifically, the policy gradient of the generator is computed from the reward value of the test text, and the generator model is finally updated with the computed policy gradient, expressed by the following formula:

\[
\nabla_\theta J(\theta) = \mathbb{E}_{Y_{1:t-1} \sim G_\theta}\Big[\sum_{y_t} \nabla_\theta G_\theta(y_t \mid Y_{1:t-1}) \cdot Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t)\Big]
\]

where \(\nabla_\theta\) denotes taking the policy gradient, \(J(\theta)\) is the objective function of the generator model, \(\mathbb{E}\) denotes expectation, \(G_\theta\) is the generator model, \(Y_{1:t-1} \sim G_\theta\) means the text \(Y\) generated by the generator model follows the probability distribution \(G_\theta\), \(G_\theta(y_t \mid Y_{1:t-1})\) is the probability under the generator model that \(y_t\) appears after \(Y_{1:t-1}\), \(D_\phi\) is the discriminator model, and \(Q_{D_\phi}^{G_\theta}\) is the reward that text generated by the generator model \(G_\theta\) obtains from the discriminator model \(D_\phi\). The expectation in the gradient can be approximated by sampling; the parameter \(\theta\) of the generator model \(G_\theta\) is then updated as

\[
\theta \leftarrow \theta + \alpha_h \nabla_\theta J(\theta)
\]

where \(\alpha_h\) is the learning rate of the hidden layer.
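To make this update concrete, the following is a minimal PyTorch sketch of a REINFORCE-style policy-gradient step, assuming the generator returns next-token logits for a batch of token ids and that the per-step rewards \(Q\) from the discriminator are already available; the function and argument names are hypothetical.

```python
# Minimal sketch of the policy-gradient update, assuming `generator(tokens)`
# returns next-token logits and `rewards` holds Q-values from the discriminator.
import torch

def policy_gradient_step(generator, optimizer, tokens, rewards):
    """tokens: (B, T) generated token ids; rewards: (B, T) per-step Q-values."""
    logits = generator(tokens[:, :-1])                  # (B, T-1, vocab)
    log_probs = torch.log_softmax(logits, dim=-1)
    taken = log_probs.gather(-1, tokens[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Gradient ascent on E[ log G_theta(y_t | Y_{1:t-1}) * Q(Y_{1:t-1}, y_t) ]:
    loss = -(taken * rewards[:, 1:]).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```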
S50: generate second text negative samples according to the updated generator model, input the second text negative samples and the text positive samples into the discriminator model, and update the discriminator model by minimizing cross-entropy.
Specifically, the server generates some texts with the updated generator model as the second text negative samples, labels the second text negative samples and the text positive samples, and inputs them into the discriminator model for training. The second text negative samples are labeled 0 and the text positive samples are labeled 1. It should be appreciated that the text positive samples here may be the same samples used in the earlier training, or other sample data may be extracted from the real text data set as text positive samples.
It should be understood that the goal of training the discriminator model is that when the input is real text data, the closer the output value is to 1 the better, and when the input is text generated by the generator, the closer the output value is to 0 the better, so that an accurate value is output for any given sample. Specifically, the discriminator parameters after pre-training can be obtained by minimizing the following cross-entropy:

\[
\min_{\phi}\; -\mathbb{E}_{Y \sim p_{\text{data}}}\big[\log D_\phi(Y)\big] - \mathbb{E}_{Y \sim G_\theta}\big[\log\big(1 - D_\phi(Y)\big)\big]
\]

where the discriminator \(D_\phi(Y)\) returns the probability that the sample \(Y\) is real, a number in \([0,1]\); \(Y \sim p_{\text{data}}\) means \(Y\) follows the probability distribution \(p_{\text{data}}\), the distribution obeyed by the real text data set; \(Y \sim G_\theta\) means \(Y\) follows the probability distribution \(G_\theta\); and \(\mathbb{E}\) denotes expectation. Minimizing the cross-entropy makes the two expectations in the above formula as large as possible, i.e. the probability assigned to real data is as large as possible and the probability assigned to generated data is as small as possible.
The parameters of the discriminator model can be updated by minimizing this cross-entropy, updating the discriminator model. When the discriminator model is updated, the generator model is kept fixed, and the discriminator model can be updated multiple times; the number of updates is set according to the actual situation and is not specifically limited here.
S60: alternately update the generator model and the discriminator model; if the output of the discriminator model converges, obtain a text generation model from the generator model at convergence.
Specifically, the server alternately updates the generator model and the discriminator model: while the discriminator model has not converged, the generator model and the discriminator model are updated repeatedly, so that the generator model and the discriminator model keep training against each other. In each update, the generator model is updated first while the discriminator model remains unchanged; then the generator model is kept unchanged and the discriminator model is updated. That is, the parameters of the discriminator model are fixed while the generator model is trained; then the parameters of the generator model are fixed while the discriminator model is trained; and this process is repeated until the output of the discriminator model converges. If the output of the discriminator model converges, a text generation model is obtained from the generator model at convergence. Output convergence means that the value the discriminator outputs for a given sample (positive or negative) is close to 0.5: the discriminator can no longer distinguish positive from negative samples, the server determines that the output of the discriminator has converged, and the final text generation model is then obtained from the generator model at convergence. A sketch of this alternation follows.
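The following is a minimal sketch of the alternation in step S60, assuming helper routines that wrap the generator and discriminator updates of steps S40 and S50 and a heuristic convergence test; all names, step counts, and the threshold are illustrative assumptions.

```python
# Sketch of step S60, assuming update_generator / update_discriminator wrap the
# updates of steps S40 and S50; steps, rounds, and threshold are illustrative.
def adversarial_train(generator, discriminator, real_batches,
                      update_generator, update_discriminator, d_accuracy,
                      g_steps=1, d_steps=3, max_rounds=200):
    for _ in range(max_rounds):
        for _ in range(g_steps):       # discriminator fixed, train generator
            update_generator(generator, discriminator)
        for _ in range(d_steps):       # generator fixed, train discriminator
            update_discriminator(discriminator, generator, real_batches)
        # Convergence: D outputs near 0.5, i.e. it cannot tell real from fake.
        if abs(d_accuracy(discriminator, generator, real_batches) - 0.5) < 0.02:
            break
    return generator                   # the generator model at convergence
```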
S70: obtain a text to be recognized, input the text to be recognized into the text generation model, and generate a target text based on the text generation model.
Here, the text to be recognized is the input of the text generation model, and the target text is its output. It will be understood that the text to be recognized and the target text correspond to the real text data set: if the text generation model is trained with a data set of poems, its text to be recognized and target text are also poems; if the text generation model is trained with a data set of dialogue, its text to be recognized and target text are also dialogue. Optionally, the text to be recognized and the target text may also be question answers, speech drafts, short essays, and so on.
Specifically, the server obtains the text to be recognized input by the user through the client, then inputs the text to be recognized into the text generation model, generates the target text with the text generation model, and outputs the target text to the client. For example, the server obtains through the client the opening turn of a dialogue input by the user, such as 'How is the weather today?'; the server then inputs this opening turn into the text generation model, which generates the corresponding target text, such as 'The weather is fine today!' or 'According to the weather forecast, it will rain today.', so that a corresponding dialogue is formed; finally, the server outputs the target text to the client.
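As a minimal usage sketch of step S70, the wrapper below assumes hypothetical encode/decode helpers and a sampling interface on the trained model; none of these names are fixed by the embodiment.

```python
# Illustrative wrapper for step S70; encode/decode and model.sample are
# assumed interfaces, not fixed APIs.
def generate_target_text(model, encode, decode, text_to_recognize, max_len=20):
    prompt_ids = encode(text_to_recognize)         # text to be recognized -> ids
    reply_ids = model.sample(prompt_ids, max_len)  # text generation model output
    return decode(reply_ids)                       # target text for the client
```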
In the embodiment corresponding to Fig. 2, a real text data set is obtained and text positive samples are taken from it; an initial generator model is established and pre-trained with the text positive samples to obtain a generator model, and first text negative samples are generated with the generator model; an initial discriminator model is established, and the text positive samples and the first text negative samples are input into it for pre-training to obtain a discriminator model; a test text is generated with the generator model and input into the discriminator model to obtain its reward value, the gradient of the generator model is computed from the reward value, and the generator model is updated with the gradient; second text negative samples are generated with the updated generator model, the second text negative samples and the text positive samples are input into the discriminator model, and the discriminator model is updated by minimizing cross-entropy; the generator model and the discriminator model are updated alternately and, if the output of the discriminator model converges, a text generation model is obtained from the generator model at convergence; a text to be recognized is obtained and input into the text generation model, which generates the target text. By building a generator model and a discriminator model and letting them compete continuously and improve themselves, a text generation model can be built quickly and the generated text is highly accurate, improving both the construction efficiency of the text generation model and the precision of the generated text.
In one embodiment, as shown in Fig. 4, step S20 (establishing an initial generator model, inputting the text positive samples into the initial generator model for pre-training to obtain a generator model, and generating first text negative samples according to the generator model) may specifically include the following steps:
S21: input initial generation parameters into a recurrent neural network to establish the initial generator model.
Optionally, the initial generation parameters may be randomly selected parameters of the recurrent neural network (RNN); that is, before pre-training, randomly selected parameters can be input into the RNN to obtain the initial generator model.
S22: input the text positive samples into the initial generator model for pre-training, convert the output into probabilities according to a probability distribution function, and obtain the pre-trained parameters.
Specifically, the server inputs the text positive samples into the initial generator model for pre-training. A text positive sample, for example \((x_1, x_2, \dots, x_T)\), is first mapped recursively by the RNN to the hidden states \((h_1, h_2, \dots, h_T)\), where a hidden state is an input parameter of a hidden layer of the recurrent neural network and at the same time an output parameter of a neuron, expressed by the following formula:

\[
h_t = g(h_{t-1}, x_t) = \sigma(W x_t + U h_{t-1})
\]

where \(W\) is a weight matrix and \(U\) is the transition matrix applied to the hidden state \(h_{t-1}\). \(\sigma\) can be the sigmoid function or the hyperbolic tangent function (tanh), chosen as the situation requires.
Then the hidden state is converted into an output probability with a probability distribution function. Optionally, the probability distribution function can be the softmax function, expressed by the following formula:

\[
P(y_t \mid x_1, x_2, \dots, x_t) = z(h_t) = \mathrm{softmax}(c + V h_t)
\]

This formula says that, given \((x_1, x_2, \dots, x_t)\), the output \(y_t\) of the RNN is distributed as \(\mathrm{softmax}(c + V h_t)\); \(z(h_t)\) denotes a function \(z\) of \(h_t\) that converts the output into probability form, with output values in \([0,1]\), and this function \(z\) can be taken to be the softmax function.
Specifically, after the server inputs the text positive samples into the RNN of the initial generator model for pre-training, the pre-trained parameters \(c\) and \(V\) can be obtained.
S23: update the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
Specifically, the original initial generation parameters of the initial generator model are updated according to the parameters \(c\) and \(V\) obtained after pre-training, yielding the generator model. It is appreciated that the generator model can be denoted \(G_\theta\); the parameters \(c\) and \(V\) give the model parameter \(\theta\) of the generator model \(G_\theta\). Once the generator model \(G_\theta\) is obtained, sample data can be extracted at random from the real text data set and input into the generator model \(G_\theta\) to generate the first text negative samples.
In the embodiment corresponding to Fig. 4, the initial generator model is established by inputting the initial generation parameters into a recurrent neural network; the text positive samples are then input into the initial generator model for pre-training and the output is converted into probabilities according to the probability distribution function, yielding the pre-trained parameters; finally, the parameters of the initial generator model are updated according to the pre-trained parameters, yielding the generator model. Building the generator model with a recurrent neural network fits the fact that generated text is discrete data, making the text finally output by the text generation model more effective; in addition, pre-training the generator model first makes it possible to generate some negative samples with the pre-trained generator model and thereby pre-train the discriminator model.
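A minimal PyTorch sketch of steps S21 to S23 follows, assuming a GRU stands in for the recurrent network and the positive samples arrive as batches of token ids; the vocabulary size, dimensions, and optimizer settings are illustrative assumptions.

```python
# Sketch of steps S21-S23: an RNN (GRU) language model pretrained by maximum
# likelihood on the text positive samples. Sizes and settings are illustrative.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)  # h_t = g(h_{t-1}, x_t)
        self.out = nn.Linear(hidden_dim, vocab_size)              # c + V h_t

    def forward(self, tokens):                 # tokens: (B, T) token ids
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                     # softmax of this is P(y_t | x_1..x_t)

def pretrain_mle(gen, batches, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(gen.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for tokens in batches:                 # positive samples, shape (B, T)
            logits = gen(tokens[:, :-1])       # predict each next entry
            loss = ce(logits.reshape(-1, logits.size(-1)),
                      tokens[:, 1:].reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()
    return gen
```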
In one embodiment, as shown in Fig. 5, step S30 (establishing an initial discriminator model, inputting the text positive samples and the first text negative samples into the initial discriminator model for pre-training, and obtaining a discriminator model) may specifically include the following steps:
S31: input initial discrimination parameters into a convolutional neural network to establish the initial discriminator model.
Optionally, the initial discrimination parameters may be randomly selected parameters of the convolutional neural network (CNN); that is, before pre-training, randomly selected parameters can be input into the CNN to obtain the initial discriminator model.
S32: input the text positive samples and the first text negative samples into the initial discriminator model for pre-training, convert the output into probabilities according to a probability distribution function, update the initial discrimination parameters of the initial discriminator by minimizing cross-entropy, and obtain the pre-trained discrimination parameters.
Specifically, the training samples are labeled: the text positive samples are labeled 1 and the first text negative samples are labeled 0.
First, a text positive sample, for example \((x_1, x_2, \dots, x_T)\), is input into the CNN of the initial discriminator model, and the convolution kernel \(\omega \in \mathbb{R}^{l \times k}\) is applied to the text positive sample to obtain the features of the text positive sample, expressed by the following formula:

\[
c_i = \rho\big(\omega \otimes \varepsilon_{i:i+l-1} + b\big)
\]

where the convolution kernel \(\omega \in \mathbb{R}^{l \times k}\) is an \(l \times k\) real matrix, \(\varepsilon_{i:i+l-1}\) is the \(l \times k\) real matrix formed by rows \(i\) to \(i+l-1\) of the text positive sample, \(b\) is a parameter to be learned (a real number), \(\otimes\) denotes the sum of the products of corresponding matrix elements, and \(\rho\) is a nonlinear activation function.
Pooling is then performed with max pooling:

\[
\tilde{c} = \max\{c_1, c_2, \dots, c_{T-l+1}\}
\]

That is, the pooling takes the maximum over the extracted features \(c_i\) of the text positive sample. Optionally, mean pooling can also be used here; there is no specific limitation.
After a certain number of convolution and pooling operations, the result passes through a fully connected layer (FC), i.e. the output layer, and is converted into a probability output with the sigmoid function.
Similarly, the first text negative samples labeled 0 are input into the CNN and, through the same process, finally converted into probability outputs with the sigmoid function.
Finally, after pre-training on the text positive samples and the first text negative samples, the pre-trained discrimination parameters, i.e. \(\omega\) and \(b\), are obtained.
Optionally, in order to give the discriminator model a good effect, after the max-pooled \(\tilde{c}\) is obtained, a highway network can be used in training the discriminator model. The highway network can be calculated by the following formulas:

\[
\tau = \sigma(W_T \cdot \tilde{c} + b_T), \qquad
\tilde{C} = \tau \circ H(\tilde{c}, W_H) + (1 - \tau) \circ \tilde{c}
\]

where \(\tau\) is the transform gate of the highway layer, \(W_T\), \(b_T\) and \(W_H\) are the weights of the highway layer, and \(H\) is an affine transformation followed by a nonlinear activation function (such as the rectified linear function, ReLU); writing the rectified linear function as \(f\), \(H(\tilde{c}, W_H) = f(W_H \cdot \tilde{c})\). Finally the output is converted into a probability with the sigmoid function:

\[
\hat{y} = \sigma(W_0 \cdot \tilde{C} + b_0)
\]

where \(W_0\) and \(b_0\) are the weight and bias of the output layer of the discriminator.
S33: update the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
Specifically, the parameters of the initial discriminator model are updated according to the pre-trained discrimination parameters \(\omega\) and \(b\), yielding the discriminator model. It is appreciated that the discriminator model can be denoted \(D_\phi\), where the parameters \(\omega\) and \(b\) give the parameter \(\phi\) of the discriminator model. Once the discriminator model is obtained, the adversarial training of the generator model and the discriminator model can be carried out: the generator model and the discriminator model are updated alternately until the models converge, and the final text generation model is obtained.
In the embodiment corresponding to Fig. 5, the initial discriminator model is established by inputting the initial discrimination parameters into a convolutional neural network; the text positive samples and the first text negative samples are then input into the initial discriminator model for pre-training, the output is converted into probabilities according to the probability distribution function, and the initial discrimination parameters of the initial discriminator are updated by minimizing cross-entropy, yielding the pre-trained discrimination parameters; finally, the parameters of the initial discriminator model are updated with the pre-trained discrimination parameters, yielding the discriminator model. Training the initial discriminator model with the negative samples generated by the generator model together with the text positive samples yields the discriminator model; once the discriminator model is obtained, the generator model and the discriminator model can be trained adversarially, finally producing the text generation model.
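A minimal PyTorch sketch of steps S31 to S33 follows, assuming a single convolution width, max-over-time pooling, one highway layer, and a sigmoid output trained with binary cross-entropy (positives labeled 1, negatives 0); all sizes and settings are illustrative assumptions.

```python
# Sketch of steps S31-S33: a CNN discriminator with max-over-time pooling,
# a highway layer, and a sigmoid output. Sizes and settings are illustrative.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=64, n_filters=100, l=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=l)  # kernel w in R^{l x k}
        self.transform = nn.Linear(n_filters, n_filters)          # H(c~, W_H)
        self.gate = nn.Linear(n_filters, n_filters)               # transform gate tau
        self.out = nn.Linear(n_filters, 1)                        # W_0, b_0

    def forward(self, tokens):                        # tokens: (B, T)
        e = self.embed(tokens).transpose(1, 2)        # (B, emb_dim, T)
        c = torch.relu(self.conv(e))                  # feature maps c_i
        c = c.max(dim=2).values                       # max-over-time pooling c~
        tau = torch.sigmoid(self.gate(c))             # highway layer
        h = tau * torch.relu(self.transform(c)) + (1 - tau) * c
        return torch.sigmoid(self.out(h)).squeeze(-1) # D_phi(Y) in [0, 1]

def pretrain_discriminator(disc, pos, neg, epochs=5, lr=1e-3):
    opt = torch.optim.Adam(disc.parameters(), lr=lr)
    bce = nn.BCELoss()                                # minimized cross-entropy
    x = torch.cat([pos, neg])
    y = torch.cat([torch.ones(len(pos)), torch.zeros(len(neg))])
    for _ in range(epochs):
        loss = bce(disc(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return disc
```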
In one embodiment, as shown in Fig. 6, step S40 (generating a test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, computing the gradient of the generator model according to the reward value, and updating the generator model according to the gradient) may specifically include the following steps:
S41: obtain the texts produced during the generation of the test text as test sub-texts.
It will be appreciated that the generator model passes through many intermediate steps while generating the test text. For example, if the text finally generated is '床前明月光' ('Before my bed, the bright moonlight'), the generator model generates '床', '床前', '床前明', and so on along the way; the server can obtain these intermediate texts as the test sub-texts.
S42: generate M hypothesis texts from each test sub-text using Monte Carlo search.
Here, Monte Carlo search (the Monte Carlo method) refers to solving computational problems using random numbers (or, more commonly, pseudo-random numbers).
It should be understood that, because the discriminator model can only judge whether a complete sentence is real or fake, reward values for the test sub-texts must be obtained while the generator model is producing the test text, so that the generator model can learn and its gradient can be computed. Specifically, the server generates M hypothesis texts from each test sub-text by Monte Carlo search, inputs the M hypothesis texts into the discriminator model to obtain reward values, and takes the mean of these reward values as the reward value of the test sub-text. Generating the M hypothesis texts by Monte Carlo search can be expressed by the following formula:

\[
\{Y_{1:T}^{1}, \dots, Y_{1:T}^{M}\} = \mathrm{MC}^{G_\beta}(Y_{1:t}; M)
\]

This formula states that, given the test sub-text \(Y_{1:t}\), M hypothesis texts are generated by Monte Carlo search. The Monte Carlo search follows a probability distribution, and this probability distribution is \(G_\beta\); here \(G_\beta = G_\theta\) is taken, i.e. the M hypothesis texts can be generated by Monte Carlo search with the generator model itself.
S43: input the M hypothesis texts into the discriminator model, take the mean of the rewards of the M hypothesis texts as the reward value of the test sub-text, and input the test text into the discriminator model to obtain the reward value of the test text.
Specifically, the reward values of the test sub-texts and the test text can be calculated with the following formula:

\[
Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t) =
\begin{cases}
\dfrac{1}{M} \displaystyle\sum_{m=1}^{M} D_\phi\big(Y_{1:T}^{m}\big), \; Y_{1:T}^{m} \in \mathrm{MC}^{G_\beta}(Y_{1:t}; M), & t < T \\[2mm]
D_\phi(Y_{1:t}), & t = T
\end{cases}
\]

where the discriminator model \(D_\phi(Y)\) returns the probability that the test sample \(Y\) is real, a number in \([0,1]\). At time \(T\) the whole poem has been generated, so the reward value at \(T\) can be given directly by the discriminator; the reward values at times \(t = 1\) to \(T-1\) must be provided by Monte Carlo simulation. The test sub-text at time \(t\) is \(Y_{1:t}\); Monte Carlo search is performed M times to obtain M hypothesis texts \(Y_{1:T}\), and the average of the reward values of these M hypothesis texts is taken as the reward value at time \(t\). In this way every intermediate step has a defined reward value, so the generator model can be trained with reinforcement learning (RL).
S44: compute the gradient of the generator model according to the reward values of the test sub-texts and the test text, update the parameters of the generator model according to the gradient, and obtain the updated generator model.
Specifically, after the reward values of the test sub-texts and the test text are obtained, the policy gradient of the generator model can be computed with the formula given in step S40. The expectation \(\mathbb{E}\) in the gradient can be approximated by sampling, and the parameter \(\theta\) of the generator model is then updated as \(\theta \leftarrow \theta + \alpha_h \nabla_\theta J(\theta)\).
Then, the server obtains the updated generator model from the updated parameters of the generator model and uses the updated generator model to update the discriminator model; the generator model and the discriminator model are updated alternately until the discriminator model converges, and the text generation model is finally obtained from the generator model at convergence. When the generator model is updated, the discriminator model is kept fixed, and the number of parameter updates of the generator model can be set according to the actual situation; no specific limitation is made here.
In the embodiment corresponding to Fig. 6, the texts produced during the generation of the test text are obtained as test sub-texts, and M hypothesis texts are generated from each test sub-text by Monte Carlo search; the M hypothesis texts are input into the discriminator model, the mean of their rewards is taken as the reward value of the test sub-text, and the test text is input into the discriminator model to obtain its reward value; finally, the gradient of the generator model is computed from the reward values of the test sub-texts and the test text, the parameters of the generator model are updated according to the gradient, and the updated generator model is obtained. Monte Carlo search yields a reward value for each intermediate text produced by the generator model, so the generator model can be trained with reinforcement learning, improving the training efficiency of the generator model; a sketch of the rollout follows.
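The following is a minimal sketch of the reward estimation of steps S41 to S43, assuming the generator exposes a rollout method that completes a prefix to full length by sampling (a hypothetical interface, since \(G_\beta = G_\theta\)).

```python
# Sketch of steps S41-S43: Monte Carlo rollouts. Each prefix Y_{1:t} is
# completed M times with the generator itself and scored by the discriminator;
# generator.rollout is an assumed sampling interface, not a fixed API.
import torch

def mc_rewards(generator, discriminator, tokens, M=16):
    """tokens: (B, T) generated sequences; returns (B, T) per-step rewards."""
    B, T = tokens.shape
    rewards = torch.zeros(B, T)
    with torch.no_grad():
        for t in range(1, T):
            # Complete each test sub-text Y_{1:t} into M hypothesis texts Y_{1:T}.
            scores = torch.stack([
                discriminator(generator.rollout(tokens[:, :t], target_len=T))
                for _ in range(M)
            ])                                        # (M, B)
            rewards[:, t - 1] = scores.mean(dim=0)    # mean reward of sub-text
        rewards[:, T - 1] = discriminator(tokens)     # full text scored directly
    return rewards
```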
It should be understood that the size of the step numbers in the above embodiments does not imply an execution order; the execution order of the processes is determined by their functions and internal logic and does not constitute any limitation on the implementation of the embodiments of the present invention.
In one embodiment, a text generation apparatus is provided, which corresponds one-to-one to the text generation method of the above embodiments. As shown in Fig. 7, the text generation apparatus includes a text positive sample obtaining module 10, a generator model obtaining module 20, a discriminator model obtaining module 30, a generator model updating module 40, a discriminator model updating module 50, a text generation model obtaining module 60, and a target text generation module 70. The functional modules are described in detail as follows:
The text positive sample obtaining module 10 is configured to obtain a real text data set and obtain text positive samples from the real text data set.
The generator model obtaining module 20 is configured to establish an initial generator model, input the text positive samples into the initial generator model for pre-training to obtain a generator model, and generate first text negative samples according to the generator model.
The discriminator model obtaining module 30 is configured to establish an initial discriminator model, and input the text positive samples and the first text negative samples into the initial discriminator model for pre-training to obtain a discriminator model.
The generator model updating module 40 is configured to generate a test text based on the generator model, input the test text into the discriminator model to obtain a reward value of the test text, compute the gradient of the generator model according to the reward value, and update the generator model according to the gradient.
The discriminator model updating module 50 is configured to generate second text negative samples according to the updated generator model, input the second text negative samples and the text positive samples into the discriminator model, and update the discriminator model by minimizing cross-entropy.
The text generation model obtaining module 60 is configured to alternately update the generator model and the discriminator model and, if the output of the discriminator model converges, obtain a text generation model from the generator model at convergence.
The target text generation module 70 is configured to obtain a text to be recognized, input the text to be recognized into the text generation model, and generate a target text based on the text generation model.
Further, the text positive sample obtaining module 10 is also configured to:
select N text data items from the real text data set, N being a positive integer; and
convert the N text data items into vector form with a word vector model, and take the N text data items converted into vector form as the text positive samples.
Further, as shown in Fig. 8, the generator model obtaining module 20 includes an initial generation model establishing unit 21, an initial generation model pre-training unit 22, and a generator model obtaining unit 23.
The initial generation model establishing unit 21 is configured to input initial generation parameters into a recurrent neural network to establish the initial generator model.
The initial generation model pre-training unit 22 is configured to input the text positive samples into the initial generator model for pre-training, convert the output into probabilities according to a probability distribution function, and obtain the pre-trained parameters.
The generator model obtaining unit 23 is configured to update the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
Further, as shown in Fig. 9, the discriminator model obtaining module 30 includes an initial discriminator model establishing unit 31, an initial discriminator model pre-training unit 32, and a discriminator model obtaining unit 33.
The initial discriminator model establishing unit 31 is configured to input initial discrimination parameters into a convolutional neural network to establish the initial discriminator model.
The initial discriminator model pre-training unit 32 is configured to input the text positive samples and the first text negative samples into the initial discriminator model for pre-training, convert the output into probabilities according to a probability distribution function, update the initial discrimination parameters of the initial discriminator by minimizing cross-entropy, and obtain the pre-trained discrimination parameters.
The discriminator model obtaining unit 33 is configured to update the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
Further, the generator model updating module 40 is also configured to:
obtain the texts produced during the generation of the test text as test sub-texts;
generate M hypothesis texts from each test sub-text using Monte Carlo search;
input the M hypothesis texts into the discriminator model, take the mean of the rewards of the M hypothesis texts as the reward value of the test sub-text, and input the test text into the discriminator model to obtain the reward value of the test text; and
compute the gradient of the generator model according to the reward values of the test sub-texts and the test text, update the parameters of the generator model according to the gradient, and obtain the updated generator model.
For specific limitations on the text generation apparatus, refer to the limitations on the text generation method above, which are not repeated here. Each module in the above text generation apparatus may be implemented in whole or in part by software, hardware, or a combination of the two. The modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store the real text data set, the text positive samples, the text negative samples, the word vector model, and the like. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements a text generation method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor. The processor implements the following steps when executing the computer program:
obtaining a real text data set, and obtaining text positive samples from the real text data set;
establishing an initial generator model, inputting the text positive samples into the initial generator model for pre-training to obtain a generator model, and generating first text negative samples according to the generator model;
establishing an initial discriminator model, inputting the text positive samples and the first text negative samples into the initial discriminator model for pre-training, and obtaining a discriminator model;
generating a test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, computing the gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
generating second text negative samples according to the updated generator model, inputting the second text negative samples and the text positive samples into the discriminator model, and updating the discriminator model by minimizing cross-entropy;
alternately updating the generator model and the discriminator model and, if the output of the discriminator model converges, obtaining a text generation model from the generator model at convergence; and
obtaining a text to be recognized, inputting the text to be recognized into the text generation model, and generating a target text based on the text generation model.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program implements the following steps when executed by a processor:
obtaining a real text data set, and obtaining text positive samples from the real text data set;
establishing an initial generator model, inputting the text positive samples into the initial generator model for pre-training to obtain a generator model, and generating first text negative samples according to the generator model;
establishing an initial discriminator model, inputting the text positive samples and the first text negative samples into the initial discriminator model for pre-training, and obtaining a discriminator model;
generating a test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value of the test text, computing the gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
generating second text negative samples according to the updated generator model, inputting the second text negative samples and the text positive samples into the discriminator model, and updating the discriminator model by minimizing cross-entropy;
alternately updating the generator model and the discriminator model and, if the output of the discriminator model converges, obtaining a text generation model from the generator model at convergence; and
obtaining a text to be recognized, inputting the text to be recognized into the text generation model, and generating a target text based on the text generation model.
Those of ordinary skill in the art will appreciate that realizing all or part of the process in above-described embodiment method, being can be with Relevant hardware is instructed to complete by computer program, the computer program can be stored in a non-volatile computer In read/write memory medium, the computer program is when being executed, it may include such as the process of the embodiment of above-mentioned each method.Wherein, To any reference of memory, storage, database or other media used in each embodiment provided herein, Including non-volatile and/or volatile memory.Nonvolatile memory may include read-only memory (ROM), programming ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory.Volatile memory may include Random access memory (RAM) or external cache.By way of illustration and not limitation, RAM is available in many forms, Such as static state RAM (SRAM), dynamic ram (DRAM), synchronous dram (SDRAM), double data rate sdram (DDRSDRAM), enhancing Type SDRAM (ESDRAM), synchronization link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic ram (DRDRAM) and memory bus dynamic ram (RDRAM) etc..
It will be apparent to those skilled in the art that, for convenience and brevity of description, the division into the functional units and modules described above is given only as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above.
The embodiments described above merely illustrate the technical solutions of the present invention and do not limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

1. A text generation method, comprising:
obtaining a real text data set, and obtaining text positive samples from the real text data set;
establishing an initial generator model, inputting the text positive samples into the initial generator model for pre-training to obtain a generator model, and generating first text negative samples according to the generator model;
establishing an initial discriminator model, and inputting the text positive samples and the first text negative samples into the initial discriminator model for pre-training to obtain a discriminator model;
generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value for the test text, calculating the gradient of the generator model according to the reward value, and updating the generator model according to the gradient;
generating second text negative samples according to the updated generator model, inputting the second text negative samples and the text positive samples into the discriminator model, and updating the discriminator model by minimizing cross entropy;
alternately updating the generator model and the discriminator model, and, if the output of the discriminator model converges, obtaining a text generation model from the generator model at convergence;
obtaining text to be identified, inputting the text to be identified into the text generation model, and generating target text based on the text generation model.
2. The text generation method according to claim 1, wherein obtaining text positive samples from the real text data set comprises:
selecting N text data from the real text data set, N being a positive integer;
converting the N text data into vector form with a word vector model, and taking the N text data converted into vector form as the text positive samples.
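By way of illustration of this selection-and-embedding step, the sketch below uses gensim's Word2Vec as the word vector model; the application does not name a particular library, and the toy corpus, N, and vector size are assumptions.

```python
import numpy as np
from gensim.models import Word2Vec

corpus = [["the", "claim", "was", "approved"],        # stand-in real text data set
          ["the", "claim", "was", "rejected"],
          ["please", "resubmit", "the", "form"]]

N = 2                                                 # choose N text data
chosen = corpus[:N]

# Train a small word vector model on the data set (sizes are illustrative).
w2v = Word2Vec(corpus, vector_size=16, window=2, min_count=1, epochs=50)

# Convert each chosen text into vector form: a (length, 16) matrix of word vectors.
positives = [np.stack([w2v.wv[word] for word in text]) for text in chosen]
print(positives[0].shape)                             # -> (4, 16)
```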
3. The text generation method according to claim 1, wherein establishing an initial generator model, inputting the text positive samples into the initial generator model for pre-training to obtain a generator model, and generating first text negative samples according to the generator model comprises:
inputting initial generation parameters into a recurrent neural network to establish the initial generator model;
inputting the text positive samples into the initial generator model for pre-training, converting the output into probabilities according to a probability distribution function, and obtaining pre-trained parameters;
updating the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
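A minimal sketch of the pre-training in this claim, assuming the recurrent neural network is an LSTM and the probability distribution function is a softmax over next-token logits; the sizes, optimizer, and stand-in positive samples are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, T, B = 200, 16, 32, 12, 32             # assumed sizes

class InitialGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)   # the recurrent neural network
        self.out = nn.Linear(HID, VOCAB)                 # logits, softmaxed into probabilities

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

gen = InitialGenerator()                                 # holds the initial generation parameters
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
positives = torch.randint(1, VOCAB, (B, T))              # stand-in text positive samples

for _ in range(100):                                     # pre-training loop
    bos = torch.zeros(B, 1, dtype=torch.long)            # assumed begin-of-sequence token 0
    logits = gen(torch.cat([bos, positives[:, :-1]], 1))
    # Cross entropy is the negative log softmax probability of each real next token.
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), positives.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()         # update the initial parameters
# The updated parameters constitute the generator model; sampling from its
# softmax output would produce the first text negative samples.
```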
4. The text generation method according to claim 1, wherein establishing an initial discriminator model, and inputting the text positive samples and the first text negative samples into the initial discriminator model for pre-training to obtain a discriminator model comprises:
inputting initial discrimination parameters into a convolutional neural network to establish the initial discriminator model;
inputting the text positive samples and the first text negative samples into the initial discriminator model for pre-training, converting the output into probabilities according to a probability distribution function, updating the initial discrimination parameters of the initial discriminator by minimizing cross entropy, and obtaining pre-trained discrimination parameters;
updating the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
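A corresponding sketch for this claim, assuming the convolutional neural network is a one-layer 1-D CNN with max pooling and that the minimized cross entropy is binary cross entropy over real/fake labels; sizes and stand-in samples are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, T, B = 200, 16, 32, 12, 32              # assumed sizes

class InitialDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.conv = nn.Conv1d(EMB, HID, kernel_size=3, padding=1)  # the convolutional network
        self.fc = nn.Linear(HID, 1)

    def forward(self, x):                                # (B, T) token ids -> P(real)
        h = F.relu(self.conv(self.emb(x).transpose(1, 2)))
        return torch.sigmoid(self.fc(h.max(dim=2).values)).squeeze(1)

dis = InitialDiscriminator()                             # holds the initial discrimination parameters
opt = torch.optim.Adam(dis.parameters(), lr=1e-3)
pos = torch.randint(1, VOCAB, (B, T))                    # stand-in text positive samples
neg = torch.randint(1, VOCAB, (B, T))                    # stand-in first text negative samples

for _ in range(100):                                     # pre-training by minimizing cross entropy
    loss = F.binary_cross_entropy(dis(pos), torch.ones(B)) + \
           F.binary_cross_entropy(dis(neg), torch.zeros(B))
    opt.zero_grad(); loss.backward(); opt.step()         # update the initial discrimination parameters
```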
5. The text generation method according to claim 1, wherein generating test text based on the generator model, inputting the test text into the discriminator model to obtain a reward value for the test text, calculating the gradient of the generator model according to the reward value, and updating the generator model according to the gradient comprises:
obtaining the intermediate texts produced while generating the test text as test sub-texts;
generating M hypothesis texts for each test sub-text by Monte Carlo search;
inputting the M hypothesis texts into the discriminator model, taking the mean reward of the M hypothesis texts as the reward value of the corresponding test sub-text, and inputting the test text into the discriminator model to obtain the reward value of the test text;
calculating the gradient of the generator model according to the reward values of the test sub-texts and the reward value of the test text, and updating the parameters of the generator model according to the gradient to obtain the updated generator model.
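The Monte Carlo search of this claim matches SeqGAN-style rollouts: each intermediate prefix (test sub-text) is completed M times by the generator, the discriminator scores the M hypothesis texts, and their mean score becomes the prefix's reward value, while the finished test text is scored directly. The sketch below follows that reading with a stand-in discriminator; M, the sizes, and the REINFORCE-style form of the gradient are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, T, B, M = 50, 16, 32, 10, 4, 8      # M rollouts per prefix (assumed)

class Gen(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, x):                            # (n, t) -> (n, t, VOCAB) logits
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

    @torch.no_grad()
    def rollout(self, prefix, length):               # Monte Carlo completion of a prefix
        seq = prefix
        while seq.size(1) < length:
            p = F.softmax(self.forward(seq)[:, -1], -1)
            seq = torch.cat([seq, torch.multinomial(p, 1)], 1)
        return seq

def discriminator(seq):                              # stand-in: P(real) per sequence;
    return torch.rand(seq.size(0))                   # a trained CNN would go here

def mc_rewards(gen, seq):
    """Reward per position: mean discriminator score over M rollouts of the prefix."""
    n, length = seq.shape
    r = torch.zeros(n, length)
    for t in range(1, length):                       # intermediate prefixes = test sub-texts
        pre = seq[:, :t].repeat_interleave(M, dim=0) # (n*M, t)
        r[:, t - 1] = discriminator(gen.rollout(pre, length)).view(n, M).mean(1)
    r[:, -1] = discriminator(seq)                    # complete test text scored directly
    return r

gen = Gen()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
seq = gen.rollout(torch.zeros(B, 1, dtype=torch.long), T + 1)  # BOS + T sampled tokens
r = mc_rewards(gen, seq)
logp = F.log_softmax(gen(seq[:, :-1]), -1)           # log p(y_t | y_<t)
chosen = logp.gather(2, seq[:, 1:].unsqueeze(2)).squeeze(2)
loss = -(r[:, 1:] * chosen).mean()                   # policy-gradient objective
opt.zero_grad(); loss.backward(); opt.step()         # update generator by the gradient
```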
6. A text generation device, comprising:
a text positive sample obtaining module, configured to obtain a real text data set and obtain text positive samples from the real text data set;
a generator model obtaining module, configured to establish an initial generator model, input the text positive samples into the initial generator model for pre-training to obtain a generator model, and generate first text negative samples according to the generator model;
a discriminator model obtaining module, configured to establish an initial discriminator model, and input the text positive samples and the first text negative samples into the initial discriminator model for pre-training to obtain a discriminator model;
a generator model updating module, configured to generate test text based on the generator model, input the test text into the discriminator model to obtain a reward value for the test text, calculate the gradient of the generator model according to the reward value, and update the generator model according to the gradient;
a discriminator model updating module, configured to generate second text negative samples according to the updated generator model, input the second text negative samples and the text positive samples into the discriminator model, and update the discriminator model by minimizing cross entropy;
a text generation model obtaining module, configured to alternately update the generator model and the discriminator model and, if the output of the discriminator model converges, obtain a text generation model from the generator model at convergence;
a target text generating module, configured to obtain text to be identified, input the text to be identified into the text generation model, and generate target text based on the text generation model.
7. The text generation device according to claim 6, wherein the generator model obtaining module comprises an initial generation model establishing unit, an initial generation model pre-training unit, and a generator model obtaining unit;
the initial generation model establishing unit is configured to input initial generation parameters into a recurrent neural network to establish the initial generator model;
the initial generation model pre-training unit is configured to input the text positive samples into the initial generator model for pre-training, convert the output into probabilities according to a probability distribution function, and obtain pre-trained parameters;
the generator model obtaining unit is configured to update the parameters of the initial generator model according to the pre-trained parameters to obtain the generator model.
8. The text generation device according to claim 6, wherein the discriminator model obtaining module comprises an initial discrimination model establishing unit, an initial discrimination model pre-training unit, and a discriminator model obtaining unit;
the initial discrimination model establishing unit is configured to input initial discrimination parameters into a convolutional neural network to establish the initial discriminator model;
the initial discrimination model pre-training unit is configured to input the text positive samples and the first text negative samples into the initial discriminator model for pre-training, convert the output into probabilities according to a probability distribution function, update the initial discrimination parameters of the initial discriminator by minimizing cross entropy, and obtain pre-trained discrimination parameters;
the discriminator model obtaining unit is configured to update the parameters of the initial discriminator model according to the pre-trained discrimination parameters to obtain the discriminator model.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the text generation method of any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the text generation method of any one of claims 1 to 5.
CN201910067379.0A 2019-01-24 2019-01-24 Document creation method, device, computer equipment and medium Pending CN109885667A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910067379.0A CN109885667A (en) 2019-01-24 2019-01-24 Document creation method, device, computer equipment and medium
PCT/CN2019/116941 WO2020151310A1 (en) 2019-01-24 2019-11-11 Text generation method and device, computer apparatus, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910067379.0A CN109885667A (en) 2019-01-24 2019-01-24 Document creation method, device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN109885667A true CN109885667A (en) 2019-06-14

Family

ID=66926787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910067379.0A Pending CN109885667A (en) 2019-01-24 2019-01-24 Document creation method, device, computer equipment and medium

Country Status (2)

Country Link
CN (1) CN109885667A (en)
WO (1) WO2020151310A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015134665A1 (en) * 2014-03-04 2015-09-11 SignalSense, Inc. Classifying data with deep learning neural records incrementally refined through expert input
CN108829898B (en) * 2018-06-29 2020-11-20 ***科技(杭州)有限公司 HTML content page release time extraction method and system
CN109885667A (en) * 2019-01-24 2019-06-14 平安科技(深圳)有限公司 Document creation method, device, computer equipment and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180336439A1 (en) * 2017-05-18 2018-11-22 Intel Corporation Novelty detection using discriminator of generative adversarial network
CN108334497A (en) * 2018-02-06 2018-07-27 北京航空航天大学 The method and apparatus for automatically generating text
US10152970B1 (en) * 2018-02-08 2018-12-11 Capital One Services, Llc Adversarial learning and generation of dialogue responses
CN109003678A (en) * 2018-06-12 2018-12-14 清华大学 A kind of generation method and system emulating text case history
CN108923922A (en) * 2018-07-26 2018-11-30 北京工商大学 A kind of text steganography method based on generation confrontation network
CN109242090A (en) * 2018-08-28 2019-01-18 电子科技大学 A kind of video presentation and description consistency discrimination method based on GAN network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
张一珂 et al., "Data augmentation for language modeling based on an adversarial training strategy" (基于对抗训练策略的语言模型数据增强技术), 《自动化学报》 (Acta Automatica Sinica), 18 April 2018 (2018-04-18), pages 891-900 *
李林科, "Research on visual content description based on deep learning" (基于深度学习的视觉内容描述技术研究), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology), pages 138-224 *
王坤峰 et al., "Generative adversarial networks: the state of the art and beyond" (生成式对抗网络GAN的研究进展与展望), 《自动化学报》 (Acta Automatica Sinica), pages 321-332 *
罗娜, "Research on LSTM models for behavior-consistency verification of Android applications" (面向Android应用行为一致性验证的LSTM模型研究), pages 139-58 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020151310A1 (en) * 2019-01-24 2020-07-30 平安科技(深圳)有限公司 Text generation method and device, computer apparatus, and medium
CN112115257B (en) * 2019-06-20 2023-07-14 百度在线网络技术(北京)有限公司 Method and device for generating information evaluation model
CN112115257A (en) * 2019-06-20 2020-12-22 百度在线网络技术(北京)有限公司 Method and apparatus for generating information evaluation model
CN111126503A (en) * 2019-12-27 2020-05-08 北京同邦卓益科技有限公司 Training sample generation method and device
CN111126503B (en) * 2019-12-27 2023-09-26 北京同邦卓益科技有限公司 Training sample generation method and device
CN111339749A (en) * 2020-03-02 2020-06-26 乐山师范学院 Unconditional text generation method, text generation device and storage medium
US11972604B2 (en) 2020-03-11 2024-04-30 Shenzhen Institutes Of Advanced Technology Image feature visualization method, image feature visualization apparatus, and electronic device
WO2021179198A1 (en) * 2020-03-11 2021-09-16 深圳先进技术研究院 Image feature visualization method, image feature visualization apparatus, and electronic device
CN112036955A (en) * 2020-09-07 2020-12-04 贝壳技术有限公司 User identification method and device, computer readable storage medium and electronic equipment
CN112036955B (en) * 2020-09-07 2021-09-24 贝壳找房(北京)科技有限公司 User identification method and device, computer readable storage medium and electronic equipment
CN112328750A (en) * 2020-11-26 2021-02-05 上海天旦网络科技发展有限公司 Method and system for training text discrimination model
CN115442324A (en) * 2021-06-04 2022-12-06 ***通信集团浙江有限公司 Message generation method, message generation device, message management device, and storage medium
CN115442324B (en) * 2021-06-04 2023-08-18 ***通信集团浙江有限公司 Message generation method, device, message management equipment and storage medium
CN114844767A (en) * 2022-04-27 2022-08-02 中国电子科技集团公司第五十四研究所 Alarm data generation method based on countermeasure generation network

Also Published As

Publication number Publication date
WO2020151310A1 (en) 2020-07-30

Similar Documents

Publication Publication Date Title
CN109885667A (en) Document creation method, device, computer equipment and medium
CN109992779B (en) Emotion analysis method, device, equipment and storage medium based on CNN
CN104751842B (en) The optimization method and system of deep neural network
CN108024158A (en) There is supervision video abstraction extraction method using visual attention mechanism
CN108829719A (en) The non-true class quiz answers selection method of one kind and system
CN109635122A (en) Intelligent disease inquiry method, apparatus, equipment and storage medium
CN110459324A (en) Disease forecasting method, apparatus and computer equipment based on shot and long term memory models
CN108140146A (en) For adiabatic quantum computation machine to be used to carry out the discrete variation autocoder system and method for machine learning
CN108804677A (en) In conjunction with the deep learning question classification method and system of multi-layer attention mechanism
Ye et al. ECG generation with sequence generative adversarial nets optimized by policy gradient
CN112215339B (en) Medical data expansion method based on generation countermeasure network
CN110991190B (en) Document theme enhancement system, text emotion prediction system and method
CN114118012B (en) Personalized font generation method based on CycleGAN
CN108985335A (en) The integrated study prediction technique of nuclear reactor cladding materials void swelling
CN110490320A (en) Deep neural network structural optimization method based on forecasting mechanism and Genetic Algorithm Fusion
CN112116589B (en) Method, device, equipment and computer readable storage medium for evaluating virtual image
CN112000788B (en) Data processing method, device and computer readable storage medium
CN108073978A (en) A kind of constructive method of the ultra-deep learning model of artificial intelligence
CN112766600A (en) Urban area crowd flow prediction method and system
CN113065324A (en) Text generation method and device based on structured triples and anchor templates
CN117116383A (en) Medicine molecule optimization method and device based on pretraining fine adjustment
CN108073985A (en) A kind of importing ultra-deep study method for voice recognition of artificial intelligence
CN108073979A (en) A kind of ultra-deep study of importing artificial intelligence knows method for distinguishing for image
CN111445024A (en) Medical image recognition training method
CN116451859A (en) Bayesian optimization-based stock prediction method for generating countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190614)