CN114547287B - Generative text summarization method - Google Patents

Generative text summarization method

Info

Publication number
CN114547287B
CN114547287B (application CN202111373234.7A)
Authority
CN
China
Prior art keywords
vector
word
text
sentence
news
Prior art date
Legal status
Active
Application number
CN202111373234.7A
Other languages
Chinese (zh)
Other versions
CN114547287A (en)
Inventor
田玲
康昭
惠孛
孙麟
罗光春
袁铭潮
陈仙莹
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202111373234.7A
Publication of CN114547287A
Application granted
Publication of CN114547287B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/34 - Browsing; Visualisation therefor
    • G06F 16/345 - Summarisation for human users
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/951 - Indexing; Web crawling techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/103 - Formatting, i.e. changing of presentation of documents
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/12 - Use of codes for handling textual entities
    • G06F 40/126 - Character encoding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/166 - Editing, e.g. inserting or deleting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A generative text summarization method belonging to the technical field of natural language processing. The invention improves on the CBOW model of Word2Vec by incorporating syllable annotation information to strengthen the feature representation of the text; it adopts an LSTM-based Encoder-Decoder framework to generate news summaries and focuses on handling unknown words during generation, thereby effectively improving the quality of the generated news summaries.

Description

Generative text summarization method
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a generative text summarization method.
Background
With the advancement of computer hardware in recent decades, computing performance has improved rapidly and the internet industry has flourished. The popularity of personal computers and the rapid growth of the internet have filled everyday life with text on a wide variety of carriers. The sheer volume of information creates an unavoidable and challenging information-overload problem and also makes information retrieval difficult. How to cope with the data deluge caused by information overload, and how to help people extract information from texts efficiently, has therefore become one of the hot topics worldwide. Text summarization, which converts a text or a collection of texts into a short summary containing the key information, emerged to address this problem.
Early automatic text summarization research used rule-based methods and traditional machine learning, but the resulting summaries were unsatisfactory because such methods struggle to understand articles the way humans do. With the development of deep learning, recurrent neural network models, whose output at each step depends on previous computations, can capture contextual dependencies in language and model texts of various lengths. However, the traditional recurrent-neural-network framework has a latent problem: the vocabulary available at prediction time is fixed, so if the input text contains words that are not in the vocabulary of generatable words, the model cannot process or generate them. This is the Out-Of-Vocabulary (OOV) problem. Rare words in the source text may carry information that is important for the summary, but because of their low frequency they are not added to the vocabulary during training, and since modern models keep growing in size, retraining after adding new words is very expensive, so traditional methods cannot solve the OOV problem well.
Disclosure of Invention
The invention aims to provide a generative text summarization method that addresses the defects described in the background above.
To achieve this aim, the invention adopts the following technical solution:
A generative text summarization method, comprising the steps of:
Step 1, data crawling:
crawling the original corpus of Uyghur news texts from a data-source website, and parsing it to obtain the Uyghur news texts;
Step 2, data preprocessing:
S21, data cleaning: cleaning the Uyghur news texts obtained in step 1 to obtain cleaned Uyghur news texts;
S22, data format processing: applying data-format processing to the cleaned Uyghur news texts to obtain processed Uyghur news texts;
S23, word segmentation: segmenting the processed Uyghur news texts with a grammar-analysis word-segmentation algorithm to obtain segmented Uyghur news texts;
S24, syllable labeling: labeling the syllables of the segmented Uyghur news texts with a Uyghur phonetic-harmony rule processing algorithm, using 1 for vowels and 0 for consonants, and constructing Uyghur syllable vectors of the same dimension as the segmented texts to obtain Uyghur news-text syllable data;
Step 3, text feature representation:
S31, initialization: first traverse the segmented Uyghur news text obtained in step S23 to obtain the number of words V and the frequency of each word, sort the V words in descending order of frequency, and construct the vocabulary Vocab: {w_1, w_2, …, w_i, …, w_V}, where w_i denotes the i-th word in the vocabulary; generate a V-dimensional One-Hot code for each word according to its position in the vocabulary, and for the i-th word w_i denote the generated One-Hot code as one_hot_i;
S32, word-vector generation and iterative training: generate word vectors from the One-Hot codes produced in step S31; for the word w_i the generation process comprises:
a. define the word-vector length as N and the window size as c;
b. randomly initialize a weight matrix W_{V×N} and compute the hidden vector h_i of the intermediate layer:
h_i = (1/(2c)) · Σ_{k=i−c, k≠i}^{i+c} one_hot_k · W_{V×N}
c. randomly initialize a weight matrix W′_{N×V} and compute the probability distribution y of the word w_i:
y = softmax(h_i · W′_{N×V})
d. iterative training: train iteratively with gradient descent; when one_hot_i − y falls below a preset threshold, stop the iteration and obtain the trained hidden vector h_i′ of the intermediate layer, which is the trained word vector h_i′ of the word w_i;
S33, syllable information fusion: concatenate the Uyghur syllable vector obtained in step S24 with the trained word vector h_i′ obtained in step S32 to obtain the word vector h_i″ fused with syllable information;
S34, word-vector adjustment based on a neural network: randomly extract from the segmented Uyghur news text a sentence containing the word w_i; assuming the sentence W consists of m words and the word w_i occupies the j-th position in W, denote it as W_j, with W = {w_1, w_2, …, w_m}; the sentence corresponds to a sentence vector fused with syllable information, H = {h_1″, h_2″, …, h_j″, …, h_m″}, where h_j″ denotes the syllable-fused word vector of the word W_j at the j-th position in the sentence W; then input each word vector in H into a neural network to obtain the hidden-layer vectors G = {g_1, g_2, …, g_j, …, g_m}, where g_j is the hidden-layer vector of the word vector h_j″;
S35, word-vector adjustment based on an attention mechanism:
a. for the hidden-layer vectors G = {g_1, g_2, …, g_j, …, g_m}, compute the attention weights A = [a_1, a_2, …, a_j, …, a_m] (the scoring formula is given as an image in the original publication), where V′ and M′ are randomly initialized matrices, V′ is a matrix of 1 row and x columns, M′ is a matrix of x rows and 1 column, x is a preset value, and b is a randomly initialized value;
b. train V′, M′ and b with gradient descent to obtain the trained attention weights A′ = [a_1′, a_2′, …, a_j′, …, a_m′];
c. update the hidden-layer vector g_j with the trained attention weights to obtain the updated hidden-layer vector g_j′ (the update formula is given as an image in the original publication);
Step 4, news summary generation:
S41, word-vector representation: suppose the news vector S consists of k sentence vectors, S = {s_1, …, s_p, …, s_k}, where the sentence vector s_p consists of m′ word vectors, s_p = {g_1′, …, g_q′, …, g_{m′}′}, and g_q′ denotes the word vector at position q in the sentence vector s_p;
S42, encoding: input the news vector S of step S41 into an LSTM model for encoding to obtain the semantic vector T:
T = LSTM(S)
S43, decoding: input the semantic vector T obtained in step S42 into another LSTM model for decoding to generate the text summary vector S′:
S′ = LSTM(T)
the text summary vector S′ consists of k′ sentence vectors, S′ = {s_1′, …, s_{p′}′, …, s_{k′}′}, where the sentence vector s_{p′}′ consists of m″ word vectors;
S44, unknown-word copying:
a. compute the probability distribution of the word vector g_q′ in the sentence vector s_p:
P_vocab(g_q′) = softmax(V″(V‴[s_p, s_{p′}′] + b′) + b″)
where [s_p, s_{p′}′] denotes the concatenation of the sentence vector s_p obtained in step S41 and the sentence vector s_{p′}′ obtained in step S43; V″ and V‴ are randomly initialized matrices, V″ has dimension 1 × x′, V‴ has dimension x′ × 1, and x′ is a preset value; b′ and b″ are randomly initialized values;
b. compute the generation probability P_gen of the word vector g_q′:
P_gen = sigmoid(s_p · M_1 + s_{p′}′ · M_2 + A′ · M_3 + b_gen)
where M_1, M_2 and M_3 are randomly initialized matrices whose dimensions are m′, m″ and m′·m″ respectively, and b_gen is a randomly initialized value;
c. obtain the final generation probability of the word vector g_q′:
P(g_q′) = P_gen · P_vocab(g_q′) + (1 − P_gen) · Σ_{j=1}^{m′} a_j
where a_j denotes the attention weight of step S35;
d. if P_vocab(g_q′) is the zero vector, overwrite the word vector with the highest attention weight in S′ with the word vector g_q′ taken directly from the news vector S; if P_vocab(g_q′) is a non-zero vector, update the word vector with the highest attention weight in the generated text summary vector S′ to the word vector with the highest final generation probability;
S45, mapping: for the generated text summary vector S′ updated in step S44, map each word vector back to a word to obtain the final text summary.
The invention has the beneficial effects that:
the invention provides a method for generating a text abstract, which is improved on the basis of a CBOW model of Word2Vec, and syllable marking information is integrated to enhance the feature representation capability of the text; the method adopts an Encoder-Decoder framework based on LSTM to realize news abstract generation, and focuses on solving the problem of unknown words in the generation process, thereby effectively improving the effect of news abstract generation.
Drawings
Fig. 1 is a flowchart of the generative text summarization method according to the present invention.
Detailed Description
The embodiments of the present invention will be described in detail with reference to the accompanying drawings.
A generative text summarization method specifically comprises the following steps:
step 1, data crawling:
according to the embodiment of the invention, a news text on a Wei language news website is crawled as basic data for subsequent data preprocessing, such as a Wei language news text on a central Wei language broadcast network; the method comprises the following specific steps:
s11, data acquisition: inputting a URL address of a target data source website in a script crawler frame to obtain original corpus of a dimension language news text in a Json character string format;
s12, data analysis: performing regular expression analysis on the original corpus of the dimension language news text obtained in the step S11 to obtain a dimension language news text; the Uygur news text consists of a plurality of sentences, and each sentence consists of a plurality of words;
step 2, data preprocessing:
the step mainly involves preprocessing the wiki news text obtained in step S12 to improve the data analysis processing capability of the downstream model. The data preprocessing process comprises the following steps: data cleaning, data format processing, word segmentation and syllable marking. The method specifically comprises the following steps:
s21, data cleaning: data cleaning is carried out on the dimensional Language news text obtained in the step S12 by adopting a Structured Query Language (SQL) or Excel-based manual proofreading method, and the cleaned dimensional Language news text can be obtained by means of integrity check, spelling check correction, non-text information removal, invalid data discarding and the like;
s22, data format processing: carrying out data format processing on the cleaned dimension language news text by adopting an SQL or Excel-based manual proofreading method, specifically comprising case and case conversion, numerical format unification and the like, so as to obtain a processed dimension language news text;
s23, word segmentation: performing word segmentation on the processed dimension language news text by adopting a grammar analysis word segmentation algorithm to obtain a dimension language news text after word segmentation; the segmented Uygur news text consists of sentences, and the sentences consist of words; the word segmentation processing of the step is to perform word segmentation on the processed dimension language news text, and identifiers are added among certain characters in the dimension language news text to indicate which characters in the news text form a word, and the word is not changed into a vocabulary list after the word segmentation processing; for example [ i/like/eat/apple ] is the word segmentation processing result of the text [ i like eating apple ].
S24, syllable labeling: the vowels and consonants of the dimensional language are distinguished obviously, and the expression meanings of the dimensional language of the vowel and the dimensional language of the consonant are different to a certain extent. And (4) carrying out syllable labeling on the segmented dimensional language news text by adopting a dimensional language voice harmony rule processing algorithm, and constructing a dimensional voice syllable vector with the same dimensionality as the segmented dimensional language news text to obtain dimensional language news text syllable data.
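A minimal sketch of the 0/1 syllable labeling of S24 is given below; the Latin-script vowel set is an assumption for illustration, since the patent's phonetic-harmony rules operate on the actual Uyghur alphabet.

```python
# Mark vowels as 1 and consonants as 0 for each character of a segmented word
# (step S24). The vowel set below is an illustrative assumption.
UYGHUR_VOWELS = set("aeëiouöü")

def syllable_vector(word: str) -> list[int]:
    return [1 if ch.lower() in UYGHUR_VOWELS else 0 for ch in word]

def label_sentence(words: list[str]) -> list[list[int]]:
    # One 0/1 vector per word, so the labeling has the same dimension as the text.
    return [syllable_vector(w) for w in words]
```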
Step 3, text feature representation:
the method mainly aims at the problem that the text features generated by the traditional text feature representation method are discrete and sparse, and is improved on the basis Of a CBOW (Continuous Bag-Of-Words Model) Model Of Word2Vec, syllable marking information is blended to enhance the text feature representation capability Of the Model, and the Bi-LSTM and attention mechanism are utilized to improve the text representation capability.
S31, initialization: and generating One-Hot codes by adopting the segmented wiki news text obtained in the step S23. The specific process is as follows: firstly, traversing the segmented dimensional language news text to obtain the number V of words and the word frequency of each word in the segmented dimensional language news text, arranging the V words according to the sequence of the word frequencies from large to small, and constructing a vocabulary table Vocab: { w 1 ,w 2 ,…,w i ,…,w V },w i Represents the ith word in the vocabulary; generating One-Hot code of V dimension according to the position of each word in the vocabulary, and generating w for the ith word i Indicating that it is ranked in the ith position in the vocabulary Vocab, and the generated One-Hot code is marked as One _ Hot i The specific generation process is as follows:
for the word w i When it is ranked in the ith position in the vocabulary Vocab, its corresponding One-Hot code One _ Hot i Comprises the following steps: [0, 8230;, 1,0, 8230;, 0]The dimension of the code is V, the ith bit is 1, and all the other bits are 0.
S32, word-vector generation and iterative training: generate word vectors from the One-Hot codes produced in step S31; for the word w_i the generation process comprises the following steps:
a. define the word-vector length as N and the window size as c;
b. randomly initialize a weight matrix W_{V×N} according to a Gaussian distribution, where the number of rows V is the One-Hot dimension and the number of columns N is the defined word-vector length; take the One-Hot codes of the c words preceding w_i and the c words following w_i, i.e. one_hot_{i−c}, one_hot_{i−c+1}, …, one_hot_{i−1}, one_hot_{i+1}, one_hot_{i+2}, …, one_hot_{i+c}, multiply each of them by W_{V×N} and average the results to obtain the hidden vector h_i of the intermediate layer:
h_i = (1/(2c)) · Σ_{k=i−c, k≠i}^{i+c} one_hot_k · W_{V×N}
c. randomly initialize a weight matrix W′_{N×V} according to a Gaussian distribution, where the number of rows N is the defined word-vector length and the number of columns V is the One-Hot dimension; right-multiply the hidden vector h_i by W′_{N×V} and apply the softmax activation function to obtain the probability distribution y of the word w_i:
y = softmax(h_i · W′_{N×V})
d. iterative training: the goal of the iterative training is to make the probability distribution of the word w_i approach its true probability distribution, i.e. the One-Hot code of w_i. Specifically: with gradient descent, the gradient of one_hot_i − y is back-propagated to W_{V×N} and W′_{N×V}, and the two matrices are corrected continuously so that one_hot_i − y gradually decreases; when one_hot_i − y falls below a preset threshold (the threshold is user-defined; a value close to 0, such as 0.001, is usually chosen), the iteration stops and the trained hidden vector h_i′ of the intermediate layer is obtained; this hidden vector is the trained word vector h_i′ of the word w_i;
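A compact numpy sketch of one CBOW training step of S32 is shown below; the learning rate and the residual measure are assumptions made for this illustration, and only the matrix shapes follow the patent.

```python
# One CBOW update for a single target word (step S32). W has shape (V, N) and
# W_prime has shape (N, V), as in the patent; lr is an assumed learning rate.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cbow_step(context_one_hots, target_one_hot, W, W_prime, lr=0.05):
    h = np.mean([oh @ W for oh in context_one_hots], axis=0)   # hidden vector h_i
    y = softmax(h @ W_prime)                                    # predicted distribution
    err = y - target_one_hot                                    # negative of one_hot_i - y
    grad_h = err @ W_prime.T                                    # gradient reaching the hidden layer
    W_prime -= lr * np.outer(h, err)                            # gradient-descent updates
    for oh in context_one_hots:
        W -= lr * np.outer(oh, grad_h) / len(context_one_hots)
    return h, float(np.abs(err).sum())   # trained hidden vector and residual size
```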
S33, syllable information fusion: concatenate the Uyghur syllable vector obtained in step S24 with the trained word vector h_i′ obtained in step S32 to obtain the word vector h_i″ of the word w_i fused with syllable information;
S34, word-vector adjustment based on a Bi-LSTM (bidirectional long short-term memory network): a Bi-LSTM lets the syllable-fused word vector h_i″ of the word w_i obtained in step S33 contain more context information. The specific process is as follows:
For the syllable-fused word vector h_i″ obtained in step S33, first randomly extract from the segmented Uyghur news text a sentence containing the word w_i; assuming this sentence W consists of m words and the word w_i is ranked at the j-th position in W, denote it as W_j, so the sentence can be represented as the word set W = {w_1, w_2, …, w_j, …, w_m} (the word w_i mentioned in step S31 refers to the i-th word of the vocabulary Vocab, whereas w_1, w_2, …, w_m here refer to the words at positions 1, 2, …, m of the sentence W). The sentence corresponds to a sentence vector fused with syllable information, H = {h_1″, h_2″, …, h_j″, …, h_m″}, where h_j″ denotes the syllable-fused word vector of the word W_j at the j-th position in the sentence W. Then input each syllable-fused word vector h_j″ in the sentence vector H sequentially into a neural network composed of Bi-LSTM units to obtain the corresponding hidden-layer vectors G = {g_1, g_2, …, g_j, …, g_m}, where g_j is the hidden-layer vector of the syllable-fused word vector h_j″, and G is the set of the hidden-layer vectors g_1, g_2, …, g_m corresponding to the syllable-fused word vectors of the m words in the sentence W;
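The Bi-LSTM adjustment of S34 can be sketched in PyTorch as below; the embedding and hidden sizes are illustrative assumptions, not values prescribed by the patent.

```python
# Bi-LSTM over the syllable-fused word vectors of one sentence (step S34).
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, fused_dim: int = 128, hidden_dim: int = 64):
        super().__init__()
        self.bilstm = nn.LSTM(fused_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, H: torch.Tensor) -> torch.Tensor:
        # H: (1, m, fused_dim), the sentence vector fused with syllable information.
        G, _ = self.bilstm(H)        # G: (1, m, 2 * hidden_dim), one g_j per word
        return G
```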
s35, adjusting word vectors based on an attention mechanism: the influence degree of different words on other words is different, and the attention mechanism is utilized to apply to the word w obtained in step S34 j Hidden layer vector g of j And adjusting to receive the influence of other words in different degrees. The method specifically comprises the following steps:
a. hidden layer vector G = { G) for m words 1 ,g 2 ,···,g j ,···,g m }, calculating the attention weight [ a ] 1 ,a 2 …,a j …,a m ]The formula is as follows:
Figure RE-GDA0003585028030000088
wherein A represents the attention weight a 1 ,a 2 …,a j …,a m Vector of composition, a j Is a numerical value, the softmax function will result in a vector of dimension m, a j The value of the j th bit in the vector output by the softmax function is obtained; v 'and M' are two matrices randomly initialized according to a gaussian distribution,v 'is a matrix of 1 row and x columns, M' is a matrix of x rows and 1 column (where x is a predetermined value, preferably approaching vector g) j B) is a value that is randomly initialized according to a gaussian distribution;
b. training V ', M ' and b in the formula by adopting a gradient descent method to obtain a trained attention weight A ' = [ a ] 1 ′,a 2 ′…,a j ′…,a m ′];
c. Using a trained attention weight A' = [ a ] 1 ′,a 2 ′…,a j ′…,a m ′]Vector g of hidden layer j Updating:
Figure RE-GDA0003585028030000091
get the word w j Updated hidden layer vector g' j
Step 4, news summary generation:
the step mainly aims at the problem that the traditional news abstract generating method is poor in effect, the news abstract is generated by adopting an Encoder-Decoder framework based on LSTM, and the problem Of Out-Of-Vocabulary (OOV) is solved in an oriented mode in the generating process, so that the effect Of generating the news abstract is improved.
S41, word vector representation: and summarizing the segmented dimensional language news text obtained in the step S23 to generate a summary. Suppose that the news vector S is composed of k sentence vectors, i.e., S = { S = { S } 1 ,···,s p ,···,s k Where, sentence vector s p Consisting of m' word vectors, s p ={g′ 1 ,···,g′ q ,···,g′ m′ }, wherein g' q Expression in sentence vector s p A word vector with the median position of q;
s42, encoding:
inputting the news vector S of the step S41 into a unidirectional LSTM model for encoding, and generating a semantic vector T by the LSTM based on the news vector S:
T=LSTM(S)
the semantic vector T contains all the information of the news.
S43, decoding: inputting the semantic vector T obtained in the step S42 into another different unidirectional LSTM model for decoding to generate a text abstract vector S'; the generated text digest vector S ' is composed of k ' sentence vectors, S ' = { S = { (S) } 1 ′,···,s p′ ′···,s k′ ' }, where sentence vector s p′ 'is composed of m' word vectors, s p′ ′={g′ 1 ,···g′ q′ ,··· g′ m″ },g′ q′ As a vector s of sentences p′ 'the word vector representation with position q' in (LSTM when used for decoding, one vector can be expanded into multiple vectors):
s'=LSTM(T)
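A minimal PyTorch sketch of the LSTM Encoder-Decoder of S42-S43 follows; the dimensions, the start-of-summary input and the fixed-length decoding loop are assumptions, since the patent only states T = LSTM(S) and S′ = LSTM(T).

```python
# LSTM Encoder-Decoder for summary generation (steps S42-S43, assumed setup).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, dim: int = 128, hidden: int = 256):
        super().__init__()
        self.encoder = nn.LSTM(dim, hidden, batch_first=True)   # unidirectional LSTM
        self.decoder = nn.LSTM(dim, hidden, batch_first=True)   # a second, different LSTM
        self.project = nn.Linear(hidden, dim)

    def forward(self, S: torch.Tensor, summary_len: int) -> torch.Tensor:
        # S: (1, n_words, dim), the news vector; T is the final encoder state.
        _, T = self.encoder(S)
        out, state = [], T
        step = torch.zeros(1, 1, S.size(-1))     # assumed start-of-summary input
        for _ in range(summary_len):
            o, state = self.decoder(step, state) # expand T into several output vectors
            step = self.project(o)
            out.append(step)
        return torch.cat(out, dim=1)             # S': (1, summary_len, dim)
```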
s44, copying the unknown words: after the text abstract vector S ' is obtained in step S43, it is determined whether each word vector in S ' is a vector of an unknown word (i.e., it is determined whether a word vector corresponding to a word in the Vocab vocabulary is consistent with the word vector in S ', and if so, a word copy operation is required). The specific process is as follows:
a. compute the probability distribution P_vocab of the word vector g_q′ in the sentence vector s_p; the formula is:
P_vocab(g_q′) = softmax(V″(V‴[s_p, s_{p′}′] + b′) + b″)
where [s_p, s_{p′}′] denotes the concatenation of the sentence vector s_p obtained in step S41 and the sentence vector s_{p′}′ obtained in step S43; V″ and V‴ are two matrices randomly initialized according to a Gaussian distribution, V″ has dimension 1 × x′ and V‴ has dimension x′ × 1 (x′ is a preset value, about 1000); b′ and b″ are two values randomly initialized according to a Gaussian distribution; the parameters of V″, V‴, b′ and b″ are all corrected continuously by gradient descent to improve the accuracy of P_vocab(g_q′);
b. compute the generation probability P_gen of the word vector g_q′:
P_gen = sigmoid(s_p · M_1 + s_{p′}′ · M_2 + A′ · M_3 + b_gen)
where M_1, M_2 and M_3 are matrices randomly initialized according to a Gaussian distribution, s_p is the sentence vector obtained in step S41, s_{p′}′ is the sentence vector obtained in step S43, A′ is the set of trained attention weights obtained in step S35, and b_gen is a value randomly initialized according to a Gaussian distribution; the dimensions of M_1, M_2 and M_3 are m′, m″ and m′·m″ respectively; the parameters of M_1, M_2, M_3 and b_gen are all corrected continuously by gradient descent to improve the accuracy of P_gen;
c. combine the above probability distribution P_vocab and generation probability P_gen to obtain the final generation probability of the word vector g_q′:
P(g_q′) = P_gen · P_vocab(g_q′) + (1 − P_gen) · Σ_{j=1}^{m′} a_j
where a_j denotes the attention weight of step S35 and m′ denotes the sentence-vector length of step S41;
d. if P_vocab(g_q′) is computed to be the zero vector, the word vectors of all the words in the vocabulary Vocab of step S31 differ from g_q′; in that case the word vector g_q′ is taken directly from S and overwrites the word vector with the highest attention weight A′ in S′; if P_vocab(g_q′) is computed to be a non-zero vector, the word corresponding to the word vector g_q′ exists in the vocabulary Vocab of step S31, and according to P(g_q′) the word vector with the highest generation probability is selected, i.e. the word vector with the highest attention weight A′ in the generated text summary vector S′ is updated to the word vector with the highest final generation probability, thereby solving the OOV problem;
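The copy decision of S44 can be sketched as follows for a single decoder position; P_vocab, P_gen and the attention weights are taken as given, and the combination rule mirrors the pointer-generator style reconstruction above.

```python
# Decide, for one summary position, whether to copy a source word or emit a
# vocabulary word (step S44, item d). Inputs are assumed to be precomputed.
import numpy as np

def resolve_word(p_vocab: np.ndarray, p_gen: float, attn: np.ndarray,
                 source_vectors: np.ndarray, vocab_vectors: np.ndarray) -> np.ndarray:
    if not p_vocab.any():
        # Zero distribution: the word is out-of-vocabulary, so copy the source
        # word vector with the highest attention weight.
        return source_vectors[int(attn.argmax())]
    # Otherwise combine generation and copy probabilities and emit the
    # vocabulary word vector with the highest final generation probability.
    final = p_gen * p_vocab + (1.0 - p_gen) * attn.sum()
    return vocab_vectors[int(final.argmax())]
```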
s45, mapping: for the updated generation of step S44Text abstract vector S', and vector S of each sentence in S p′ 'the word vector g' q′ Mapping into words to obtain the final text abstract S final ={W 1 final ,···,W i final ,···,W k′ final In which the sentence W i final Consisting of m' words, W i final ={w 1 ,w 2 ,···w m′ In which w 1 ,w 2 ,···w m′ Is a word.
The invention thus realizes a generative text summarization method.

Claims (1)

1. A generative text summarization method, comprising the steps of:
Step 1, data crawling:
crawling the original corpus of news texts from a data-source website, and parsing it to obtain the news texts;
Step 2, data preprocessing:
S21, data cleaning: cleaning the news texts obtained in step 1 to obtain cleaned news texts;
S22, data format processing: applying data-format processing to the cleaned news texts to obtain processed news texts;
S23, word segmentation: segmenting the processed news texts with a grammar-analysis word-segmentation algorithm to obtain segmented news texts;
S24, syllable labeling: labeling the syllables of the segmented news texts with a phonetic-harmony rule processing algorithm, using 1 for vowels and 0 for consonants, and constructing syllable vectors of the same dimension as the segmented texts to obtain news-text syllable data;
Step 3, text feature representation:
S31, initialization: first traversing the segmented news text obtained in step S23 to obtain the number of words V and the frequency of each word, sorting the V words in descending order of frequency, and constructing the vocabulary Vocab: {w_1, w_2, …, w_i, …, w_V}, where w_i denotes the i-th word in the vocabulary; generating a V-dimensional One-Hot code for each word according to its position in the vocabulary, and for the i-th word w_i denoting the generated One-Hot code as one_hot_i;
S32, word-vector generation and iterative training: generating word vectors from the One-Hot codes produced in step S31; for the word w_i the generation process comprises:
a. defining the word-vector length as N and the window size as c;
b. randomly initializing a weight matrix W_{V×N} and computing the hidden vector h_i of the intermediate layer:
h_i = (1/(2c)) · Σ_{k=i−c, k≠i}^{i+c} one_hot_k · W_{V×N}
c. randomly initializing a weight matrix W′_{N×V} and computing the probability distribution y of the word w_i:
y = softmax(h_i · W′_{N×V})
d. iterative training: training iteratively with gradient descent; when one_hot_i − y falls below a preset threshold, stopping the iteration to obtain the trained hidden vector h_i′ of the intermediate layer, which is the trained word vector h_i′ of the word w_i;
S33, syllable information fusion: concatenating the syllable vector obtained in step S24 with the trained word vector h_i′ obtained in step S32 to obtain the word vector h_i″ fused with syllable information;
S34, word-vector adjustment based on a neural network: randomly extracting from the segmented news text a sentence containing the word w_i; assuming the sentence W consists of m words and the word w_i occupies the j-th position in W, denoting it as W_j, with W = {w_1, w_2, …, w_m}; the sentence corresponds to a sentence vector fused with syllable information, H = {h_1″, h_2″, …, h_j″, …, h_m″}, where h_j″ denotes the syllable-fused word vector of the word W_j at the j-th position in the sentence W; then inputting each word vector in H into a neural network to obtain the hidden-layer vectors G = {g_1, g_2, …, g_j, …, g_m}, where g_j is the hidden-layer vector of the word vector h_j″;
S35, word-vector adjustment based on an attention mechanism:
a. for the hidden-layer vectors G = {g_1, g_2, …, g_j, …, g_m}, computing the attention weights A = [a_1, a_2, …, a_j, …, a_m] (the scoring formula is given as an image in the original publication), where V′ and M′ are randomly initialized matrices, V′ is a matrix of 1 row and x columns, M′ is a matrix of x rows and 1 column, x is a preset value, and b is a randomly initialized value;
b. training V′, M′ and b with gradient descent to obtain the trained attention weights A′ = [a_1′, a_2′, …, a_j′, …, a_m′];
c. updating the hidden-layer vector g_j with the trained attention weights to obtain the updated hidden-layer vector g_j′ (the update formula is given as an image in the original publication);
Step 4, news summary generation:
S41, word-vector representation: supposing the news vector S consists of k sentence vectors, S = {s_1, …, s_p, …, s_k}, where the sentence vector s_p consists of m′ word vectors, s_p = {g_1′, …, g_q′, …, g_{m′}′}, and g_q′ denotes the word vector at position q in the sentence vector s_p;
S42, encoding: inputting the news vector S of step S41 into an LSTM model for encoding to obtain the semantic vector T:
T = LSTM(S)
S43, decoding: inputting the semantic vector T obtained in step S42 into another LSTM model for decoding to generate the text summary vector S′:
S′ = LSTM(T)
the text summary vector S′ consists of k′ sentence vectors, S′ = {s_1′, …, s_{p′}′, …, s_{k′}′}, where the sentence vector s_{p′}′ consists of m″ word vectors;
S44, unknown-word copying:
a. computing the probability distribution of the word vector g_q′ in the sentence vector s_p:
P_vocab(g_q′) = softmax(V″(V‴[s_p, s_{p′}′] + b′) + b″)
where [s_p, s_{p′}′] denotes the concatenation of the sentence vector s_p obtained in step S41 and the sentence vector s_{p′}′ obtained in step S43; V″ and V‴ are randomly initialized matrices, V″ has dimension 1 × x′, V‴ has dimension x′ × 1, and x′ is a preset value; b′ and b″ are randomly initialized values;
b. computing the generation probability P_gen of the word vector g_q′:
P_gen = sigmoid(s_p · M_1 + s_{p′}′ · M_2 + A′ · M_3 + b_gen)
where M_1, M_2 and M_3 are randomly initialized matrices whose dimensions are m′, m″ and m′·m″ respectively, and b_gen is a randomly initialized value;
c. obtaining the final generation probability of the word vector g_q′:
P(g_q′) = P_gen · P_vocab(g_q′) + (1 − P_gen) · Σ_{j=1}^{m′} a_j
where a_j denotes the attention weight of step S35;
d. if P_vocab(g_q′) is the zero vector, overwriting the word vector with the highest attention weight in S′ with the word vector g_q′ taken directly from the news vector S; if P_vocab(g_q′) is a non-zero vector, updating the word vector with the highest attention weight in the generated text summary vector S′ to the word vector with the highest final generation probability;
S45, mapping: for the generated text summary vector S′ updated in step S44, mapping each word vector back to a word to obtain the final text summary.
CN202111373234.7A 2021-11-18 2021-11-18 Generative text summarization method Active CN114547287B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111373234.7A CN114547287B (en) 2021-11-18 2021-11-18 Generative text summarization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111373234.7A CN114547287B (en) 2021-11-18 2021-11-18 Generative text summarization method

Publications (2)

Publication Number Publication Date
CN114547287A CN114547287A (en) 2022-05-27
CN114547287B true CN114547287B (en) 2023-04-07

Family

ID=81668710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111373234.7A Active CN114547287B (en) 2021-11-18 2021-11-18 Generative text summarization method

Country Status (1)

Country Link
CN (1) CN114547287B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138378B2 (en) * 2019-02-28 2021-10-05 Qualtrics, Llc Intelligently summarizing and presenting textual responses with machine learning

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018135723A1 (en) * 2017-01-17 2018-07-26 Kyungpook National University Industry-Academic Cooperation Foundation Device and method for generating abstract summary of multiple-paragraph text, and recording medium for performing same method
CN109344391A (en) * 2018-08-23 2019-02-15 Kunming University of Science and Technology Neural-network-based Chinese news text summary generation method with multi-feature fusion
CN109635109A (en) * 2018-11-28 2019-04-16 South China University of Technology Sentence classification method based on LSTM combining part of speech and multiple attention mechanisms
CN109657051A (en) * 2018-11-30 2019-04-19 Ping An Technology (Shenzhen) Co., Ltd. Text summary generation method, device, computer equipment and storage medium
CN110209801A (en) * 2019-05-15 2019-09-06 South China University of Technology Automatic text summary generation method based on a self-attention network
CN110378409A (en) * 2019-07-15 2019-10-25 Kunming University of Science and Technology Chinese-Vietnamese news document summary generation method based on an element-association attention mechanism
JP2021033995A (en) * 2019-08-16 2021-03-01 NTT Docomo, Inc. Text processing apparatus, method, device, and computer-readable storage medium
CN110619043A (en) * 2019-08-30 2019-12-27 Southwest China Institute of Electronic Technology (No. 10 Research Institute of China Electronics Technology Group Corporation) Automatic text summary generation method based on dynamic word vectors
CN111782810A (en) * 2020-06-30 2020-10-16 Hunan University Text summary generation method based on topic enhancement
CN113127631A (en) * 2021-04-23 2021-07-16 Chongqing University of Posts and Telecommunications Text summarization method based on multi-head self-attention mechanism and pointer network
CN113254610A (en) * 2021-05-14 2021-08-13 Liao Weizhi Multi-round dialogue generation method for patent consultation

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Automatic paper writing based on a RNN and the TextRank algorithm; Hei-Chia Wang et al.; Applied Soft Computing; pp. 1-12 *
Variational neural decoder for abstractive text summarization; Zhao Huan et al.; Computer Science and Information Systems; Vol. 17, No. 2; pp. 537-552 *
Semantic understanding model based on global and local attention interaction mechanisms; Hou Zhenzhen et al.; Journal of Guilin University of Electronic Technology; No. 02; pp. 55-59 *
News summary generation method based on an improved Encoder-Decoder model; Li Chenbin et al.; Journal of Computer Applications; pp. 25-28 *
Sentence ordering method for automatic summarization based on deep learning; He Kailin et al.; Computer Engineering and Design; No. 12; pp. 275-278 *
Automatic short-text summarization method based on a part-of-speech soft-template attention mechanism; Zhang Yafei et al.; Pattern Recognition and Artificial Intelligence; No. 06; pp. 76-83 *

Also Published As

Publication number Publication date
CN114547287A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN109783657B (en) Multi-step self-attention cross-media retrieval method and system based on limited text space
CN106980683B (en) Blog text abstract generating method based on deep learning
Faruqui et al. Morphological inflection generation using character sequence to sequence learning
Su et al. A two-stage transformer-based approach for variable-length abstractive summarization
US8386240B2 (en) Domain dictionary creation by detection of new topic words using divergence value comparison
CN110929030A (en) Text abstract and emotion classification combined training method
CN108446271A (en) The text emotion analysis method of convolutional neural networks based on Hanzi component feature
CN110032638B (en) Encoder-decoder-based generative abstract extraction method
CN113704416B (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
WO2023134083A1 (en) Text-based sentiment classification method and apparatus, and computer device and storage medium
CN112905736A (en) Unsupervised text emotion analysis method based on quantum theory
Habib et al. An exploratory approach to find a novel metric based optimum language model for automatic bangla word prediction
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
Hifny Hybrid LSTM/MaxEnt networks for Arabic syntactic diacritics restoration
CN113961706A (en) Accurate text representation method based on neural network self-attention mechanism
ELAffendi et al. A simple Galois Power-of-Two real time embedding scheme for performing Arabic morphology deep learning tasks
CN112445887B (en) Method and device for realizing machine reading understanding system based on retrieval
CN114547287B (en) Generation type text abstract method
CN113743113A (en) Emotion abstract extraction method based on TextRank and deep neural network
CN113449517A (en) Entity relationship extraction method based on BERT (belief propagation) gating multi-window attention network model
CN112131859A (en) Tibetan composition plagiarism detection prototype system
WO2019163752A1 (en) Morpheme analysis learning device, morpheme analysis device, method, and program
Yadav et al. Different Models of Transliteration-A Comprehensive Review
Xu et al. Neural dialogue model with retrieval attention for personalized response generation
Wai et al. Myanmar (Burmese) String Similarity Measures based on Phoneme Similarity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant