CN109145107A - Topic extraction method, apparatus, medium and device based on a convolutional neural network - Google Patents

Topic extraction method, apparatus, medium and device based on a convolutional neural network

Info

Publication number
CN109145107A
CN109145107A · Application CN201811133725.2A
Authority
CN
China
Prior art keywords
output
term vector
region block
hidden layer
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811133725.2A
Other languages
Chinese (zh)
Other versions
CN109145107B (en)
Inventor
金戈
徐亮
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811133725.2A
Publication of CN109145107A
Application granted
Publication of CN109145107B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a topic extraction method, apparatus, medium and device based on a convolutional neural network. The method comprises: obtaining the term vector matrix of a text to be extracted that is related to online public opinion; constructing an initial feature matrix from the term vector matrix, feeding the initial feature matrix as the input of a topic extraction model into the first region block, and determining that block's output, where the input of each hidden layer of a region block is the output of every other hidden layer in the block; using the output of the current region block as the input of the next region block, and continuing until the outputs of all region blocks are determined; revising the topic weights according to the outputs of all region blocks, and extracting the keywords of the text according to the revised topic weights. The network structure used by the method makes the propagation of network features and gradients more efficient, avoids the vanishing-gradient problem caused by layer-by-layer transmission of loss information, and ensures that the gradient does not vanish even as the network depth is expanded.

Description

Topic extraction method, apparatus, medium and device based on a convolutional neural network
Technical field
The present invention relates to the technical field of topic extraction, and in particular to a topic extraction method, apparatus, medium and device based on a convolutional neural network.
Background art
With the development of mobile Internet technology, the amount of online information has grown explosively, and networks are flooded with large quantities of useful and useless text. For example, online public opinion is a form of public sentiment in which the public expresses different views on currently popular social issues over the Internet. Because the volume of online text is enormous, the main information of network content needs to be extracted rapidly, i.e. the topic or abstract of the information, so that users can quickly locate the content they are interested in.
Current topic extraction models are generally based on bag-of-words or recurrent neural network (RNN) models. Bag-of-words models do not take the position of words into account, and their text features are zeroth-order statistics. RNNs are computationally inefficient and have many parameters that are hard to tune; moreover, as iteration continues the gradient becomes smaller and smaller, i.e. the vanishing-gradient problem occurs. Simply adding network layers to a traditional convolutional neural network (CNN) likewise causes vanishing gradients and declining accuracy, so that approach cannot improve the effect of topic extraction.
Summary of the invention
The present invention provides a topic extraction method, apparatus, medium and device based on a convolutional neural network, to remedy the vanishing-gradient defect of existing topic extraction models that use convolutional neural networks.
A topic extraction method based on a convolutional neural network provided by the invention comprises:
obtaining a text to be extracted related to online public opinion, converting each word in the text to be extracted into a term vector in turn, determining the term vector matrix of the text to be extracted, and assigning each term vector a topic weight used to indicate how likely the corresponding word is to be a topic keyword;
constructing an initial feature matrix from the term vector matrix, and using the initial feature matrix as the input of a trained topic extraction model, the topic extraction model comprising sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the topic extraction model;
using the input of the topic extraction model as the input of the first region block, and determining the output of that region block, the region block comprising multiple hidden layers, the input of each hidden layer being the output of every other hidden layer in the region block;
using the output of the current region block as the input of the next region block, and continuing to determine the outputs of the following region blocks until the outputs of all region blocks are determined and passed to the fully connected layer, which generates the revised topic weight of each term vector of the text to be extracted;
extracting the keywords of the text to be extracted according to the revised topic weights of the term vectors.
In one possible implementation, before using the initial feature matrix as the input of the trained topic extraction model, the method further comprises:
constructing an initial model comprising sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the initial model;
obtaining a preset term vector matrix in which each term vector has a corresponding topic weight; training the initial model by using the preset term vector matrix as its input and the corresponding revised topic weights as its output, determining the model parameters of the initial model, and taking the initial model with determined parameters as the topic extraction model.
In one possible implementation, determining the term vector matrix of the text to be extracted comprises:
converting each word in the text to be extracted into a term vector in turn, and concatenating all term vectors in one sentence of the text to be extracted in order to form the corresponding sentence sequence;
zero-padding the tail of every sentence sequence so that all padded sentence sequences have the same sequence length;
taking each padded sentence sequence in turn as one row (or column) of a matrix to generate the term vector matrix of the text to be extracted.
In one possible implementation, assigning each term vector a topic weight used to indicate how likely the corresponding word is to be a topic keyword comprises:
assigning each term vector an identical topic weight; or
assigning each term vector a topic weight according to its term frequency, the topic weight of the term vector being positively correlated with its term frequency.
In one possible implementation, determining the output of the region block comprises:
presetting the processing order of all hidden layers in the region block;
determining the output of the first hidden layer in the processing order from the input of the region block, and then, following the processing order, determining the output of each hidden layer in turn from the input of the region block and the outputs of the hidden layers already determined;
following the processing order, updating the output of the current hidden layer from the outputs of all other hidden layers in the region block; and, after a preset number of update rounds, taking the updated output of the last hidden layer in the processing order as the output of the region block.
In one possible implementation, updating the output of the current hidden layer from the outputs of all other hidden layers in the region block, following the processing order, comprises:
determining the output of each hidden layer in turn according to the processing order and the update formula:

$$X_i^{(k)} = g\Big(\sum_{m<i} W_{mi} * X_m^{(k)} + \sum_{n>i} W_{ni} * X_n^{(k-1)}\Big)$$

where $X_i^{(k)}$ denotes the output of the i-th hidden layer in the processing order after the k-th update, $g(\cdot)$ denotes the activation function, and $*$ denotes the convolution operation; $W_{mi}$ denotes the weight between the m-th hidden layer and the i-th hidden layer, $X_m^{(k)}$ denotes the output of the m-th hidden layer after the k-th update, $W_{ni}$ denotes the weight between the n-th hidden layer and the i-th hidden layer, and $X_n^{(k-1)}$ denotes the output of the n-th hidden layer after the (k-1)-th update; when k = 1, the (k-1)-th update denotes the un-updated first-stage output.
In one possible implementation, extracting the keywords of the text to be extracted according to the revised topic weights of the term vectors comprises:
taking the term vectors whose revised topic weight is greater than a preset weight as target term vectors, and taking the words corresponding to the target term vectors as the keywords of the text to be extracted; or
sorting the revised topic weights, taking the term vectors corresponding to the largest preset number of topic weights after sorting as target term vectors, and taking the words corresponding to the target term vectors as the keywords of the text to be extracted.
Based on the same inventive concept, the present invention also provides a topic extraction apparatus based on a convolutional neural network, comprising:
an obtaining module, for obtaining a text to be extracted related to online public opinion, converting each word in the text to be extracted into a term vector in turn, determining the term vector matrix of the text to be extracted, and assigning each term vector a topic weight used to indicate how likely the corresponding word is to be a topic keyword;
an input determining module, for constructing an initial feature matrix from the term vector matrix and using the initial feature matrix as the input of a trained topic extraction model, the topic extraction model comprising sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the topic extraction model;
an output determining module, for using the input of the topic extraction model as the input of the first region block and determining the output of that region block, the region block comprising multiple hidden layers, the input of each hidden layer being the output of every other hidden layer in the region block;
a global processing module, for using the output of the current region block as the input of the next region block and continuing to determine the outputs of the following region blocks until the outputs of all region blocks are determined and passed to the fully connected layer, which generates the revised topic weight of each term vector of the text to be extracted;
a topic extraction module, for extracting the keywords of the text to be extracted according to the revised topic weights of the term vectors.
Based on the same inventive concept, the present invention also provides a computer storage medium storing computer-executable instructions for performing any of the methods described above.
Based on the same inventive concept, the present invention also provides an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can perform any of the methods described above.
In the topic extraction method, apparatus, medium and device based on a convolutional neural network provided by the embodiments of the present invention, a two-dimensional term vector matrix is constructed from the term vectors of the text to be extracted, and a topic extraction model with sequentially connected region blocks and a fully connected layer connected to all region blocks then determines the topic weights of the words, from which the corresponding keywords are extracted. Composing a region block from multiple hidden layers reduces the number of feature maps each hidden layer outputs, and therefore the number of network parameters. The network structure of the topic extraction model makes the propagation of network features and gradients more efficient, so the network is easier to train; it avoids the vanishing gradients caused by layer-by-layer transmission of loss information, ensuring that the network depth can be expanded without the gradient vanishing, and improving the training efficiency of the topic extraction model. Taking the term vector sequence of each sentence as one row of the term vector matrix enables the subsequent multi-level convolution. Meanwhile, determining the outputs of the hidden layers in stages, using the outputs of the other hidden layers as the input of the current hidden layer in the second stage, and determining the output of the last hidden layer from the outputs of all updated hidden layers as the output of the region block, preserves the network characteristics of the block's output to the greatest extent; and because the feature-map dimension of a region block does not grow super-linearly, parameter count and computation can be reduced.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood through practice of the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims and the drawings.
The technical solution of the present invention is described in further detail below through the drawings and embodiments.
Brief description of the drawings
The drawings are provided for a further understanding of the present invention and constitute part of the specification; together with the embodiments of the invention, they serve to explain the invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a flowchart of the topic extraction method based on a convolutional neural network in an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the convolutional neural network in an embodiment of the present invention;
Fig. 3 is a flowchart of the method for generating the term vector matrix in an embodiment of the present invention;
Fig. 4 is a schematic flowchart of determining the output of a region block in an embodiment of the present invention;
Fig. 5 is a structural diagram of the topic extraction apparatus based on a convolutional neural network in an embodiment of the present invention;
Fig. 6 is a structural diagram of the electronic device for topic extraction based on a convolutional neural network in an embodiment of the present invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the drawings. It should be understood that the preferred embodiments described herein are only for illustrating and explaining the present invention and are not intended to limit it.
An embodiment of the present invention provides a topic extraction method based on a convolutional neural network, shown in Fig. 1, comprising:
Step 101: Obtain a text to be extracted related to online public opinion, convert each word in the text to be extracted into a term vector in turn, determine the term vector matrix of the text to be extracted, and assign each term vector a topic weight used to indicate how likely the corresponding word is to be a topic keyword.
In this embodiment of the present invention, the text to be extracted is a text related to online public opinion from which keywords or topics need to be extracted. The text may contain one or more sentences, and each sentence contains one or more words; each word corresponds to a term vector, from which the corresponding term vector matrix can be generated. Each term vector has a corresponding topic weight used to indicate how likely the corresponding word is to be a topic keyword: the larger the topic weight, the more likely the word is a keyword. The topic weights assigned in step 101 are initial topic weights. Specifically, every term vector may be assigned an identical topic weight, for example 0.01 for each term vector; or the topic weight of each term vector may be determined from the number of term vectors, e.g. 1/N for each term vector, where N is the number of term vectors.
Alternatively, each term vector is assigned a topic weight according to its term frequency (TF), the topic weight being positively correlated with the term frequency: the larger a term vector's frequency, the larger the topic weight initially assigned in step 101. For example, if the text to be extracted contains N term vectors in total and a term vector A occurs a times, the topic weight of A can be a/N.
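By way of illustration only (not part of the patent text), a minimal Python sketch of this frequency-based initialization, with assumed function and variable names:

```python
from collections import Counter

def initial_topic_weights(tokens):
    """Assign each word an initial topic weight proportional to its term
    frequency: a word occurring a times among N tokens gets weight a/N."""
    counts = Counter(tokens)
    n = len(tokens)
    return {word: count / n for word, count in counts.items()}

# e.g. initial_topic_weights(["stock", "market", "stock"])
# -> {"stock": 2/3, "market": 1/3}
```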
Step 102: Construct an initial feature matrix from the term vector matrix and use the initial feature matrix as the input of the trained topic extraction model; the topic extraction model comprises sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, and the output of the fully connected layer is the output of the topic extraction model.
In this embodiment, the term vector matrix is treated like a grayscale image, and the initial feature matrix can be constructed with preset convolution kernels. For example, taking the inner product of the term vector matrix with a convolution kernel yields a feature map, and the feature map corresponds to an initial feature matrix; the kernel size may be 5 × 5 or 6 × 1, etc., which this embodiment does not limit. A convolutional neural network can have multiple convolution kernels, so multiple initial feature matrices can be constructed; all initial feature matrices are then used as the input of the trained topic extraction model based on the convolutional neural network.
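As an illustrative sketch under assumed sizes (a 32 × 128 term vector matrix and eight 5 × 5 kernels, neither of which the patent fixes), constructing feature maps in PyTorch:

```python
import torch
import torch.nn as nn

# Treat the m x n term vector matrix like a single-channel grayscale image
# and convolve it with several preset kernels; each output channel is one
# feature map, i.e. one initial feature matrix.
term_matrix = torch.randn(1, 1, 32, 128)            # batch=1, channel=1, m=32, n=128
make_features = nn.Conv2d(in_channels=1, out_channels=8,
                          kernel_size=(5, 5), padding=2)  # 8 kernels of 5x5
initial_features = make_features(term_matrix)       # shape: (1, 8, 32, 128)
```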
In this embodiment, the basic structure of the topic extraction model is shown in Fig. 2: the model comprises several sequentially connected region blocks and a fully connected layer connected to all region blocks. Fig. 2 illustrates 3 region blocks (B1, B2, B3), each containing four hidden layers (h1, h2, h3, h4); the numbers of region blocks and hidden layers can be chosen according to the specific situation, and this embodiment does not limit them. The region blocks are connected in sequence, i.e. the output of one region block can serve as the input of the next; the outputs of all region blocks are connected to the fully connected layer, which outputs the result (Output) of the topic extraction model.
The topic extraction model is trained in advance to determine model parameters suitable for topic extraction. Specifically, training the topic extraction model comprises: constructing an initial model that comprises sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the initial model; after the initial model is constructed, obtaining a preset term vector matrix in which each term vector has a corresponding topic weight; training the initial model with the preset term vector matrix as its input and the corresponding revised topic weights as its output; determining the model parameters of the initial model; and taking the initial model with determined parameters as the topic extraction model.
In this embodiment, the initial model is the topic extraction model before training: the network structures of the two are identical, only the model parameters may differ, and suitable parameters are determined by the training process. Specifically, the preset term vector matrix and the corresponding revised topic weights serve as a training sample, i.e. the preset term vector matrix is the input of the initial model and the revised topic weights are its output, and training adjusts the model parameters of the topic extraction model; the parameters are specifically the network weights, such as the weights W_mi and W_ni described below. The preset term vector matrix can be determined from a preset training text according to step 101.
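For illustration only, a minimal PyTorch training sketch of this procedure; the stand-in model, tensor sizes, optimizer and loss are assumptions, since the patent specifies only the input (the preset term vector matrix), the output (the revised topic weights) and that the network weights are adjusted by training:

```python
import torch
import torch.nn as nn

# Stand-in model mapping a 1 x 32 x 128 preset term vector matrix to 32
# revised topic weights in [0, 1]; the real model is the region-block network.
model = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 128, 32), nn.Sigmoid())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

preset_matrix = torch.randn(16, 1, 32, 128)  # 16 training samples
target_weights = torch.rand(16, 32)          # known revised topic weights

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(preset_matrix), target_weights)
    loss.backward()   # gradients adjust the network weights (e.g. W_mi, W_ni)
    optimizer.step()
```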
Step 103: Use the input of the topic extraction model as the input of the first region block and determine the output of that region block; a region block contains multiple hidden layers, and the input of each hidden layer is the output of every other hidden layer in the region block.
In this embodiment, the first region block is the first of all the sequentially connected region blocks, e.g. block B1 in Fig. 2. A traditional convolutional neural network feeds the output of one hidden layer into the next, performing convolutions hidden layer by hidden layer; each hidden (convolutional) layer then outputs a large number of feature maps, commonly with widths of several hundred or several thousand, which gives the convolutional neural network many parameters. A region block, by contrast, contains multiple hidden layers (4 hidden layers in Fig. 2), so the number of feature maps each hidden layer outputs can be set small (below 100), reducing the number of network parameters.
Meanwhile, in an ordinary convolutional neural network the input of a hidden layer is related only to the preceding hidden layer; taking Fig. 2 as an example, the input of hidden layer h2 would be related only to the output of h1. In this embodiment of the present invention, the input of each hidden layer is the output of every other hidden layer in the region block: in Fig. 2, the input of h2 is related to the outputs of h1, h3 and h4. Using the outputs of all other hidden layers in the block as the input of a hidden layer ensures that every hidden layer can access the gradient directly from the loss function, which makes the propagation of network features and gradients more efficient and the network easier to train, so very deep networks can be trained. The output of the region block is the output of one of its hidden layers; optionally, the output of the last hidden layer in the order.
Step 104: Use the output of the current region block as the input of the next region block; continue determining the outputs of the following region blocks until the outputs of all region blocks are determined, and pass the outputs of all region blocks to the fully connected layer; the fully connected layer generates the revised topic weight of each term vector of the text to be extracted according to the outputs of all region blocks.
In this embodiment, the input of the first region block is the initial feature matrix, and the input of every later region block is the output of the previous one; once a region block's input is determined, its output can be determined in the same way as in step 103. Taking Fig. 2 as an example, the input of B1 is the initial feature matrix (Input), the input of B2 is the output of B1, and the input of B3 is the output of B2. Meanwhile, every region block is connected to the network's final fully connected layer (FC layer), so every region block can access the loss information directly, and within each region block every hidden layer can also access the block's loss information directly. Compared with the traditional design in which only the last hidden layer is connected to the fully connected layer, the network structure provided by this embodiment avoids the vanishing gradients caused by layer-by-layer transmission of loss information, ensures that the network depth can be expanded without the gradient vanishing, improves the training efficiency of the topic extraction model, and allows very deep neural networks to be trained.
Step 105: Extract the keywords of the text to be extracted according to the revised topic weights of the term vectors.
In this embodiment, the text extraction model is used to revise the topic weights of the term vectors; each revised topic weight is mapped into the range 0 to 1 to represent the probability that the word is a keyword. The fully connected layer can specifically use a sigmoid or softmax function. Softmax is a common regression model for multi-way classification; judging whether a target word is a keyword is a two-class problem, so the corresponding softmax output has two dimensions, one representing the probability that the word is a keyword and the other the probability that it is not.
After the revised topic weights are determined, the term vectors whose revised topic weight is greater than a preset weight can be taken as target term vectors, and the words corresponding to the target term vectors as the keywords of the text to be extracted. Specifically, the preset weight acts as a threshold: if the topic weight of a word is greater than the threshold, the word is sufficiently likely to be a keyword and can be taken as a keyword of the text to be extracted.
Alternatively, after the revised topic weights are determined, they are sorted; the term vectors corresponding to the largest preset number of topic weights after sorting are taken as target term vectors, and the corresponding words as the keywords of the text to be extracted. For example, if the preset number is q, the q words with the largest topic weights are sufficiently likely to be keywords, and those q words can be taken as the keywords of the text to be extracted.
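A minimal Python sketch of the two selection rules just described, with assumed function and parameter names:

```python
def select_keywords(words, weights, preset_weight=None, top_q=None):
    """Pick keywords either by thresholding the revised topic weights
    or by keeping the q words with the largest weights."""
    if preset_weight is not None:
        return [w for w, s in zip(words, weights) if s > preset_weight]
    ranked = sorted(zip(words, weights), key=lambda p: p[1], reverse=True)
    return [w for w, _ in ranked[:top_q]]

# select_keywords(["bank", "loan", "the"], [0.8, 0.6, 0.1], preset_weight=0.5)
# -> ["bank", "loan"]
```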
In the topic extraction method based on a convolutional neural network provided by this embodiment of the present invention, a two-dimensional term vector matrix is constructed from the term vectors of the text to be extracted, and a topic extraction model with sequentially connected region blocks and a fully connected layer connected to all region blocks then determines the topic weights of the words, from which the corresponding keywords are extracted. Composing a region block from multiple hidden layers reduces the number of feature maps each hidden layer outputs, and therefore the number of network parameters; the network structure of the topic extraction model makes the propagation of network features and gradients more efficient, so the network is easier to train; and it avoids the vanishing gradients caused by layer-by-layer transmission of loss information, ensuring that the network depth can be expanded without the gradient vanishing and improving the training efficiency of the topic extraction model.
Another embodiment of the present invention provides a topic extraction method based on a convolutional neural network that includes steps 101-105 of the above embodiment; its implementation principles and technical effects are as in the embodiment corresponding to Fig. 1. Meanwhile, as shown in Fig. 3, in this embodiment step 101 "obtain a text to be extracted related to online public opinion" comprises steps 1011-1012:
Step 1011: Obtain web page text related to online public opinion with a web crawler; the web page text includes one or more sentences, and each sentence includes one or more words.
Step 1012: Denoise and deduplicate the web page text, segment the denoised and deduplicated web page text into words, and take the segmented web page text as the text to be extracted.
In this embodiment, web pages related to online public opinion can specifically be obtained with a web crawler; web crawling is a mature technology and is not described here. After the web page text is obtained, it is denoised (removing irrelevant advertisements, etc.) and deduplicated (removing identical page content obtained from different URLs) to reduce redundant later processing. Optionally, each piece of web page text can carry a weight coefficient with initial value 1; each time a duplicate page content is removed, the weight coefficient of the retained page content is incremented by 1. A larger weight coefficient means that content related to that web page text appears more often on the network, and the page text is more important. The page content is then word-segmented and stop words are removed, yielding a set of phrases related to online public opinion, which serves as the text to be extracted.
Optionally, as shown in Fig. 3, step 101 "determine the term vector matrix of the text to be extracted" comprises steps 1013-1015:
Step 1013: Convert each word in the text to be extracted into a term vector in turn, and concatenate all term vectors in one sentence of the text to be extracted in order to form the corresponding sentence sequence.
In this embodiment, each word in the text to be extracted can be converted into a word2vec term vector. Each sentence is composed of one or more words, i.e. each sentence corresponds to one or more term vectors; concatenating the term vectors in their order within the sentence forms the sentence sequence, which is a one-dimensional array.
Step 1014: Zero-pad the tail of every sentence sequence; the padded sentence sequences have the same sequence length.
In this embodiment, the standard length of all sentence sequences can be preset, or the longest sentence sequence can be found after all sentence sequences are generated and its length taken as the standard length. After the standard length is determined, sequences that are too short are zero-padded at the tail, i.e. zeros are appended after the last position of the sentence sequence until the standard length is reached.
Step 1015: Take each padded sentence sequence in turn as one row (or column) of a matrix to generate the term vector matrix of the text to be extracted.
Because the padded sentence sequences all have the same sequence length, arranging all sentence sequences forms a matrix, namely the term vector matrix. Normally a sentence sequence is one row of the matrix, i.e. each row of the term vector matrix is the set of term vectors of one sentence; the term vector matrix is then m × n, where m is the number of sentences in the text to be extracted and n the standard length. Optionally, m and n can be preset, so the size of the term vector matrix is fixed; the matrix is then filled row by row (or column by column) with sentences, and the elements of the term vector matrix without sentence data are set to 0. In this embodiment, taking the term vector sequence of each sentence as one row of the term vector matrix enables the subsequent multi-level convolution.
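By way of illustration, a NumPy sketch of steps 1013-1015 under the assumption that an embedding function is available; the names are invented for the example:

```python
import numpy as np

def term_vector_matrix(sentences, embed):
    """sentences: list of token lists; embed: word -> fixed-size vector.
    Each sentence becomes one row: its term vectors concatenated in order,
    zero-padded at the tail to the longest sentence's length."""
    rows = [np.concatenate([embed(w) for w in s]) for s in sentences]
    n = max(len(r) for r in rows)                        # standard length
    padded = [np.pad(r, (0, n - len(r))) for r in rows]  # tail zero-padding
    return np.stack(padded)                              # shape: (m, n)
```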
On the basis of the above embodiments, step 103 "determine the output of the region block" determines the output of the region block in two stages; the process comprises steps A1-A3:
Step A1: Preset the processing order of all hidden layers in the region block.
In this embodiment, although the input of each hidden layer in a region block is the output of every other hidden layer in the block, in the actual processing there is a processing order among the hidden layers: one hidden layer is processed (or provisionally finished) before the next hidden layer is handled. As shown in Fig. 2, the processing order of the four hidden layers can be h1 → h2 → h3 → h4.
Step A2: Determine the output of the first hidden layer in the processing order from the input of the region block; then, following the processing order, determine the output of each hidden layer in turn from the input of the region block and the outputs of the hidden layers already determined.
In this embodiment, determining the output of a region block mainly comprises two stages: in the first stage, the feature maps of the hidden layers are generated; in the second stage, the feature maps of the hidden layers are updated (adjusted). The two-stage processing of one region block of Fig. 2 is shown in Fig. 4, in which the dashed part denotes the first stage and the solid part the second stage; note that, for ease of description, Fig. 4 unrolls the stages over 8 hidden-layer copies, but the network structure is essentially that of Fig. 2.
In the first stage, the input of the region block serves as the input of the first hidden layer in the order, from which that layer's output can be determined. In Fig. 4 the first hidden layer is h1 and its input is the input of the region block, i.e. $X_0$ in Fig. 4: when the region block is the first region block (B1 in Fig. 2), $X_0$ is the initial feature matrix; when the region block is another region block (B2, B3 in Fig. 2), $X_0$ is the output of the previous region block. Once the input $X_0$ of h1 is determined, its output $X_1$ can be determined; in convolutional neural networks, determining a layer's output from its input is a routine technique and is not detailed in this embodiment.
After the output of the first hidden layer is determined, the outputs of the other hidden layers can be determined in turn in the processing order. For the second hidden layer h2, the input of the region block is $X_0$, and the only hidden layer with an already determined output is h1, so the input of h2 comprises $X_0$ and $X_1$, each input carrying its own weight, from which the output of h2 is determined as $X_2$. Similarly, the input of the third hidden layer h3 comprises $X_0$, $X_1$ and $X_2$, and its output is $X_3$; the input of the fourth hidden layer h4 comprises $X_0$, $X_1$, $X_2$ and $X_3$, and its output is $X_4$. Here $X_1$, $X_2$, $X_3$ and $X_4$ are the first-stage outputs of the corresponding hidden layers, not the final outputs.
Step A3: Following the processing order, update the output of the current hidden layer from the outputs of all other hidden layers in the region block; after a preset number of update rounds, take the updated output of the last hidden layer in the processing order as the output of the region block.
In this embodiment, in the second stage the outputs of the hidden layers are again updated in the processing order. Because every hidden layer already has an output at this point (its first-stage output, or its output from the previous update round), the current hidden layer can be updated from the outputs of all other hidden layers. Specifically, the output of each hidden layer is determined in processing order according to the update formula:

$$X_i^{(k)} = g\Big(\sum_{m<i} W_{mi} * X_m^{(k)} + \sum_{n>i} W_{ni} * X_n^{(k-1)}\Big)$$

where $X_i^{(k)}$ denotes the output of the i-th hidden layer in the processing order after the k-th update, $g(\cdot)$ denotes the activation function (generally a nonlinear activation function), and $*$ denotes the convolution operation; $W_{mi}$ denotes the weight between the m-th hidden layer and the i-th hidden layer, $X_m^{(k)}$ denotes the output of the m-th hidden layer after the k-th update, $W_{ni}$ denotes the weight between the n-th hidden layer and the i-th hidden layer, and $X_n^{(k-1)}$ denotes the output of the n-th hidden layer after the (k-1)-th update; when k = 1, the (k-1)-th update denotes the un-updated first-stage output. In the formula, the maxima of m and n equal the number of hidden layers.
As shown in Fig. 4, the second stage begins with the first update round after the first stage, i.e. k = 1 in the update formula. Following the processing order, the input of the first hidden layer h1 is now the current outputs of h2, h3 and h4, namely $X_2$, $X_3$ and $X_4$, from which h1's output after the first round, $X_1^{(1)}$, can be determined (for k > 1, the inputs of h1 are the outputs of h2, h3 and h4 after round k-1). When the output of h2 is updated next, its input is the current outputs of the other hidden layers: h3 and h4 still output $X_3$ and $X_4$, but the output of h1 has been updated to $X_1^{(1)}$, so h2's updated output $X_2^{(1)}$ is determined from $X_3$, $X_4$ and $X_1^{(1)}$. Similarly, when h3 is updated, its input is $X_1^{(1)}$, $X_2^{(1)}$ and $X_4$, giving updated output $X_3^{(1)}$; and when the last hidden layer h4 is updated, h1, h2 and h3 have all been updated, so h4's updated output $X_4^{(1)}$ is determined from the updated outputs $X_1^{(1)}$, $X_2^{(1)}$ and $X_3^{(1)}$, which ends the current update round. If the whole update process has finished, the updated output of h4 serves as the output of the region block; if another round is needed, k is incremented by one and the second-stage processing of step A3 is repeated until k reaches the preset maximum.
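By way of illustration only, a PyTorch sketch of these second-stage update rounds under stated assumptions: `first_stage` holds the first-stage outputs, `weights[j][i]` is an assumed nested container holding the convolution kernel between layers j and i, and ReLU stands in for the unspecified activation g(·):

```python
import torch
import torch.nn.functional as F

def update_rounds(first_stage, weights, k_max):
    """Implements X_i(k) = g(sum_{m<i} W_mi * X_m(k) + sum_{n>i} W_ni * X_n(k-1)).
    Updating `outs` in place makes outs[j] for j < i the k-th-round output and
    outs[j] for j > i the (k-1)-th-round output, exactly as the formula requires;
    for k = 1 the "(k-1)-th" outputs are the first-stage outputs."""
    outs = list(first_stage)
    num_layers = len(outs)
    for k in range(k_max):
        for i in range(num_layers):
            acc = 0
            for j in range(num_layers):
                if j == i:
                    continue
                acc = acc + F.conv2d(outs[j], weights[j][i], padding=1)
            outs[i] = torch.relu(acc)  # g(.)
    return outs[-1]  # updated output of the last hidden layer = block output
```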
In this embodiment, the outputs of the hidden layers are determined in stages; in the second stage, the outputs of the other hidden layers serve as the input of the current hidden layer, and the output of the last hidden layer is determined from the outputs of all updated hidden layers and serves as the output of the region block. This preserves the network characteristics of the block's output to the greatest extent, and because the feature-map dimension of a region block does not grow super-linearly, parameter count and computation can be reduced. Meanwhile, the input of every hidden layer is the output of all other layers in the region block, so computing one hidden layer's output combines the feature maps of all other hidden layers in the block; this connection pattern ensures that every layer can access the gradient directly from the loss function, making the propagation of network features more efficient and the network easier to train, so very deep networks can be trained.
On the basis of the above embodiments, step 104 "use the output of the current region block as the input of the next region block" comprises: pooling the output of the current region block to determine the feature descriptor of the current region block, the descriptor being a 1 × 1 × C vector, where C is the number of channels of the block's output; adjusting the output of the current region block according to its feature descriptor; and using the adjusted output of the current region block as the input of the next region block.
In this embodiment, the output of a region block can be a W × H × C matrix, where W is the matrix width, H the matrix height and C the number of channels. Pooling the block's output compresses the output, yielding the feature descriptor of the block's output while also effectively controlling overfitting. The output of the current block is then adjusted by its feature descriptor (e.g. element-wise multiplication), so that each channel of the adjusted output carries the block's feature descriptor and the features passed to the next region block are of higher quality. The pooling can specifically be global pooling, and the adjustment of the block output can follow the SE module provided by SE-Net (Squeeze-and-Excitation Networks).
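As a hedged sketch of this adjustment, modeled on the SE module cited above rather than taken from the patent (the reduction ratio and layer shapes are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAdjust(nn.Module):
    """SE-style recalibration of a region block's W x H x C output: global
    pooling yields a 1 x 1 x C descriptor, which rescales each channel
    before the result feeds the next region block."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):                                 # x: (N, C, H, W)
        s = F.adaptive_avg_pool2d(x, 1).flatten(1)        # squeeze -> (N, C)
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))  # excitation
        return x * s.view(x.size(0), -1, 1, 1)            # per-channel rescale
```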
On the basis of the above embodiments, step 104 "pass the outputs of all region blocks to the fully connected layer" comprises: passing the inputs and outputs of all region blocks to the fully connected layer.
In this embodiment, the input and output of each region block are concatenated and globally pooled, so each region block yields one corresponding vector; the global-pooling results of all region blocks are then concatenated so that the final topic extraction can be performed. Because the loss function is determined from all region blocks, every region block can access the gradient information directly, avoiding the vanishing gradients caused by layer-by-layer transmission of loss information.
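A speculative PyTorch sketch of this fusion step; the pairing of block inputs and outputs, the fully connected layer's dimensions, and the softmax at the end are assumptions consistent with steps 104-105:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_blocks(block_ios, fc: nn.Linear):
    """block_ios: one (input, output) tensor pair per region block, each of
    shape (N, C, H, W) with matching spatial sizes. Each block's input and
    output are globally pooled into a single vector; the vectors of all
    blocks are concatenated and fed to the fully connected layer, so the
    loss function reaches every region block directly."""
    pooled = [F.adaptive_avg_pool2d(torch.cat([x_in, x_out], dim=1), 1).flatten(1)
              for x_in, x_out in block_ios]
    return torch.softmax(fc(torch.cat(pooled, dim=1)), dim=-1)
```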
In the topic extraction method based on a convolutional neural network provided by this embodiment of the present invention, a two-dimensional term vector matrix is constructed from the term vectors of the text to be extracted, and a topic extraction model with sequentially connected region blocks and a fully connected layer connected to all region blocks then determines the topic weights of the words, from which the corresponding keywords are extracted. Composing a region block from multiple hidden layers reduces the number of feature maps each hidden layer outputs, and therefore the number of network parameters; the network structure of the topic extraction model makes the propagation of network features and gradients more efficient, so the network is easier to train; and it avoids the vanishing gradients caused by layer-by-layer transmission of loss information, ensuring that the network depth can be expanded without the gradient vanishing and improving the training efficiency of the topic extraction model. Taking the term vector sequence of each sentence as one row of the term vector matrix enables the subsequent multi-level convolution. Meanwhile, determining the outputs of the hidden layers in stages, using the outputs of the other hidden layers as the input of the current hidden layer in the second stage, and determining the output of the last hidden layer from the outputs of all updated hidden layers as the output of the region block, preserves the network characteristics of the block's output to the greatest extent; and because the feature-map dimension of a region block does not grow super-linearly, parameter count and computation can be reduced.
The flow of the topic extraction method based on a convolutional neural network has been described in detail above; the method can also be implemented by a corresponding apparatus, whose structure and function are described in detail below.
An embodiment of the present invention provides a topic extraction apparatus based on a convolutional neural network, shown in Fig. 5, comprising:
an obtaining module 51, for obtaining a text to be extracted related to online public opinion, converting each word in the text to be extracted into a term vector in turn, determining the term vector matrix of the text to be extracted, and assigning each term vector a topic weight used to indicate how likely the corresponding word is to be a topic keyword;
an input determining module 52, for constructing an initial feature matrix from the term vector matrix and using the initial feature matrix as the input of a trained topic extraction model, the topic extraction model comprising sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the topic extraction model;
an output determining module 53, for using the input of the topic extraction model as the input of the first region block and determining the output of that region block, the region block comprising multiple hidden layers, the input of each hidden layer being the output of every other hidden layer in the region block;
a global processing module 54, for using the output of the current region block as the input of the next region block and continuing to determine the outputs of the following region blocks until the outputs of all region blocks are determined and passed to the fully connected layer, which generates the revised topic weight of each term vector of the text to be extracted according to the outputs of all region blocks;
a topic extraction module 55, for extracting the keywords of the text to be extracted according to the revised topic weights of the term vectors.
On the basis of the above embodiments, the obtaining module 51 comprises:
a text obtaining unit, for obtaining web page text related to online public opinion with a web crawler, the web page text including one or more sentences, each sentence including one or more words;
a word segmentation unit, for denoising and deduplicating the web page text, segmenting the denoised and deduplicated web page text into words, and taking the segmented web page text as the text to be extracted.
On the basis of the above embodiments, the obtaining module 51 comprises:
a converting unit, for converting each word in the text to be extracted into a term vector in turn and concatenating all term vectors in one sentence of the text to be extracted in order to form the corresponding sentence sequence;
a zero-padding unit, for zero-padding the tail of every sentence sequence so that all padded sentence sequences have the same sequence length;
a matrix generating unit, for taking each padded sentence sequence in turn as one row (or column) of a matrix to generate the term vector matrix of the text to be extracted.
On the basis of the above embodiments, the apparatus further comprises a training module used before the initial feature matrix is fed into the trained topic extraction model;
the training module is used for: constructing an initial model comprising sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the initial model; obtaining a preset term vector matrix in which each term vector has a corresponding topic weight; training the initial model with the preset term vector matrix as its input and the corresponding revised topic weights as its output; determining the model parameters of the initial model; and taking the initial model with determined parameters as the topic extraction model.
On the basis of the above embodiments, the obtaining module assigning each term vector a topic weight used to indicate how likely the corresponding word is to be a topic keyword comprises:
assigning each term vector an identical topic weight; or
assigning each term vector a topic weight according to its term frequency, the topic weight of the term vector being positively correlated with its term frequency.
On the basis of the above embodiments, the output determining module 53 comprises:
an ordering unit, for presetting the processing order of all hidden layers in the region block;
an output determining unit, for determining the output of the first hidden layer in the processing order from the input of the region block, and then, following the processing order, determining the output of each hidden layer in turn from the input of the region block and the outputs of the hidden layers already determined;
an output updating unit, for updating, in the processing order, the output of the current hidden layer from the outputs of all other hidden layers in the region block, and, after a preset number of update rounds, taking the updated output of the last hidden layer in the processing order as the output of the region block.
On the basis of the above embodiments, the output updating unit is used for:
determining the output of each hidden layer in turn according to the processing order and the update formula:

$$X_i^{(k)} = g\Big(\sum_{m<i} W_{mi} * X_m^{(k)} + \sum_{n>i} W_{ni} * X_n^{(k-1)}\Big)$$

where $X_i^{(k)}$ denotes the output of the i-th hidden layer in the processing order after the k-th update, $g(\cdot)$ denotes the activation function, and $*$ denotes the convolution operation; $W_{mi}$ denotes the weight between the m-th hidden layer and the i-th hidden layer, $X_m^{(k)}$ denotes the output of the m-th hidden layer after the k-th update, $W_{ni}$ denotes the weight between the n-th hidden layer and the i-th hidden layer, and $X_n^{(k-1)}$ denotes the output of the n-th hidden layer after the (k-1)-th update; when k = 1, the (k-1)-th update denotes the un-updated first-stage output.
On the basis of the above embodiments, the global processing module 54 comprises:
a pooling unit, for pooling the output of the current region block to determine the feature descriptor of the current region block, the descriptor being a 1 × 1 × C vector, where C is the number of channels of the block's output;
an adjusting unit, for adjusting the output of the current region block according to its feature descriptor, and using the adjusted output of the current region block as the input of the next region block.
On the basis of the above embodiments, the topic extraction module is used for:
taking the term vectors whose revised topic weight is greater than a preset weight as target term vectors, and taking the words corresponding to the target term vectors as the keywords of the text to be extracted; or
sorting the revised topic weights, taking the term vectors corresponding to the largest preset number of topic weights after sorting as target term vectors, and taking the words corresponding to the target term vectors as the keywords of the text to be extracted.
A convolutional-neural-network-based topic extraction device provided in an embodiment of the present invention constructs a two-dimensional word vector matrix from the word vectors of the text to be extracted, then determines the topic weight of each word using a topic extraction model comprising sequentially connected region blocks and a fully connected layer connected to all region blocks, and finally extracts the corresponding keywords. Composing each region block of multiple hidden layers reduces the number of feature maps each hidden layer outputs, and therefore the number of network parameters. The network structure of the topic extraction model makes the propagation of features and gradients more efficient, so the network is easier to train; it also avoids the vanishing-gradient problem caused by layer-by-layer propagation of the loss information, ensuring that the network depth can be increased without gradient vanishing and improving the training efficiency of the topic extraction model. Taking the word vector sequence of each sentence as one row of the word vector matrix guarantees the subsequent multi-stage convolution computation. Meanwhile, determining the hidden layer outputs in stages, with the second stage taking the outputs of the other hidden layers as the input of the current hidden layer and determining the output of the last hidden layer from the updated outputs of all hidden layers as the output of the region block, preserves the network characteristics of the region block output to the greatest extent; moreover, the feature map dimension of a region block does not grow superlinearly, which reduces both the parameter count and the computation cost.
An embodiment of the present application further provides a computer storage medium storing computer-executable instructions, which include a program for executing the above topic extraction method based on convolutional neural networks; the computer-executable instructions can perform the method in any of the above method embodiments.
The computer storage medium may be any usable medium or data storage device accessible to a computer, including but not limited to magnetic storage (such as a floppy disk, hard disk, magnetic tape or magneto-optical disk (MO)), optical storage (such as CD, DVD, BD or HVD) and semiconductor storage (such as ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH) or a solid state disk (SSD)).
Fig. 6 shows a structural block diagram of an electronic device according to another embodiment of the present invention. The electronic device 1100 may be a host server with computing capability, a personal computer (PC), a portable computer, a terminal, or the like. The specific embodiments of the present invention do not limit the specific implementation of the electronic device.
The electronic device 1100 includes at least one processor 1110, a communication interface (Communications Interface) 1120, a memory 1130 and a bus 1140. The processor 1110, the communication interface 1120 and the memory 1130 communicate with one another through the bus 1140.
The communication interface 1120 is used for communicating with network elements, where the network elements include, for example, a virtual machine management center and shared storage.
The processor 1110 is used for executing a program. The processor 1110 may be a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present invention.
The memory 1130 is used for storing executable instructions. The memory 1130 may comprise a high-speed RAM memory, and may also comprise a non-volatile memory, for example at least one disk memory. The memory 1130 may also be a memory array. The memory 1130 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules. The instructions stored in the memory 1130 are executable by the processor 1110, so that the processor 1110 can perform the method in any of the above method embodiments.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A topic extraction method based on a convolutional neural network, characterized by comprising:
obtaining a text to be extracted relevant to online public opinion, successively converting each word in the text to be extracted into a word vector, determining the word vector matrix of the text to be extracted, and assigning each word vector a topic weight indicating how likely the corresponding word is to be a topic keyword;
constructing an initial feature matrix according to the word vector matrix, and taking the initial feature matrix as the input of a trained topic extraction model, the topic extraction model comprising sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the topic extraction model;
taking the input of the topic extraction model as the input of the first region block, and determining the output of that region block; the region block comprises multiple hidden layers, and the input of each hidden layer comes from the outputs of all other hidden layers in the region block;
taking the output of the current region block as the input of the next region block, and continuing to determine the output of the next region block until the outputs of all region blocks have been determined, then delivering the outputs of all region blocks to the fully connected layer; the fully connected layer generates, according to the outputs of all region blocks, the revised topic weight of each word vector of the text to be extracted;
extracting the keywords of the text to be extracted according to the revised topic weights of the word vectors.
2. The method according to claim 1, characterized in that, before taking the initial feature matrix as the input of the trained topic extraction model, the method further comprises:
constructing an initial model, the initial model comprising sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the initial model;
obtaining a preset word vector matrix, each word vector in the preset matrix having a corresponding topic weight; taking the preset word vector matrix as the input of the initial model and the corresponding revised topic weights as the output of the initial model, training the initial model to determine its model parameters, and taking the initial model with the determined model parameters as the topic extraction model.
3. The method according to claim 1, characterized in that determining the word vector matrix of the text to be extracted comprises:
successively converting each word in the text to be extracted into a word vector, and sequentially concatenating all word vectors in one sentence of the text to be extracted to form the corresponding sentence sequence;
performing tail zero padding on all sentence sequences, so that the zero-padded sentence sequences have the same sequence length;
successively taking each zero-padded sentence sequence as one row or column of a matrix, thereby generating the word vector matrix of the text to be extracted.
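Purely as an illustration of this construction (this sketch and its names are not part of the claims; the row orientation and non-empty sentences are assumed):

    import numpy as np

    def build_word_vector_matrix(sentences, embeddings):
        # sentences  -- list of tokenized sentences from the text to be extracted
        # embeddings -- mapping from a word to its word vector (1-D array)
        sequences = [np.concatenate([embeddings[w] for w in s]) for s in sentences]
        longest = max(len(seq) for seq in sequences)
        padded = [np.pad(seq, (0, longest - len(seq))) for seq in sequences]  # tail zero padding
        return np.stack(padded)  # one zero-padded sentence sequence per matrix row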
4. The method according to claim 1, characterized in that assigning each word vector a topic weight indicating how likely the corresponding word is to be a topic keyword comprises:
assigning each word vector an identical topic weight; or
assigning each word vector a topic weight according to its word frequency, the topic weight of the word vector being positively correlated with its word frequency.
5. The method according to any one of claims 1 to 4, characterized in that determining the output of the region block comprises:
presetting the processing order of all hidden layers in the region block;
determining, according to the input of the region block, the output of the first hidden layer in the processing order, and then, following the processing order, successively determining the output of each hidden layer according to the input of the region block and the outputs of the hidden layers already determined;
successively updating, in the processing order, the output of the current hidden layer according to the outputs of all other hidden layers in the region block; after a preset number of updates, taking the updated output of the last hidden layer in the processing order as the output of the region block.
6. The method according to claim 5, characterized in that successively updating, in the processing order, the output of the current hidden layer according to the outputs of all other hidden layers in the region block comprises:
successively determining the output of each hidden layer in the processing order according to the following update formula:

$h_i^{(k)} = g\Big(\sum_{m<i} W_{mi} * h_m^{(k)} + \sum_{n>i} W_{ni} * h_n^{(k-1)}\Big)$

where $h_i^{(k)}$ denotes the output of the i-th hidden layer in the processing order after the k-th update, $g(\cdot)$ denotes the activation function, and $*$ denotes the convolution operation; $W_{mi}$ denotes the weight between the m-th hidden layer and the i-th hidden layer, and $h_m^{(k)}$ denotes the output of the m-th hidden layer after the k-th update; $W_{ni}$ denotes the weight between the n-th hidden layer and the i-th hidden layer, and $h_n^{(k-1)}$ denotes the output of the n-th hidden layer after the (k-1)-th update; when k = 1, the "(k-1)-th update" refers to the output before any update.
7. The method according to any one of claims 1 to 4, characterized in that extracting the keywords of the text to be extracted according to the revised topic weights of the word vectors comprises:
taking the word vectors whose revised topic weights are greater than a preset weight as target word vectors, and taking the words corresponding to the target word vectors as the keywords of the text to be extracted; or
sorting the revised topic weights, taking the word vectors corresponding to the largest preset number of topic weights as target word vectors, and taking the words corresponding to the target word vectors as the keywords of the text to be extracted.
8. A topic extraction device based on a convolutional neural network, characterized by comprising:
an acquisition module, for obtaining a text to be extracted relevant to online public opinion, successively converting each word in the text to be extracted into a word vector, determining the word vector matrix of the text to be extracted, and assigning each word vector a topic weight indicating how likely the corresponding word is to be a topic keyword;
an input determining module, for constructing an initial feature matrix according to the word vector matrix and taking the initial feature matrix as the input of a trained topic extraction model, the topic extraction model comprising sequentially connected region blocks and a fully connected layer connected to the outputs of all region blocks, the output of the fully connected layer being the output of the topic extraction model;
an output determining module, for taking the input of the topic extraction model as the input of the first region block and determining the output of that region block; the region block comprises multiple hidden layers, and the input of each hidden layer comes from the outputs of all other hidden layers in the region block;
a global processing module, for taking the output of the current region block as the input of the next region block and continuing to determine the output of the next region block until the outputs of all region blocks have been determined, then delivering the outputs of all region blocks to the fully connected layer; the fully connected layer generates, according to the outputs of all region blocks, the revised topic weight of each word vector of the text to be extracted;
a topic extraction module, for extracting the keywords of the text to be extracted according to the revised topic weights of the word vectors.
9. A computer storage medium, characterized in that the computer storage medium stores computer-executable instructions, the computer-executable instructions being used for performing the method according to any one of claims 1 to 7.
10. An electronic device, characterized by comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the method according to any one of claims 1 to 7.
CN201811133725.2A 2018-09-27 2018-09-27 Theme extraction method, device, medium and equipment based on convolutional neural network Active CN109145107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811133725.2A CN109145107B (en) 2018-09-27 2018-09-27 Theme extraction method, device, medium and equipment based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811133725.2A CN109145107B (en) 2018-09-27 2018-09-27 Theme extraction method, device, medium and equipment based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN109145107A true CN109145107A (en) 2019-01-04
CN109145107B CN109145107B (en) 2023-07-25

Family

ID=64813025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811133725.2A Active CN109145107B (en) 2018-09-27 2018-09-27 Theme extraction method, device, medium and equipment based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN109145107B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886390A (en) * 2019-01-10 2019-06-14 平安科技(深圳)有限公司 Convolutional neural networks model optimization method, apparatus, computer equipment and storage medium
CN110991175A (en) * 2019-12-10 2020-04-10 爱驰汽车有限公司 Text generation method, system, device and storage medium under multiple modes
CN111428489A (en) * 2020-03-19 2020-07-17 北京百度网讯科技有限公司 Comment generation method and device, electronic equipment and storage medium
CN111597296A (en) * 2019-02-20 2020-08-28 阿里巴巴集团控股有限公司 Commodity data processing method, device and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678214A (en) * 2015-12-21 2016-06-15 中国石油大学(华东) Vehicle flow statistical method based on convolutional neural network vehicle model recognition in cloud environment
US20170011279A1 (en) * 2015-07-07 2017-01-12 Xerox Corporation Latent embeddings for word images and their semantics
JP2017059205A (en) * 2015-09-17 2017-03-23 パナソニックIpマネジメント株式会社 Subject estimation system, subject estimation method, and program
US20170140253A1 (en) * 2015-11-12 2017-05-18 Xerox Corporation Multi-layer fusion in a convolutional neural network for image classification
CN107665353A (en) * 2017-09-15 2018-02-06 平安科技(深圳)有限公司 Model recognizing method, device, equipment and computer-readable recording medium based on convolutional neural networks
CN108170736A (en) * 2017-12-15 2018-06-15 南瑞集团有限公司 A kind of document based on cycle attention mechanism quickly scans qualitative method
CN108304364A (en) * 2017-02-23 2018-07-20 腾讯科技(深圳)有限公司 keyword extracting method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170011279A1 (en) * 2015-07-07 2017-01-12 Xerox Corporation Latent embeddings for word images and their semantics
JP2017059205A (en) * 2015-09-17 2017-03-23 パナソニックIpマネジメント株式会社 Subject estimation system, subject estimation method, and program
US20170140253A1 (en) * 2015-11-12 2017-05-18 Xerox Corporation Multi-layer fusion in a convolutional neural network for image classification
CN105678214A (en) * 2015-12-21 2016-06-15 中国石油大学(华东) Vehicle flow statistical method based on convolutional neural network vehicle model recognition in cloud environment
CN108304364A (en) * 2017-02-23 2018-07-20 腾讯科技(深圳)有限公司 keyword extracting method and device
CN107665353A (en) * 2017-09-15 2018-02-06 平安科技(深圳)有限公司 Model recognizing method, device, equipment and computer-readable recording medium based on convolutional neural networks
CN108170736A (en) * 2017-12-15 2018-06-15 南瑞集团有限公司 A kind of document based on cycle attention mechanism quickly scans qualitative method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886390A (en) * 2019-01-10 2019-06-14 平安科技(深圳)有限公司 Convolutional neural networks model optimization method, apparatus, computer equipment and storage medium
CN109886390B (en) * 2019-01-10 2023-11-24 平安科技(深圳)有限公司 Convolutional neural network model optimization method, device, computer equipment and storage medium
CN111597296A (en) * 2019-02-20 2020-08-28 阿里巴巴集团控股有限公司 Commodity data processing method, device and system
CN110991175A (en) * 2019-12-10 2020-04-10 爱驰汽车有限公司 Text generation method, system, device and storage medium under multiple modes
CN110991175B (en) * 2019-12-10 2024-04-09 爱驰汽车有限公司 Method, system, equipment and storage medium for generating text in multi-mode
CN111428489A (en) * 2020-03-19 2020-07-17 北京百度网讯科技有限公司 Comment generation method and device, electronic equipment and storage medium
CN111428489B (en) * 2020-03-19 2023-08-29 北京百度网讯科技有限公司 Comment generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109145107B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
Kalchbrenner et al. Efficient neural audio synthesis
CN108010514B (en) Voice classification method based on deep neural network
US11645529B2 (en) Sparsifying neural network models
CN109145107A (en) Subject distillation method, apparatus, medium and equipment based on convolutional neural networks
CN107437096B (en) Image classification method based on parameter efficient depth residual error network model
US8131659B2 (en) Field-programmable gate array based accelerator system
CN109543029A (en) File classification method, device, medium and equipment based on convolutional neural networks
CN109871532A (en) Text subject extracting method, device and storage medium
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN110210618A (en) The compression method that dynamic trimming deep neural network weight and weight are shared
CN105825269B (en) A kind of feature learning method and system based on parallel automatic coding machine
CN111126602A (en) Cyclic neural network model compression method based on convolution kernel similarity pruning
WO2023098544A1 (en) Structured pruning method and apparatus based on local sparsity constraints
CN108647723A (en) A kind of image classification method based on deep learning network
CN112508190A (en) Method, device and equipment for processing structured sparse parameters and storage medium
CN108038539A (en) A kind of integrated length memory Recognition with Recurrent Neural Network and the method for gradient lifting decision tree
CN112529415B (en) Article scoring method based on combined multiple receptive field graph neural network
CN109840585A (en) A kind of operation method and system towards sparse two-dimensional convolution
CN109241298A (en) Semantic data stores dispatching method
CN111259157A (en) Chinese text classification method based on hybrid bidirectional circulation capsule network model
CN106169961A (en) The network parameter processing method and processing device of neutral net based on artificial intelligence
Liu et al. EACP: An effective automatic channel pruning for neural networks
Du et al. Efficient network construction through structural plasticity
CN109583586A (en) A kind of convolution kernel processing method and processing device
CN109033304A (en) Multi-modal retrieval method based on online deep layer topic model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant