CN108614815A - Sentence exchange method and device - Google Patents

Sentence exchange method and device

Info

Publication number
CN108614815A
CN108614815A CN201810426835.1A CN201810426835A
Authority
CN
China
Prior art keywords
sentence
vectorization
variable learning
interaction results
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810426835.1A
Other languages
Chinese (zh)
Inventor
贺樑 (He Liang)
邓勇 (Deng Yong)
杨燕 (Yang Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN201810426835.1A priority Critical patent/CN108614815A/en
Publication of CN108614815A publication Critical patent/CN108614815A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/205: Parsing
    • G06F 40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The present invention discloses a sentence interaction method and device. The method includes: a vectorization step, vectorizing the sentences to be interacted; a processing step, processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information; an interaction step, interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views; and a concatenation step, concatenating the interaction results to obtain a mutually-aware full-view representation of the sentences. A linear classifier is used to classify the learnable variables so that each learnable variable is distinct from the others.

Description

Sentence exchange method and device
Technical field
The present invention relates to the field of natural language processing, and in particular to a sentence interaction method and device.
Background technology
Sentence interaction is a basic requirement in natural language processing and plays a particularly important role in tasks such as machine reading comprehension, textual entailment, answer selection, and semantic similarity computation. The purpose of sentence interaction is to obtain the semantic similarities and differences between sentences, fuse the information, and obtain a unified representation of multiple sentences that facilitates subsequent processing. How to better perform interaction between sentences has always been a research hotspot of the prior art, and related techniques for specific tasks based on sentence interaction, such as textual entailment, answer selection, and sentence similarity computation, emerge one after another.
Existing sentence interaction methods can be roughly divided into two classes: single interaction and multiple interaction. In the single-interaction approach, as the name suggests, the two sentences interact only once; this class is the most common because it is simple and effective to implement. In the prior art for natural language inference, the sentence representations are first projected into a specific space by a specially designed encoding layer, a single interaction is then performed in that space, and deep features are finally extracted from the resulting representation for classification.
The multiple-interaction methods apply a relatively simply designed single interaction several times, in order to reach a higher level of semantic abstraction or to attend to a diversity of angles. In one prior-art approach, four dimensionality-reduction operations such as max_pooling and avg_pooling are applied to the sentence matrix to obtain four vector representations of the sentence, which are then interacted separately to achieve interaction diversity; however, because this method obtains the sentence vectors through a handful of pooling-like reduction operations, the interaction angles it can represent are extremely limited and cannot easily be extended to more views. Other prior art proposes a multi-hop structure, which interacts the result of a previous sentence interaction again to achieve deep interaction; but since each interaction of this method builds on the previous interaction result, its effect is to keep deepening the feature extraction of a single view.
In summary, these existing methods either obtain sentence vector representations through multiple operations before interacting, which does not solve the problem from the angle of interaction diversity and is hard to generalize to more views, or perform deep interaction at a single angle, so that the information obtained by the interaction is not comprehensive enough.
Invention content
In view of the above problems, the present invention proposes a multi-view attention mechanism to realize more comprehensive interaction between sentences. The method only needs to guarantee the diversity of the views by constraining the parameters, which not only effectively reduces information redundancy, but also extends the interaction to an arbitrary number of views simply by repeating a single operation with different parameters.
In a first aspect, an embodiment of the present invention provides a sentence interaction method, including: a vectorization step, vectorizing the sentences to be interacted; a processing step, processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information; an interaction step, interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views; and a concatenation step, concatenating the interaction results to obtain a mutually-aware full-view representation of the sentences; wherein a linear classifier is used to classify the learnable variables so that each learnable variable is distinct.
Second aspect, the present invention disclose a kind of sentence interactive device, including:Vectorization module, for that will interact Sentence vectorization;Processing module is obtained for being handled the sentence after vectorization using two-way shot and long term memory network Include the sentence expression of contextual information;Interactive module, being used for can Variable Learning using multiple and different in attention mechanism Sentence expression is interacted, multiple interaction results with different visual angles are obtained;Concatenation module, for being carried out to interaction results Splicing, the full view with mutual perception for obtaining sentence indicate;Wherein, using linear classifier pair can Variable Learning divide Class so that it is each can Variable Learning it is different.
In a third aspect, an embodiment of the present invention provides a computing device including a memory and one or more processors. The computing device further includes one or more units, which are stored in the memory and configured to be executed by the one or more processors, the one or more units including instructions for executing the following steps: vectorizing the sentences to be interacted; processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information; interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views; and concatenating the interaction results to obtain a mutually-aware full-view representation of the sentences; wherein a linear classifier is used to classify the learnable variables so that each learnable variable is distinct.
Further, an embodiment of the present invention provides a computer program product used in combination with a computing device, characterized by including a computer-readable storage medium and a computer program mechanism embedded therein, wherein the computer program mechanism includes instructions for executing the following steps: vectorizing the sentences to be interacted; processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information; interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views; and concatenating the interaction results to obtain a mutually-aware full-view representation of the sentences; wherein a linear classifier is used to classify the learnable variables so that each learnable variable is distinct.
Compared with the prior art, the main differences and effects of the embodiments of the present invention are as follows:
The embodiments of the present invention are based on a multi-view attention mechanism: by applying constraints to the attention mechanism, multi-angle representations of the sentences are obtained, the information of different levels of the sentences is fully utilized, and the interaction between sentences can be modeled effectively.
Further, the embodiments of the present invention achieve interaction from different views by constraining the parameters in a bilinear attention mechanism, which is very simple and easy to compute and can therefore be implemented in parallel; moreover, since the network is expanded in width, it can be adjusted according to the complexity and difficulty of the task.
Further, the embodiments of the present invention can easily be applied to natural language processing tasks that require interaction between sentences. The interaction is based on the sentence matrix (which contains word-level information) rather than on a sentence vector (a single vector representing the whole sentence), so it can easily be applied to subsequent natural language processing tasks that require word-level information, such as machine reading comprehension.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a sentence interaction method 100 according to a first embodiment of the present invention.
Fig. 2 is a schematic diagram of the sentence interaction structure according to an embodiment of the present invention.
Fig. 3 is a structural block diagram of a sentence interaction device 300 according to a second embodiment of the present invention.
Specific implementation mode
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings. The illustrative system and method embodiments described herein are not intended to be limiting.
First embodiment
Fig. 1 is a flowchart of the sentence interaction method 100 according to an embodiment of the present invention. As shown in Fig. 1, the specific processing flow of the method is as follows:
S110: vectorize the sentences to be interacted.
According to an embodiment of the present invention, for two given sentences that need to interact, each word in the sentences is represented as a vector through a word2vec lookup.
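As a minimal sketch of this vectorization step (the toy vocabulary and the random lookup table standing in for pretrained word2vec vectors are assumptions made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}       # toy vocabulary (illustrative)
d = 8                                         # embedding dimension
emb = rng.standard_normal((len(vocab), d))    # stand-in for pretrained word2vec vectors

sentence = ["the", "cat", "sat"]
# Each word is replaced by its vector; the sentence becomes a (length x d) matrix.
X = np.stack([emb[vocab[w]] for w in sentence])
```

The resulting matrix is what the BiLSTM of the next step consumes, one row per word.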
S120: the vectorized sentences are then input into a bidirectional long short-term memory network (BiLSTM) to obtain sentence representations carrying contextual information, P ∈ R^{n×h} and Q ∈ R^{m×h}.
S130: the sentence representations are interacted in an attention mechanism using multiple different learnable variables, obtaining multiple interaction results with different views.
As shown in Fig. 2, suppose that the interaction is to be performed from K angles. K learnable variables W_1, W_2, …, W_K ∈ R^{h×h} are then needed, and the following formula is computed to obtain the K attention weights:
A_k = softmax(P W_k Q^T)
P is then re-represented with Q, that is, a Q-aware P is obtained; the specific computation is as follows:
P_k = A_k Q
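A minimal NumPy sketch of this multi-view bilinear attention (random matrices stand in for the BiLSTM outputs and the learned W_k; the transpose on Q is assumed here so that the matrix shapes are consistent):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_attention(P, Q, Ws):
    """For each learnable W_k: A_k = softmax(P W_k Q^T), then P_k = A_k Q."""
    views = []
    for W in Ws:                          # W: (h, h)
        A = softmax(P @ W @ Q.T)          # (n, m): attention of each P token over Q tokens
        views.append(A @ Q)               # (n, h): Q-aware representation of P
    return views

rng = np.random.default_rng(0)
n, m, h, K = 4, 5, 8, 3
P = rng.standard_normal((n, h))           # stand-in for BiLSTM output of sentence 1
Q = rng.standard_normal((m, h))           # stand-in for BiLSTM output of sentence 2
Ws = [rng.standard_normal((h, h)) for _ in range(K)]
views = multi_view_attention(P, Q, Ws)
```

Because each view repeats the same single operation with a different W_k, the loop parallelizes trivially and K can be raised or lowered with the task's difficulty.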
S140: the interaction results are then concatenated to obtain the mutually-aware full-view representation of the sentences.
The representations of the K different views are passed through the following operation to obtain the mutually-aware full-view representation of the sentences:
G = [P_1; P_2; …; P_K] E,  E ∈ R^{t×h},  t = K × h
where E is a learnable parameter and [;] denotes vector concatenation.
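Continuing the sketch, the K view representations are concatenated along the feature dimension and projected back to width h by the learnable matrix E (random matrices are used here for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, K = 4, 8, 3
views = [rng.standard_normal((n, h)) for _ in range(K)]  # P_1..P_K from the attention step
E = rng.standard_normal((K * h, h))                       # learnable projection, t = K * h

concat = np.concatenate(views, axis=-1)                   # (n, K*h): [P_1; P_2; ...; P_K]
G = concat @ E                                            # (n, h): full-view representation
```

The projection keeps the output width fixed at h regardless of how many views K are used, which is what lets the number of views be adjusted freely.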
For the convenience of applying the embodiment of the present invention to specific tasks such as answer selection, sentence semantic similarity, textual entailment, and machine reading comprehension, the information can be fused by the following method and fed as input to the task-specific layer, for example the pointer network that predicts the start and end positions of the answer span in machine reading comprehension:
G′ = [P; G; P ⊙ G] ⊙ sigmoid([P; G; P ⊙ G] C)
where ⊙ denotes element-wise multiplication and C ∈ R^{3h×3h}.
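This gated fusion can be sketched as follows (random P, G, and C for illustration; ⊙ corresponds to NumPy's element-wise `*`):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n, h = 4, 8
P = rng.standard_normal((n, h))            # contextual sentence representation
G = rng.standard_normal((n, h))            # full-view representation
C = rng.standard_normal((3 * h, 3 * h))    # learnable gate parameter

X = np.concatenate([P, G, P * G], axis=-1)  # (n, 3h): [P; G; P (.) G]
G_prime = X * sigmoid(X @ C)                # gate each feature before the task layer
```

Since the sigmoid gate lies in (0, 1), each feature of G′ is a scaled-down copy of the corresponding feature of the concatenation, letting the model suppress unhelpful channels before the task-specific layer.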
In addition, according to an embodiment of the present invention, in order to differentiate the K views and reduce their overlap as much as possible (that is, to reduce information redundancy), a linear classifier F can be used:
F(X) = X V,  X ∈ R^{h×h},  V ∈ R^{h×K}
where P(Y = k | X) denotes the probability that X is classified into class k, and V is a learnable parameter.
That is, when the input is the k-th parameter W_k, the prediction is required to maximize the probability of class k, P(Y = k | X = W_k). Since V is simple enough, the values of W diversify; that is, W_1, W_2, …, W_K differ from one another, and the computed P_1, P_2, …, P_K therefore also differ.
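A sketch of this diversity constraint (the patent does not spell out how the (h × K) score matrix F(W) = WV is reduced to K class logits, so averaging over rows is an assumption made here purely for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # numerically stable
    return e / e.sum()

rng = np.random.default_rng(3)
h, K = 8, 3
Ws = [rng.standard_normal((h, h)) for _ in range(K)]  # the K learnable matrices W_k
V = rng.standard_normal((h, K))                       # classifier parameter (deliberately simple)

def view_probs(W):
    # F(W) = W V is (h, K); reduce to K logits by averaging rows (assumed), then softmax.
    return softmax((W @ V).mean(axis=0))

# Training would maximize view_probs(Ws[k])[k] for every k; because V is so simple,
# this objective can only be satisfied if the W_k take on distinct values.
probs = view_probs(Ws[0])
```

The design choice is that V's capacity is intentionally small: the only way a weak classifier can tell the K matrices apart is if the matrices themselves are pushed apart.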
Therefore, the sentence interaction method of the embodiment of the present invention is based on a multi-view attention mechanism: by applying constraints to the attention mechanism, multi-angle representations of the sentences are obtained, the information of different levels of the sentences is fully utilized, and the interaction between sentences can be modeled effectively.
Further, the embodiments of the present invention achieve interaction from different views by constraining the parameters in a bilinear attention mechanism, which is very simple and easy to compute and can therefore be implemented in parallel; moreover, since the network is expanded in width, it can be adjusted according to the complexity and difficulty of the task.
Further, the embodiments of the present invention can easily be applied to natural language processing tasks that require interaction between sentences. The interaction is based on the sentence matrix (which contains word-level information) rather than on a sentence vector (a single vector representing the whole sentence), so it can easily be applied to subsequent natural language processing tasks that require word-level information, such as machine reading comprehension.
The method embodiments of the present invention can be implemented in software, hardware, firmware, etc. Regardless of whether the present invention is implemented in software, hardware, or firmware, the instruction code may be stored in any type of computer-accessible memory (for example, permanent or modifiable, volatile or non-volatile, solid-state or non-solid-state, fixed or replaceable media, etc.). Likewise, the memory may be, for example, programmable array logic (PAL), random access memory (RAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), a magnetic disk, an optical disc, a digital versatile disc (DVD), etc.
Second embodiment
Fig. 3 is a schematic block diagram of the sentence interaction device 300 according to an embodiment of the present invention. The device is used to execute the above method flow and includes:
a vectorization module 310, for vectorizing the sentences to be interacted;
a processing module 320, for processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information;
an interaction module 330, for interacting the sentence representations in an attention mechanism using multiple different learnable variables, obtaining multiple interaction results with different views;
a concatenation module 340, for concatenating the interaction results to obtain the mutually-aware full-view representation of the sentences;
wherein a linear classifier is used to classify the learnable variables so that each learnable variable is distinct.
The first embodiment is the method embodiment corresponding to this embodiment, and the two embodiments can be implemented in cooperation with each other. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment are also applicable to the first embodiment.
Therefore, the sentence interaction device of the embodiment of the present invention is based on a multi-view attention mechanism: by applying constraints to the attention mechanism, multi-angle representations of the sentences are obtained, the information of different levels of the sentences is fully utilized, and the interaction between sentences can be modeled effectively.
Further, the embodiments of the present invention achieve interaction from different views by constraining the parameters in a bilinear attention mechanism, which is very simple and easy to compute and can therefore be implemented in parallel; moreover, since the network is expanded in width, it can be adjusted according to the complexity and difficulty of the task.
Further, the embodiments of the present invention can easily be applied to natural language processing tasks that require interaction between sentences. The interaction is based on the sentence matrix (which contains word-level information) rather than on a sentence vector (a single vector representing the whole sentence), so it can easily be applied to subsequent natural language processing tasks that require word-level information, such as machine reading comprehension.
Further, based on the same technical concept, an embodiment of the present invention provides a computing device including a memory and one or more processors. The computing device further includes one or more units, which are stored in the memory and configured to be executed by the one or more processors, the one or more units including instructions for executing the following steps:
vectorizing the sentences to be interacted;
processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information;
interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views;
concatenating the interaction results to obtain the mutually-aware full-view representation of the sentences;
wherein a linear classifier is used to classify the learnable variables so that each learnable variable is distinct.
Based on the same technical concept, an embodiment of the present invention provides a computer program product used in combination with a computing device, characterized by including a computer-readable storage medium and a computer program mechanism embedded therein, wherein the computer program mechanism includes instructions for executing the following steps:
vectorizing the sentences to be interacted;
processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information;
interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views;
concatenating the interaction results to obtain the mutually-aware full-view representation of the sentences;
wherein a linear classifier is used to classify the learnable variables so that each learnable variable is distinct.
It should be appreciated that, in order to simplify the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention, the features of the present invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, the inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
While various aspects and embodiments are disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for illustrative purposes and are not intended to be limiting, with the true scope being indicated by the appended claims and the full scope of equivalents to which such claims are entitled. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.

Claims (10)

1. A sentence interaction method, characterized by including:
a vectorization step, vectorizing the sentences to be interacted;
a processing step, processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information;
an interaction step, interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views;
a concatenation step, concatenating the interaction results to obtain the mutually-aware full-view representation of the sentences;
wherein a linear classifier is used to classify the learnable variables so that each of the learnable variables is distinct from the others.
2. The sentence interaction method according to claim 1, characterized in that the interaction results are:
P_k = A_k Q
where the sentence representations are P ∈ R^{n×h} and Q ∈ R^{m×h} respectively, the attention weight is A_k = softmax(P W_k Q^T), and the K learnable variables are W_1, W_2, …, W_K ∈ R^{h×h}.
3. The sentence interaction method according to claim 2, characterized in that the full-view representation is:
G = [P_1; P_2; …; P_K] E,  E ∈ R^{t×h},  t = K × h
where E is a learnable parameter and [;] denotes vector concatenation.
4. The sentence interaction method according to any one of claims 1-3, characterized in that the linear classifier is:
F(X) = X V,  X ∈ R^{h×h},  V ∈ R^{h×K}
where V is a learnable parameter and P(Y = k | X) denotes the probability that X is classified into class k; when the input is the k-th learnable variable W_k, the prediction is required to maximize the probability of class k, P(Y = k | X = W_k).
5. A sentence interaction device, characterized by including:
a vectorization module, for vectorizing the sentences to be interacted;
a processing module, for processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information;
an interaction module, for interacting the sentence representations in an attention mechanism using multiple different learnable variables, obtaining multiple interaction results with different views;
a concatenation module, for concatenating the interaction results to obtain the mutually-aware full-view representation of the sentences;
wherein a linear classifier is used to classify the learnable variables so that each of the learnable variables is distinct from the others.
6. The sentence interaction device according to claim 5, characterized in that the interaction results are:
P_k = A_k Q
where the sentence representations are P ∈ R^{n×h} and Q ∈ R^{m×h} respectively, the attention weight is A_k = softmax(P W_k Q^T), and the K learnable variables are W_1, W_2, …, W_K ∈ R^{h×h}.
7. The sentence interaction device according to claim 6, characterized in that the full-view representation is:
G = [P_1; P_2; …; P_K] E,  E ∈ R^{t×h},  t = K × h
where E is a learnable parameter and [;] denotes vector concatenation.
8. The sentence interaction device according to any one of claims 6-7, characterized in that the linear classifier is:
F(X) = X V,  X ∈ R^{h×h},  V ∈ R^{h×K}
where V is a learnable parameter and P(Y = k | X) denotes the probability that X is classified into class k; when the input is the k-th learnable variable W_k, the prediction is required to maximize the probability of class k, P(Y = k | X = W_k).
9. A computing device, including a memory and one or more processors, wherein the computing device further includes:
one or more units, the one or more units being stored in the memory and configured to be executed by the one or more processors, the one or more units including instructions for executing the following steps:
vectorizing the sentences to be interacted;
processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information;
interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views;
concatenating the interaction results to obtain the mutually-aware full-view representation of the sentences;
wherein a linear classifier is used to classify the learnable variables so that each of the learnable variables is distinct from the others.
10. A computer program product used in combination with the computing device according to claim 9, characterized by including a computer-readable storage medium and a computer program mechanism embedded therein, wherein the computer program mechanism includes instructions for executing the following steps:
vectorizing the sentences to be interacted;
processing the vectorized sentences with a bidirectional long short-term memory network to obtain sentence representations containing contextual information;
interacting the sentence representations in an attention mechanism using multiple different learnable variables to obtain multiple interaction results with different views;
concatenating the interaction results to obtain the mutually-aware full-view representation of the sentences;
wherein a linear classifier is used to classify the learnable variables so that each of the learnable variables is distinct from the others.
CN201810426835.1A 2018-05-07 2018-05-07 Sentence exchange method and device Pending CN108614815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810426835.1A CN108614815A (en) 2018-05-07 2018-05-07 Sentence exchange method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810426835.1A CN108614815A (en) 2018-05-07 2018-05-07 Sentence exchange method and device

Publications (1)

Publication Number Publication Date
CN108614815A true CN108614815A (en) 2018-10-02

Family

ID=63662314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810426835.1A Pending CN108614815A (en) 2018-05-07 2018-05-07 Sentence exchange method and device

Country Status (1)

Country Link
CN (1) CN108614815A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106383815A (en) * 2016-09-20 2017-02-08 Tsinghua University Neural network sentiment analysis method combining user and product information
US20180121785A1 (en) * 2016-11-03 2018-05-03 Nec Laboratories America, Inc. Context-aware attention-based neural network for interactive question answering
CN107169035A (en) * 2017-04-19 2017-09-15 South China University of Technology Text classification method combining a long short-term memory network and convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wenpeng Yin et al., "Convolutional Neural Networks for Paraphrase Identification", Human Language Technologies: The 2015 Annual Conference of the North American Chapter of the ACL *
Jin Lijiao et al., "Automatic Question Answering Based on Convolutional Neural Networks", Journal of East China Normal University (Natural Science Edition) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815484A (en) * 2018-12-21 2019-05-28 Ping An Technology (Shenzhen) Co., Ltd. Semantic similarity matching method and matching device based on a cross-attention mechanism
CN109815484B (en) * 2018-12-21 2022-03-15 Ping An Technology (Shenzhen) Co., Ltd. Semantic similarity matching method and matching device based on cross attention mechanism
CN110516530A (en) * 2019-07-09 2019-11-29 Hangzhou Dianzi University Image description method based on non-aligned multi-view feature enhancement
CN110866542A (en) * 2019-10-17 2020-03-06 Xi'an Jiaotong University Depth representation learning method based on controllable feature fusion
CN110866542B (en) * 2019-10-17 2021-11-19 Xi'an Jiaotong University Depth representation learning method based on feature controllable fusion

Similar Documents

Publication Publication Date Title
US11755885B2 (en) Joint learning of local and global features for entity linking via neural networks
AU2018271931B2 (en) Attention-based sequence transduction neural networks
AU2018214675B2 (en) Systems and methods for automatic semantic token tagging
US11048718B2 (en) Methods and systems for feature engineering
US11593642B2 (en) Combined data pre-process and architecture search for deep learning models
JP7291183B2 (en) Methods, apparatus, devices, media, and program products for training models
CN111079532A (en) Video content description method based on a text autoencoder
US20140214735A1 (en) Method for an optimizing predictive model using gradient descent and conjugate residuals
US11176333B2 (en) Generation of sentence representation
CN108614815A (en) Sentence exchange method and device
US11132513B2 (en) Attention-based natural language processing
CN112559734B (en) Brief report generating method, brief report generating device, electronic equipment and computer readable storage medium
CN110502739B (en) Construction of machine learning model for structured input
US20200151458A1 (en) Apparatus and method for video data augmentation
CN114398899A (en) Training method and device for pre-training language model, computer equipment and medium
US11507787B2 (en) Model agnostic contrastive explanations for structured data
US20230409899A1 (en) Computer vision neural networks with learned tokenization
EP3923199A1 (en) Method and system for compressing a neural network
US20200210874A1 (en) Method and system for the creation of fuzzy cognitive maps from extracted concepts
CN115427960A (en) Relationship extraction using fully dependent forests
CN113536736A (en) Sequence generation method and device based on BERT
Chipofya Matching qualitative constraint networks with online reinforcement learning
US20230267342A1 (en) Iterative answer and supplemental information extraction for machine reading comprehension
KR102665058B1 (en) System and method for processing natural language based on deep learning framework
US20230376537A1 (en) Multi-chunk relationship extraction and maximization of query answer coherence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20181002