CN111160042B - Text semantic analysis method and device - Google Patents

Text semantic analysis method and device

Info

Publication number
CN111160042B
CN111160042B (application CN201911406192.5A)
Authority
CN
China
Prior art keywords
vector matrix
text
decoder
generating
context
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911406192.5A
Other languages
Chinese (zh)
Other versions
CN111160042A (en)
Inventor
马雄
杨旭川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Juexiao Technology Co ltd
Original Assignee
Chongqing Juexiao Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Juexiao Technology Co ltd filed Critical Chongqing Juexiao Technology Co ltd
Priority to CN201911406192.5A
Publication of CN111160042A
Application granted
Publication of CN111160042B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the application provide a text semantic analysis method and apparatus, which address the low accuracy of semantic analysis methods in the prior art. The method comprises the following steps: providing and training a parsing model comprising an encoder and a decoder; encoding, by the encoder, the text to be parsed to generate a first vector matrix, the text to be parsed comprising a short text and a long text; generating, by the decoder, a context vector matrix based on the first vector matrix, the context vector matrix characterizing, for each word, the contributions of the other words to that word; and determining a boundary vector matrix based on the context vector matrix, decoding the boundary vector matrix by the decoder, and determining whether the long text contains the short text.

Description

Text semantic analysis method and device
Technical Field
The present disclosure relates to the field of text information processing technologies, and in particular, to a text semantic parsing method and apparatus.
Background
With the development of computer technology, computers are being brought into many application scenarios to improve efficiency: for example, scoring the subjective questions of an examination, or auditing certain texts. In these application scenarios, it is often necessary to determine the similarity between two texts, for example, to determine, according to a given standard answer or the key sentences of the standard answer, whether the semantics of a subjective-question answer contain the key sentences of the standard answer.
In the prior art, the similarity between two texts is generally determined from the word senses of the individual words in the two texts. However, a sentence or a paragraph carries much information beyond the individual words it is composed of. Even sentences built from the same words can express different meanings when that other information differs, which affects the final judgment. How to score against more comprehensive text information and obtain more accurate scoring results is therefore a problem to be solved.
Disclosure of Invention
The purpose of the application is to provide a text semantic analysis method and a text semantic analysis device, so as to solve the problem of low accuracy of the semantic analysis method in the prior art.
In a first aspect of the present application, a text semantic parsing method is provided, including:
providing and training a parsing model comprising an encoder and a decoder;
encoding, by the encoder, the text to be parsed to generate a first vector matrix, the text to be parsed comprising a short text and a long text;
generating, by the decoder, a context vector matrix based on the first vector matrix, the context vector matrix characterizing, for each word, the contributions of the other words to that word;
determining a boundary vector matrix based on the context vector matrix, decoding the boundary vector matrix by the decoder, and determining whether the long text contains the short text.
In a possible implementation manner, the encoding, by the encoder, based on the text to be parsed to generate a first vector matrix includes:
using a Bert model as the encoder, and inputting the short text and the long text into the encoder according to a set format, wherein the set format is [ classification label ] short text [ text separator ] long text [ text separator ];
and encoding the text to be parsed through the Bert model to generate the first vector matrix, wherein the number of rows of the first vector matrix is the sum of the lengths of the contents input in the set format.
In a possible implementation manner, the generating, by the decoder, a context vector matrix based on the first vector matrix includes:
generating, by the decoder, a second vector matrix by locating clues to the short text in the long text, based on the first vector matrix;
the context vector matrix is generated by the decoder based on the second vector matrix.
In a possible implementation manner, the generating the second vector matrix and the context vector matrix specifically includes:
generating a second vector matrix based on the first vector matrix and a DGCNN model;
and generating the context vector matrix based on the second vector matrix and a Self-Attention model.
In a possible implementation manner, the generating a second vector matrix based on the first vector matrix and the DGCNN model includes:
generating a first intermediate vector matrix by performing one-dimensional dilated convolution on the first vector matrix;
performing one-dimensional dilated convolution on the first vector matrix and activating the result with an activation function, to generate a second intermediate vector matrix;
and generating the second vector matrix by using the second intermediate vector matrix as a weight, based on the first vector matrix, the first intermediate vector matrix, and the second intermediate vector matrix.
In a second aspect of the present application, there is provided a text semantic parsing apparatus, comprising an encoder and a decoder,
the encoder is used for encoding the text to be analyzed and generating a first vector matrix; the text to be analyzed comprises a short text and a long text;
the decoder is configured to generate a context vector matrix based on the first vector matrix; and determining a boundary vector matrix based on the context vector matrix, decoding the boundary vector matrix, and determining whether the long text contains the short text.
In a possible implementation manner, the encoder is specifically configured to:
using a Bert model as the encoder, and inputting the short text and the long text into the encoder according to a set format, wherein the set format is [ classification label ] short text [ text separator ] long text [ text separator ];
and encoding the text to be parsed through the Bert model to generate the first vector matrix, wherein the number of rows of the first vector matrix is the sum of the lengths of the contents input in the set format.
In a possible implementation manner, the decoder generating a context vector matrix based on the first vector matrix includes:
generating, by the decoder, a second vector matrix by locating clues to the short text in the long text, based on the first vector matrix;
the context vector matrix is generated by the decoder based on the first vector matrix and the second vector matrix.
In a possible implementation manner, the decoder generates a second vector matrix and the context vector matrix, specifically including:
generating a second vector matrix based on the first vector matrix and a DGCNN model;
the context vector matrix is generated based on the first vector matrix, the second vector matrix, and a Self-Attention model.
In a possible implementation manner, the decoder generating a second vector matrix based on the first vector matrix and the DGCNN model includes:
generating a first intermediate vector matrix by performing one-dimensional dilated convolution on the first vector matrix;
performing one-dimensional dilated convolution on the first vector matrix and activating the result with an activation function, to generate a second intermediate vector matrix;
the second vector matrix is generated by using the second intermediate vector matrix as a weight, based on the first vector matrix, the first intermediate vector matrix, and the second intermediate vector matrix.
In the embodiments of the application, the parsing model provided comprises an encoder and a decoder. After the parsing model is trained, the text to be parsed can be encoded to obtain a first vector matrix; a context vector matrix is then generated by decoding based on the first vector matrix; a boundary vector matrix is determined from the context vector matrix; and the decoder decodes the boundary vector matrix to determine whether the long text contains the short text. In this way, the semantic counterpart of the short text is found within the long text, which enables accurate analysis of subjective questions and improves parsing accuracy.
Drawings
FIG. 1 is a schematic diagram of an application architecture according to an embodiment of the present application;
fig. 2 is a flowchart of a text semantic parsing method provided in an embodiment of the present application;
FIG. 3 is a flowchart of encoding text to be parsed according to an embodiment of the present application;
FIG. 4 is a flow chart of generating a context vector matrix provided by an embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for generating a context vector matrix according to an embodiment of the present disclosure;
FIG. 6 is a specific flow chart for generating a second vector matrix provided by an embodiment of the present application;
fig. 7 is a schematic structural diagram of a text semantic parsing device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
It should be noted that the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
The architecture applied in the present application may be as shown in fig. 1: a parsing model comprising an encoder and a decoder is provided. The encoder and decoder may be designed as different modules of the same computing device, or as separate modules. Together, the encoder and decoder carry out the text semantic parsing method provided in the embodiments of the present application, which, as shown in fig. 2, includes the following steps 201 to 204:
201, training a parsing model comprising an encoder and a decoder.
202, encoding, by the encoder, the text to be parsed to generate a first vector matrix; the text to be parsed comprises a short text and a long text.
203, generating, by the decoder, a context vector matrix based on the first vector matrix, the context vector matrix characterizing, for each word, the contributions of the other words to that word.
204, determining a boundary vector matrix based on the context vector matrix, decoding the boundary vector matrix by a decoder, and determining whether the long text contains the short text.
In the embodiments of the application, the parsing model provided comprises an encoder and a decoder. After the parsing model is trained, the text to be parsed can be encoded to obtain a first vector matrix; a context vector matrix is then generated by decoding based on the first vector matrix; a boundary vector matrix is determined from the context vector matrix; and the decoder decodes the boundary vector matrix to determine whether the long text contains the short text. In this way, the semantic counterpart of the short text is found within the long text, which enables accurate analysis of subjective questions and improves parsing accuracy.
It should be noted that steps 202 and 203 may also serve as part of training the parsing model: after step 203, training that maximizes the probability of the correct result may be performed, thereby implementing step 201, as sketched below.
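As an illustrative sketch only (the patent does not specify a loss function): assuming the pointer-style start/end boundary prediction described in step four of the detailed description below, maximizing the probability of the correct boundaries corresponds to a summed cross-entropy over the start and end positions. The function name span_loss and the use of PyTorch are assumptions for illustration, not part of the patent.

import torch.nn.functional as F

# Hypothetical sketch; the patent only states that training maximizes probability.
# start_logits, end_logits: (batch, L) boundary scores from the decoder;
# start_pos, end_pos: (batch,) indices of the true span boundaries.
def span_loss(start_logits, end_logits, start_pos, end_pos):
    return (F.cross_entropy(start_logits, start_pos)
            + F.cross_entropy(end_logits, end_pos))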
It should be noted that the term "word" in this application may refer to a Chinese word or a foreign word, or a combination of words specified in some possible embodiments.
In a possible implementation, as shown in fig. 3, step 202 includes:
301, using the Bert model as the encoder, and inputting the short text and the long text in a set format, where the set format may be [ classification label ] short text [ text separator ] long text [ text separator ].
302, encoding the text to be parsed through the Bert model, and generating a first vector matrix. The number of rows of the first vector matrix is the sum of the lengths of the contents input in the set format.
In a possible implementation, as shown in fig. 4, step 203 includes:
a second vector matrix is generated 401 by a decoder determining cues of short text from long text based on the first vector matrix.
A context vector matrix is generated by a decoder based on the second vector matrix 402.
Based on fig. 4, as shown in fig. 5, in some possible examples, generating the second vector matrix specifically includes: 501, generating a second vector matrix based on the first vector matrix and the DGCNN model. In some possible examples, generating the context vector matrix specifically includes: 502, generating a context vector matrix based on the second vector matrix and the Self-Attention model.
For step 501, in some possible examples, a method as shown in fig. 6 may be used, including:
601, generating a first intermediate vector matrix by performing one-dimensional dilated convolution on the first vector matrix;
602, performing one-dimensional dilated convolution on the first vector matrix and activating the result with an activation function, to generate a second intermediate vector matrix;
603, generating a second vector matrix by using the second intermediate vector matrix as weights based on the first vector matrix, the first intermediate vector matrix and the second intermediate vector matrix.
Based on the above embodiments, a specific architecture of the parsing model is provided, as shown in fig. 7:
the parsing model comprises a Bert model serving as the encoder 1, and a DGCNN model and a Self-Attention model serving as parts of the decoder 2. For a clearer understanding of the solution, the following steps are described with reference to this architecture:
Step one: input the text to be parsed into the Bert model and output a first vector matrix, denoted y, as in equation 1:
y = Transformer_encoder(x, z); (1)
In this application, the role of the Bert model here is to encode the text to be parsed into a vector matrix; Bert consists of 12 Transformer encoder layers. The text to be parsed is formed by concatenating the short text x and the long text z; the combined length is L, where L is the sum of the lengths of the short text x, the long text z, the classification label, and the separators. The specific format is [ CLS ] short text [ SEP ] long text [ SEP ], where [ CLS ] is the classification label in Bert and [ SEP ] is the text separator in Bert. In some examples, the first vector matrix may be an L×768-dimensional vector matrix.
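As a minimal sketch of this encoding step, assuming the Hugging Face transformers library and the bert-base-chinese checkpoint (neither is named in the patent), passing the two texts as a pair produces exactly the [ CLS ] short text [ SEP ] long text [ SEP ] input format:

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")

short_text = "..."  # the short text x
long_text = "..."   # the long text z
# A text pair is encoded as [CLS] x [SEP] z [SEP], matching the set format.
enc = tokenizer(short_text, long_text, return_tensors="pt")
with torch.no_grad():
    y = bert(**enc).last_hidden_state  # shape (1, L, 768): the first vector matrix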
Step two: input the first vector matrix into the DGCNN model and output a second vector matrix. The main role of the DGCNN model here is to null out all information in the long text z that is unrelated to the short text x. Four DGCNN blocks are used, differing only in their dilation rates, which are 1, 2, 4 and 8; the output of each DGCNN block is taken as the input of the next, as shown in equation 2:
g = y ⊗ (1 - σ(Conv1D_2(y))) + Conv1D_1(y) ⊗ σ(Conv1D_2(y)); (2)

wherein g is the DGCNN output, Conv1D_1 and Conv1D_2 are one-dimensional dilated convolution functions, σ is the activation function, and ⊗ is the tensor (element-wise) product operator. Note that Conv1D_1 is not followed by an activation function and is therefore only a linear transformation, so Conv1D_1(y) and y can be combined, which is equivalent to a single Conv1D_1.
The following takes the first vector matrix as an L×768-dimensional vector matrix as an example. Specifically:
the first vector matrix y is subjected to one-dimensional expansion convolution, the expansion rate is 1, the convolution window is 3 x 768, the number of convolution kernels is 768, the function of the first vector matrix y is linear transformation, and a first intermediate vector matrix is obtained, and the first intermediate vector matrix is an L x 768-dimensional vector and is marked as g1.
The first vector matrix y is subjected to one-dimensional expansion convolution, the expansion rate is 1, the convolution window is 3 x 768, and the convolution window is activated by using a sigmoid function (i.e. sigma function), wherein the function is to convert the vector into a probability, so as to obtain a second intermediate vector matrix, and the second intermediate vector matrix is an L x 768-dimensional vector matrix and is marked as g2.
Based on the first vector matrix, the first intermediate vector matrix, and the second intermediate vector matrix, the second vector matrix is generated by using the second intermediate vector matrix as a weight; the second vector matrix is an L×768-dimensional vector matrix, denoted g3, as in equation 3:
g3=y*(1-g2)+g1*g2; (3)
That is, the information passes through unchanged with probability 1-g2 and is transformed with probability g2.
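A minimal PyTorch sketch of one such gated block and the stack of four (dilation rates 1, 2, 4, 8), assuming batch-first L×768 inputs; the class and variable names are illustrative, not from the patent:

import torch
import torch.nn as nn

class GatedDilatedConv(nn.Module):
    # One DGCNN block computing g3 = y * (1 - g2) + g1 * g2 (equation 3),
    # where g1 is a linear dilated convolution of y and g2 a sigmoid-gated one.
    def __init__(self, dim=768, kernel=3, dilation=1):
        super().__init__()
        pad = (kernel - 1) // 2 * dilation  # preserve the sequence length L
        self.conv1 = nn.Conv1d(dim, dim, kernel, dilation=dilation, padding=pad)
        self.conv2 = nn.Conv1d(dim, dim, kernel, dilation=dilation, padding=pad)

    def forward(self, y):                # y: (batch, L, dim)
        t = y.transpose(1, 2)            # Conv1d expects (batch, dim, L)
        g1 = self.conv1(t).transpose(1, 2)
        g2 = torch.sigmoid(self.conv2(t)).transpose(1, 2)
        return y * (1 - g2) + g1 * g2

# Four blocks with dilation rates 1, 2, 4, 8; each block feeds the next.
dgcnn = nn.Sequential(*[GatedDilatedConv(dilation=d) for d in (1, 2, 4, 8)])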
Step three: input the second vector matrix g3 into the Self-Attention model to generate the context vector matrix h. The role of the Self-Attention model here is to further distill the information related to the short text out of the long text, as expressed by equations 4 and 5:
Attention(Q, K, V) = softmax(QK^T / √d_k) V; (4)

h = SelfAttention(g3); (5)

wherein Q, K and V are L×768-dimensional vector matrices.
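A minimal single-head sketch of equations 4 and 5, taking Q, K and V as linear projections of the DGCNN output (the projections are an assumption; the patent only gives their L×768 shapes):

import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    # h = softmax(Q K^T / sqrt(d_k)) V, with Q, K, V derived from the same input.
    def __init__(self, dim=768):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, g3):                        # g3: (batch, L, dim)
        Q, K, V = self.q(g3), self.k(g3), self.v(g3)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
        return torch.softmax(scores, dim=-1) @ V  # h: (batch, L, dim)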
The present application further provides a pointer to complete the final decoding. Step four: input the context vector matrix h into the pointer, generate a boundary vector matrix from h, decode the boundary vector matrix, search for the start and end boundaries of a sentence that can support (prove) the short text, and determine whether the long text contains the short text. Specifically:
To find the start position, a linear transformation is applied to the context vector matrix h, outputting an L×1 vector matrix b1; to find the end position, another linear transformation is applied to h, outputting an L×1 vector matrix b2. The position p1 of the maximum value of b1 and the position p2 of the maximum value of b2 are taken; the content between p1 and p2 is the sentence that supports the short text. If p1 and p2 both fall on the [ CLS ] position, the long text does not contain the short text; otherwise, the long text contains the short text.
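A sketch of this pointer decoding under the same assumptions (PyTorch, batch-first tensors, token position 0 holding [CLS]); the head names are illustrative:

import torch
import torch.nn as nn

start_head = nn.Linear(768, 1)  # scores each position as a start boundary
end_head = nn.Linear(768, 1)    # scores each position as an end boundary

def decode(h):                      # h: (1, L, 768) context vector matrix
    b1 = start_head(h).squeeze(-1)  # (1, L) start-boundary scores
    b2 = end_head(h).squeeze(-1)    # (1, L) end-boundary scores
    p1 = int(b1.argmax(dim=-1))     # position of the maximum of b1
    p2 = int(b2.argmax(dim=-1))     # position of the maximum of b2
    # Both boundaries at the [CLS] position means the short text is absent.
    if p1 == 0 and p2 == 0:
        return None
    return p1, p2                   # token span of the supporting sentence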
In a second aspect of the present application, as shown in fig. 7, there is provided a text semantic parsing apparatus, including an encoder 1 and a decoder 2,
the encoder 1 is used for encoding the text to be analyzed to generate a first vector matrix; the text to be analyzed comprises a short text and a long text;
a decoder 2 for generating a context vector matrix based on the first vector matrix; and determining a boundary vector matrix based on the context vector matrix, decoding the boundary vector matrix, and determining whether the long text contains the short text.
In a possible implementation, the encoder 1 is specifically configured to:
using the Bert model as the encoder 1, and inputting the short text and the long text according to a set format, wherein the set format is [ classification label ] short text [ text separator ] long text [ text separator ];
and encoding the text to be parsed through the Bert model to generate a first vector matrix, wherein the number of rows of the first vector matrix is the sum of the lengths of the contents input in the set format.
In a possible implementation, the decoder 2 generating a context vector matrix based on the first vector matrix includes:
generating, by the decoder, a second vector matrix by locating clues to the short text in the long text, based on the first vector matrix;
a context vector matrix is generated by a decoder based on the first vector matrix and the second vector matrix.
In a possible implementation manner, the decoder 2 generates a second vector matrix and a context vector matrix, specifically including:
generating a second vector matrix based on the first vector matrix and the DGCNN model;
a context vector matrix is generated based on the first vector matrix, the second vector matrix, and the Self-Attention model.
In a possible implementation, the decoder generates a second vector matrix based on the first vector matrix and the DGCNN model, including:
generating a first intermediate vector matrix by performing one-dimensional dilated convolution on the first vector matrix;
then, performing one-dimensional dilated convolution on the first vector matrix and activating the result with an activation function, to generate a second intermediate vector matrix;
based on the first vector matrix, the first intermediate vector matrix, and the second intermediate vector matrix, the second vector matrix is generated by using the second intermediate vector matrix as a weight.
In the embodiments of the application, the parsing model provided comprises an encoder and a decoder. After the parsing model is trained, the text to be parsed can be encoded to obtain a first vector matrix; a context vector matrix is then generated by decoding based on the first vector matrix; a boundary vector matrix is determined from the context vector matrix; and the decoder decodes the boundary vector matrix to determine whether the long text contains the short text. By using the Bert model as a powerful encoder, using the DGCNN gating mechanism to design the text entailment model, and combining the Self-Attention model, the semantic counterpart of the short text is found within the long text, which enables accurate analysis of subjective questions and improves parsing accuracy.
In the present embodiments, for a software implementation, the techniques described herein may be implemented by means of units that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like.
Although embodiments of the present application have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the application, and such modifications and variations are within the scope defined by the appended claims.

Claims (8)

1. A text semantic parsing method, comprising:
providing and training a parsing model comprising an encoder and a decoder;
based on a text to be parsed, using a Bert model as the encoder, and inputting a short text and a long text included in the text to be parsed into the encoder according to a set format, wherein the set format is [ classification label ] short text [ text separator ] long text [ text separator ]; encoding the text to be parsed through the Bert model to generate a first vector matrix, wherein the number of rows of the first vector matrix is the sum of the lengths of the contents input in the set format;
generating, by the decoder, a context vector matrix based on the first vector matrix, the context vector matrix characterizing, for each word, the contributions of the other words to that word;
determining a boundary vector matrix based on the context vector matrix, decoding the boundary vector matrix by the decoder, and determining whether the long text contains the short text.
2. The method of claim 1, wherein the generating, by the decoder, the context vector matrix based on the first vector matrix comprises:
generating, by the decoder, a second vector matrix by locating clues to the short text in the long text, based on the first vector matrix;
the context vector matrix is generated by the decoder based on the second vector matrix.
3. The method of claim 2, wherein the generating the second vector matrix and the context vector matrix specifically comprises:
generating the second vector matrix based on the first vector matrix and a DGCNN model;
and generating the context vector matrix based on the second vector matrix and a Self-Attention model.
4. The method of claim 3, wherein the generating the second vector matrix based on the first vector matrix and the DGCNN model comprises:
generating a first intermediate vector matrix by performing one-dimensional dilated convolution on the first vector matrix;
performing one-dimensional dilated convolution on the first vector matrix and activating the result with an activation function, to generate a second intermediate vector matrix;
the second vector matrix is generated by using the second intermediate vector matrix as a weight based on the first vector matrix, the first intermediate vector matrix, and the second intermediate vector matrix.
5. A text semantic parsing apparatus is characterized by comprising an encoder and a decoder,
the encoder is configured to, based on a text to be parsed, use a Bert model as the encoder and input a short text and a long text included in the text to be parsed into the encoder according to a set format, wherein the set format is [ classification label ] short text [ text separator ] long text [ text separator ]; and to encode the text to be parsed through the Bert model to generate a first vector matrix, wherein the number of rows of the first vector matrix is the sum of the lengths of the contents input in the set format;
the decoder is configured to generate a context vector matrix based on the first vector matrix; and determining a boundary vector matrix based on the context vector matrix, decoding the boundary vector matrix, and determining whether the long text contains the short text.
6. The apparatus of claim 5, wherein the decoder generating the context vector matrix based on the first vector matrix comprises:
generating, by the decoder, a second vector matrix by locating clues to the short text in the long text, based on the first vector matrix;
the context vector matrix is generated by the decoder based on the first vector matrix and the second vector matrix.
7. The apparatus of claim 6, wherein the decoder generates the second vector matrix and the context vector matrix, comprising:
generating the second vector matrix based on the first vector matrix and a DGCNN model;
the context vector matrix is generated based on the first vector matrix, the second vector matrix, and a Self-Attention model.
8. The apparatus of claim 7, wherein the decoder generates the second vector matrix based on the first vector matrix and the DGCNN model, comprising:
generating a first intermediate vector matrix by performing one-dimensional dilated convolution on the first vector matrix;
performing one-dimensional dilated convolution on the first vector matrix and activating the result with an activation function, to generate a second intermediate vector matrix;
the second vector matrix is generated by using the second intermediate vector matrix as a weight based on the first vector matrix, the first intermediate vector matrix, and the second intermediate vector matrix.
CN201911406192.5A 2019-12-31 2019-12-31 Text semantic analysis method and device Active CN111160042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911406192.5A CN111160042B (en) 2019-12-31 2019-12-31 Text semantic analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911406192.5A CN111160042B (en) 2019-12-31 2019-12-31 Text semantic analysis method and device

Publications (2)

Publication Number Publication Date
CN111160042A CN111160042A (en) 2020-05-15
CN111160042B true CN111160042B (en) 2023-04-28

Family

ID=70559873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911406192.5A Active CN111160042B (en) 2019-12-31 2019-12-31 Text semantic analysis method and device

Country Status (1)

Country Link
CN (1) CN111160042B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101303692B (en) * 2008-06-19 2012-08-29 徐文和 All-purpose numeral semantic library for translation of mechanical language
JP6583686B2 (en) * 2015-06-17 2019-10-02 パナソニックIpマネジメント株式会社 Semantic information generation method, semantic information generation device, and program
CN108287858B (en) * 2017-03-02 2021-08-10 腾讯科技(深圳)有限公司 Semantic extraction method and device for natural language

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103257957A (en) * 2012-02-15 2013-08-21 深圳市腾讯计算机***有限公司 Chinese word segmentation based text similarity identifying method and device
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolution neutral network
CN106919556A (en) * 2017-02-21 2017-07-04 重庆邮电大学 A kind of natural language semanteme deep analysis algorithm of use sparse coding
CN107656990A (en) * 2017-09-14 2018-02-02 中山大学 A kind of file classification method based on two aspect characteristic informations of word and word
CN110019793A (en) * 2017-10-27 2019-07-16 阿里巴巴集团控股有限公司 A kind of text semantic coding method and device
CN109684626A (en) * 2018-11-16 2019-04-26 深思考人工智能机器人科技(北京)有限公司 Method for recognizing semantics, model, storage medium and device
CN109446333A (en) * 2019-01-16 2019-03-08 深兰人工智能芯片研究院(江苏)有限公司 A kind of method that realizing Chinese Text Categorization and relevant device
CN110347793A (en) * 2019-06-28 2019-10-18 北京牡丹电子集团有限责任公司宁安智慧工程中心 A kind of semantic analysis method and device of Chinese
CN110569500A (en) * 2019-07-23 2019-12-13 平安国际智慧城市科技股份有限公司 Text semantic recognition method and device, computer equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Approach of Text Search Based on Semantic Parsing Model ".《Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007)》.2007,全文. *
杨俊峰."基于word2vec和CNN的短文本聚类研究".信息与电脑.2019,全文. *
杨越 ; 黄瑞章 ; 魏琴 ; 陈艳平 ; 秦永彬 ; .基于上下文语义的新闻人名纠错方法.电子科技大学学报.2019,(第06期),全文. *
黄栋 ; 徐博 ; 许侃 ; 林鸿飞 ; 杨志豪 ; .基于词向量和EMD距离的短文本聚类.山东大学学报(理学版).2017,(第07期),全文. *

Also Published As

Publication number Publication date
CN111160042A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
US11106714B2 (en) Summary generating apparatus, summary generating method and computer program
CN108717406B (en) Text emotion analysis method and device and storage medium
KR102401942B1 (en) Method and apparatus for evaluating translation quality
CN111931517B (en) Text translation method, device, electronic equipment and storage medium
CN112417102B (en) Voice query method, device, server and readable storage medium
CN111324743A (en) Text relation extraction method and device, computer equipment and storage medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN109740158B (en) Text semantic parsing method and device
CN110795552A (en) Training sample generation method and device, electronic equipment and storage medium
CN111401084A (en) Method and device for machine translation and computer readable storage medium
CN113268576B (en) Deep learning-based department semantic information extraction method and device
CN110427619B (en) Chinese text automatic proofreading method based on multi-channel fusion and reordering
CN111985243B (en) Emotion model training method, emotion analysis device and storage medium
CN111930792A (en) Data resource labeling method and device, storage medium and electronic equipment
CN113158656B (en) Ironic content recognition method, ironic content recognition device, electronic device, and storage medium
CN112016320A (en) English punctuation adding method, system and equipment based on data enhancement
CN113255331B (en) Text error correction method, device and storage medium
CN116628186B (en) Text abstract generation method and system
CN111159405B (en) Irony detection method based on background knowledge
CN113761883A (en) Text information identification method and device, electronic equipment and storage medium
CN116050425A (en) Method for establishing pre-training language model, text prediction method and device
CN111160042B (en) Text semantic analysis method and device
CN116561271A (en) Question and answer processing method and device
KR102422844B1 (en) Method of managing language risk of video content based on artificial intelligence
CN114925175A (en) Abstract generation method and device based on artificial intelligence, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 401120 office 21, building 1, Xizheng international academic exchange center, No. 209, Baosheng Avenue, Huixing street, Yubei District, Chongqing

Applicant after: Chongqing juexiao Technology Co.,Ltd.

Address before: 401120 office 21, building 1, Xizheng international academic exchange center, No. 209, Baosheng Avenue, Huixing street, Yubei District, Chongqing

Applicant before: Chongqing juexing Education Technology Co.,Ltd.

GR01 Patent grant