CN109684640B - Semantic extraction method and device - Google Patents


Info

Publication number
CN109684640B
Authority
CN
China
Prior art keywords
vector
understood
feature
extraction module
feature extraction
Prior art date
Legal status
Active
Application number
CN201811602371.1A
Other languages
Chinese (zh)
Other versions
CN109684640A (en)
Inventor
杨双
胡加学
宋时德
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201811602371.1A
Publication of CN109684640A
Application granted
Publication of CN109684640B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a semantic extraction method and device that extract features characterizing the knowledge point to which an object to be understood belongs, features characterizing the context information of the object, and features characterizing its interpretation information and/or synonym information, and then determine the semantics from the extracted features. Because features extracted on different bases express the object to be understood from different dimensions, the determined semantics are more accurate.

Description

Semantic extraction method and device
Technical Field
The present application relates to the field of natural language understanding, and in particular, to a semantic extraction method and apparatus.
Background
Natural language understanding is one of the most important directions in the field of artificial intelligence and has long been a research hotspot in the relevant fields. In recent years, with the development of deep learning, reinforcement learning, and related technologies, enabling machines to understand natural language has become an increasingly realistic goal.
However, the accuracy with which existing semantic representation methods express the object to be understood still needs improvement.
Disclosure of Invention
The application provides a semantic extraction method and device, and aims to improve the accuracy of semantic expression.
In order to achieve the above object, the present application provides the following technical solutions:
a semantic extraction method, comprising:
acquiring an object to be understood;
extracting features of the object to be understood, wherein the features comprise at least one of the following: a first feature characterizing the knowledge point to which the object to be understood belongs, a second feature characterizing context information of the object to be understood, and a third feature characterizing interpretation information and/or synonym information of the object to be understood;
and determining the semantics of the object to be understood according to the features.
Optionally, the method for extracting the first feature includes:
inputting the word segmentation vector into a first feature extraction module to obtain the first feature output by the first feature extraction module;
the first feature extraction module is trained in advance using a sample object labeled with a knowledge point, a positive example belonging to the same knowledge point as the sample object, and a negative example belonging to a different knowledge point.
Optionally, the method for extracting the second feature includes:
inputting the word segmentation vector into a second feature extraction module to obtain the second feature output by the second feature extraction module;
the word segmentation vector is a vector generated from the object to be understood, and the second feature extraction module is trained in advance using a sample object and a historical context object of the sample object.
Optionally, the method for extracting the third feature includes:
inputting the superposition vector into a third feature extraction module to obtain the third feature output by the third feature extraction module;
the superposition vector is obtained by superposing the vector generated from the object to be understood with the vector of a synonym of the object and/or a vector of its interpretation information;
the third feature extraction module is trained in advance using the superposition vector of a sample object, a positive example belonging to the same knowledge point as the sample object, and a negative example belonging to a different knowledge point.
Optionally, the process of obtaining the superposition vector includes:
constructing a dataset from the object to be understood and the interpretation information of the object to be understood;
converting the words in the dataset into word vectors;
and superposing the vector of the object to be understood and the vector of a synonym of the object to be understood to obtain the superposition vector.
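As an illustration of the superposition step above, the vectors may be combined by element-wise averaging. This is a minimal sketch with toy values; the averaging choice and the vectors themselves are assumptions for illustration, not mandated by the claims:

```python
def superpose(*vectors):
    """Element-wise average of equal-length word vectors (one possible
    superposition; averaging is an illustrative assumption)."""
    dim = len(vectors[0])
    assert all(len(v) == dim for v in vectors)
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy vectors standing in for the object to be understood and its synonym.
object_vec = [0.2, 0.4, 0.6]
synonym_vec = [0.4, 0.2, 0.6]
superposition = superpose(object_vec, synonym_vec)
```

An interpretation-information vector could be passed as a third argument to the same function to realize the "and/or" variant of the claim.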
Optionally, the determining the semantics of the object to be understood according to the features includes:
inputting the word segmentation vector and the features into a semantic extraction module to obtain the semantics of the object to be understood output by the semantic extraction module;
the semantic extraction module is trained in advance using a sample object, the features of the sample object, a positive example belonging to the same knowledge point as the sample object, and a negative example belonging to a different knowledge point.
A semantic extraction device comprising:
the acquisition module is used for acquiring an object to be understood;
the feature extraction module is used for extracting features of the object to be understood, wherein the features comprise at least one of the following: a first feature characterizing the knowledge point to which the object to be understood belongs, a second feature characterizing historical context information of the object to be understood, and a third feature characterizing interpretation information and/or synonym information of the object to be understood;
and the semantic determining module is used for determining the semantics of the object to be understood according to the features.
Optionally, the feature extraction module includes:
at least one of the first feature extraction module, the second feature extraction module and the third feature extraction module;
the first feature extraction module is used for obtaining the first feature from the input word segmentation vector; it is trained in advance using a sample object labeled with a knowledge point, a positive example belonging to the same knowledge point as the sample object, and a negative example belonging to a different knowledge point;
the second feature extraction module is used for obtaining the second feature from the input word segmentation vector; it is trained in advance using a sample object and a historical context object of the sample object;
the third feature extraction module is used for obtaining the third feature from the input superposition vector; it is trained in advance using the superposition vector of a sample object, a positive example belonging to the same knowledge point as the sample object, and a negative example belonging to a different knowledge point;
the word segmentation vector is a vector generated according to the object to be understood.
A semantic extraction device comprising:
a memory and a processor;
the memory is used for storing one or more programs;
the processor is configured to execute the one or more programs, so that the semantic extraction device implements the semantic extraction method.
A computer readable medium having instructions stored therein which, when run on a computer, cause the computer to perform the above-described semantic extraction method.
The semantic extraction method and device extract features characterizing the knowledge point to which the object to be understood belongs, features characterizing its context information, and features characterizing its interpretation information and/or synonym information, and determine the semantics from the extracted features. Because features extracted on different bases express the object to be understood from different dimensions, the determined semantics are more accurate.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings may be derived from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic diagram of a semantic extraction method disclosed in an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a semantic extraction model disclosed in an embodiment of the present application;
Fig. 3 is a schematic diagram of the second feature extraction module;
Fig. 4 is a flow chart of yet another semantic extraction method disclosed in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a semantic extraction device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without inventive effort based on the present disclosure fall within the scope of protection of the present application.
Fig. 1 is a schematic diagram of a semantic extraction method according to an embodiment of the present application, including the following steps:
s101: and acquiring an object to be understood.
The object to be understood is an object whose semantics are to be extracted; it can be a word or a sentence composed of words.
S102: a first feature characterizing a knowledge point to which an object to be understood belongs is extracted.
The knowledge points represent preset classifications; an object to be understood that belongs to a classification belongs to the knowledge point representing that classification. For example, the sentence "I want to check the telephone fee" belongs to the knowledge point "check the telephone fee".
S103: a second feature is extracted that characterizes context information of the object to be understood.
The context information of the object to be understood is information located before or after the object in the historical dialogue, for example a historical context object of the object to be understood. For instance, in a historical dialogue with an operator's automatic customer-service system, if the sentence input by the user is "I want to check the telephone fee" and the system's answer is "For which month do you want to check the telephone fee", then that answer is a following context object of "I want to check the telephone fee".
S104: and extracting third characteristics of interpretation information and/or paraphrasing information of the object to be understood.
The interpretation information of the object to be understood is information for interpreting the object to be understood, for example, the interpretation information of "telephone charge" is "pre-stored charge of telephone". The paraphrasing information is information similar to the meaning of the object to be understood, such as the paraphrasing object of the object to be understood, wherein the interpretation information and the paraphrasing information can be obtained through the existing data collection channel and the data collection technology.
S105: and determining the semantics of the object to be understood according to the object to be understood, the first feature, the second feature and the third feature.
In the flow shown in fig. 1, features characterizing the knowledge point to which the object to be understood belongs, features characterizing its context information, and features characterizing its interpretation information and/or synonym information are extracted respectively, and the semantics are determined from the extracted features. Because features extracted on different bases express the object to be understood from different dimensions, the determined semantics express the object more accurately.
In the method shown in fig. 1, S102-S105 may be implemented using a preset semantic extraction model.
As shown in fig. 2, the semantic extraction model according to the present embodiment includes: the device comprises a first feature extraction module 1, a second feature extraction module 2, a third feature extraction module 3, a feature stitching module 4 and a semantic extraction module 5.
Wherein, the first feature extraction module 1 is used for extracting the first features. Specifically, the first feature extraction module 1 includes: an input layer 11, a bi-directional LSTM layer 12, an attention layer 13 and an output layer 14.
The input layer 11 is used for inputting word vectors.
The bidirectional LSTM layer 12 operates on the word vectors to establish timing relations between them. For example, if the step size of the bidirectional LSTM layer is 30 and the number of hidden-layer neuron nodes is 100, the forward and backward LSTMs each output a (30, 100)-dimensional hidden-layer feature; concatenating the forward and backward features yields the (30, 200)-dimensional hidden-layer feature of the sentence. A self-attention operation is then performed on this (30, 200)-dimensional feature matrix: the influence weight of each hidden state on the whole sentence is computed, and the hidden-layer output matrix is weight-averaged to obtain a 200-dimensional vector.
The attention layer 13 operates on the output of the bidirectional LSTM layer 12, with the aim of increasing the weight of key words and decreasing the weight of non-key words. Specifically: (1) a nonlinear transformation σ(Wx + b) is applied to each row of the input matrix, where W and b are shared across rows; (2) an external vector U (with the same dimension as the transformed vectors) is introduced, and an inner product is taken with each result of step (1) to control the input weights; (3) a softmax operation is applied to the inner products of step (2) to obtain the weight distribution; (4) the rows of the input matrix are linearly combined using the weights of step (3) to produce the output of the attention layer.
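Steps (1) to (4) can be sketched in plain Python. This is an illustrative sketch: tanh is assumed for the nonlinearity σ, and the tiny dimensions are toy values, not the (30, 200) shapes of the embodiment:

```python
import math

def attention(H, W, b, u):
    """Attention over hidden states H (a list of T vectors), following the
    four steps described above: (1) shared nonlinear transform, (2) inner
    product with external vector u, (3) softmax, (4) weighted combination."""
    def matvec(mat, h):
        return [sum(m * x for m, x in zip(row, h)) for row in mat]
    scores = []
    for h in H:
        z = [math.tanh(v + bi) for v, bi in zip(matvec(W, h), b)]  # step (1)
        scores.append(sum(ui * zi for ui, zi in zip(u, z)))        # step (2)
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]                              # step (3)
    dim = len(H[0])
    return [sum(a * h[i] for a, h in zip(alpha, H)) for i in range(dim)]  # step (4)

# Toy usage: two hidden states; u favors the first dimension.
H = [[1.0, 0.0], [0.0, 1.0]]
out = attention(H, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], u=[1.0, 0.0])
```

Because u scores the first hidden state higher, the output is pulled toward that state, which is exactly the "increase the weight of key words" effect described above.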
The output layer 14 outputs the result of the attention layer 13 as a fixed-length vector, i.e., the first feature vector.
The training process of the first feature extraction module 1 is as follows: the sample is a labeled sample word vector, labeled with the knowledge point to which the sample word belongs. The sample word vector, a positive example, and a negative example are input through the input layer 11, where the positive example is the vector of a word belonging to the same knowledge point as the sample word and the negative example is the vector of a word belonging to a different knowledge point. The sample word vector passes through the bidirectional LSTM layer 12 and the attention layer 13 to obtain the first feature vector. The loss function of the first feature extraction module 1 is:
L = max{0, M - cosine(q, a+) + cosine(q, a-)}
where M is a preset margin. The loss measures the gap between the first feature vector and the positive and negative examples, and the parameters are updated so that the first feature of the sample word vector becomes more similar to the positive example (small gap) and more different from the negative example (large gap).
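The loss above can be written directly. The margin value and the toy vectors below are illustrative assumptions:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def triplet_loss(q, a_pos, a_neg, M=0.2):
    # L = max{0, M - cosine(q, a+) + cosine(q, a-)}
    return max(0.0, M - cosine(q, a_pos) + cosine(q, a_neg))

q = [1.0, 0.0]
loss_good = triplet_loss(q, a_pos=[1.0, 0.0], a_neg=[0.0, 1.0])  # well-separated triple
loss_bad = triplet_loss(q, a_pos=[0.0, 1.0], a_neg=[1.0, 0.0])   # inverted triple
```

A well-separated triple incurs zero loss, while an inverted triple incurs a large loss, so gradient updates push the feature toward the positive example and away from the negative one.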
It can be seen that the first feature extracted by the trained first feature extraction module 1 can characterize the knowledge point to which the object input to the module belongs.
The second feature extraction module 2 is configured to extract the aforementioned second features. Specifically, the second feature extraction module 2 includes: an input layer 21, an encoding layer 22, a decoding layer 23 and an output layer 24.
The input layer 21 is used for inputting sample word vectors.
The internal neurons of the encoding layer 22 and decoding layer 23 may each be a bi-directional LSTM network. As shown in fig. 3, the encoding layer 22 is used to convert the sample word vector into a vector C, and the decoding layer 23 is used to translate the vector C into a second feature vector.
The training process of the second feature extraction module 2 is as follows: the sample word vector and the answer information of the sample word (an example of a historical context object) are input through the input layer 21; the encoding layer 22 computes a vector C from the sample word vector, and the decoding layer 23 translates C into the second feature vector. The loss function is the cross entropy between the output of the decoding layer and the answer information, and the parameters of the second feature extraction module are adjusted to make the cross entropy as small as possible.
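The cross-entropy objective can be illustrated for a single decoding step. The vocabulary size and probabilities below are toy values, not from the patent:

```python
import math

def cross_entropy(pred, target_index):
    """Negative log-probability the decoder assigns to the ground-truth token."""
    return -math.log(pred[target_index])

# Toy decoder distribution over a 3-token vocabulary; the answer token is index 1.
pred = [0.1, 0.7, 0.2]
loss = cross_entropy(pred, 1)
```

Minimizing this quantity pushes the decoder to assign ever higher probability to the tokens of the answer information, so the encoded vector C (and hence the second feature) must capture the context that predicts that answer.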
It can be seen that the second features extracted by the trained second feature extraction module 2 are able to characterize the context information of the object input to the module.
The third feature extraction module 3 is used to extract the aforementioned third feature. Specifically, it includes: an input layer 31, a bidirectional LSTM layer 32, an attention layer 33, and an output layer 34.
The third feature extraction module 3 differs from the first feature extraction module 1 only in that the input layer 31 receives a superposition vector, a vector formed by superposing a sample word vector with the vector of a synonym of the sample word. For example, if a synonym of the sample word "pay the fee" is "make a payment", the two word vectors are superposed (e.g., averaged) to obtain the superposition vector. Alternatively, the superposition vector may be obtained by superposing the sample word vector, the vector of the synonym, and the vector of the sample word's interpretation information, where the interpretation information is information that explains the sample word: for example, the interpretation of the sample word "telephone fee" is "the cost of calls", "the cost of calls" is the interpretation information, and its vector is obtained by converting that text.
Another way to obtain the superposition vector is to fuse the sample word, the interpretation information, and the synonym: the interpretation information of each sample word is collected and, together with the sample words, forms a dataset; a word vector is trained for each word in the dataset (for example, in the word2vec manner); and the vectors of the synonyms are then superposed to obtain the superposition vector. Word vectors trained with the fused interpretation information carry richer information.
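The corpus-fusion step can be sketched as follows. The sample word and interpretation text are hypothetical; a real system would feed this corpus to a word2vec-style trainer rather than use it directly:

```python
def build_corpus(samples):
    """samples maps each sample word to its (already segmented) interpretation text.
    Returns token lists forming the dataset for word2vec-style vector training."""
    corpus = []
    for word, interpretation in samples.items():
        corpus.append([word])                  # the sample word itself
        corpus.append(interpretation.split())  # its interpretation information
    return corpus

# Hypothetical entry: a sample word and its interpretation information.
corpus = build_corpus({"telephone-fee": "prepaid charge for the telephone"})
```

Training on the fused corpus places each sample word in the context of its interpretation, which is what gives the resulting vectors the richer information noted above.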
The training process of the third feature extraction module 3 is as follows: the superposition vector, a positive example, and a negative example are input through the input layer 31; the positive and negative examples are the same as those input through the input layer 11 of the first feature extraction module 1.
The bidirectional LSTM layer 32 operates on the superposition vectors to establish timing relations between them, and the attention layer 33 operates on the output of the bidirectional LSTM layer 32 to obtain the third feature vector. The parameters are updated using the same loss function as the first feature extraction module 1, so that the third feature vector of the superposition vector becomes more similar to its positive example and more different from the negative example.
It can be seen that the third feature extracted by the trained third feature extraction module 3 is capable of characterizing interpretation information and/or paraphrasing information of the object entered into the module.
The feature stitching module 4 is configured to stitch the first feature, the second feature, and the third feature to obtain a stitching vector.
The semantic extraction module 5 includes: an input layer 51, a bi-directional LSTM layer 52, an attention layer 53 and an output layer 54.
The input layer 51 inputs the sample word vector and the stitching vector. The sample word vector and the stitching vector pass through the bidirectional LSTM layer 52 and the attention layer 53 in order to obtain a semantic representation vector. The output layer 54 outputs the semantic representation vector.
The training process of the semantic extraction module 5 is as follows: the sample word vector, the stitching vector of the sample word, a positive example, and a negative example are input into the semantic extraction module 5; the sample word vector and the stitching vector pass through the bidirectional LSTM layer 52 and the attention layer 53 to obtain a semantic representation vector; using the same loss function as the first feature extraction module 1, the gap between the semantic representation vector and the positive and negative examples is computed, and the parameters are adjusted so that the semantic representation vector becomes more similar to the positive example and more different from the negative example.
In the training process, the first feature extraction module 1, the second feature extraction module 2 and the third feature extraction module 3 are trained respectively, and then the first feature extraction module 1, the second feature extraction module 2 and the third feature extraction module 3 are used to train the semantic extraction module 5.
From the structure and training process of the semantic extraction model, it can be seen that the first feature extraction module 1 obtains features using the knowledge point to which a word belongs as prior information, the second feature extraction module 2 uses the context information of the word as prior information, and the third feature extraction module 3 uses the knowledge point, synonyms, and/or interpretation information as prior information. The semantic representation vector therefore takes prior information of multiple dimensions into account, giving it richer, layered information and allowing it to express the meaning of words and sentences more accurately.
The semantic extraction process shown in fig. 1 will be described in more detail below based on the semantic extraction model shown in fig. 2:
fig. 4 is a schematic diagram of another semantic extraction method according to an embodiment of the present application, including the following steps:
s401: and acquiring the statement to be understood, and preprocessing the statement to be understood.
To facilitate subsequent computation, in this embodiment the length of the input sentence to be understood may be set to a fixed value, for example 30 words; sentences of insufficient length are zero-padded.
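The fixed-length step can be sketched as follows. Operating on tokens rather than vectors, and using a named pad token in place of zeros, are simplifications for illustration:

```python
def pad_to_fixed_length(tokens, length=30, pad="<pad>"):
    """Truncate or pad a segmented sentence so every input has a fixed length;
    in the embodiment, short sentences are filled with zeros (here, a pad token)."""
    return (tokens + [pad] * length)[:length]

padded = pad_to_fixed_length(["help", "me", "check", "the", "balance"])
```

With a fixed length of 30, every sentence yields the same (30, dim) input shape, which is what the bidirectional LSTM layers of the model expect.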
In this embodiment, preprocessing includes word segmentation and vectorization.
In human understanding, a word can be considered to carry more information than a single character, so in this embodiment the sentence to be understood is segmented into words in the expectation of better information expression. For example, if the sentence to be understood is "help me check how much I currently owe", the segmentation result is "help / me / check / I / currently / owe / how much". Specific word segmentation algorithms can be found in the prior art and are not described in detail here.
Vectorization refers to converting the sentence to be understood into vectors. In this embodiment, the sentence may be vectorized using an existing word-embedding representation.
The preprocessing result obtained after word segmentation and vectorization is a vector for each word produced by segmentation, hereinafter referred to as the word segmentation vectors.
S402: and obtaining the superposition vector of the segmentation.
The process of obtaining the superposition vector may be referred to above, and will not be described here again.
S403: and inputting the word segmentation vector and the superposition vector into a semantic extraction model to obtain the semantic of the sentence to be understood output by the semantic extraction model.
Specifically, the word segmentation vector is input into the first feature extraction module 1, the second feature extraction module 2 and the semantic extraction module 5 respectively. The superimposed vector is input to the third feature extraction module 3.
The first feature extraction module 1 outputs a first feature vector Va from the word segmentation vectors, the second feature extraction module 2 outputs a second feature vector Vc from the word segmentation vectors, and the third feature extraction module 3 outputs a third feature vector Vd from the superposition vector. The feature stitching module 4 stitches Va, Vc, and Vd and outputs a stitching vector. The semantic extraction module 5 obtains a semantic representation vector from the word segmentation vectors and the stitching vector. The semantic extraction method shown in fig. 4 combines prior information with word vectors to better represent semantic information, improving the accuracy of text semantic understanding. Moreover, the semantic extraction model stitches feature information obtained without supervision into a supervised framework, enriching the semantic information representation.
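The data flow just described can be sketched with stub extractors. The module internals (the LSTM and attention layers) are replaced by placeholder functions, so only the wiring of the model, not the trained networks, is shown:

```python
def first_feature(seg_vecs):  return [0.1] * 4   # Va: knowledge-point feature (stub)
def second_feature(seg_vecs): return [0.2] * 4   # Vc: context feature (stub)
def third_feature(sup_vec):   return [0.3] * 4   # Vd: interpretation/synonym feature (stub)

def extract_semantics(seg_vecs, sup_vec):
    Va = first_feature(seg_vecs)
    Vc = second_feature(seg_vecs)
    Vd = third_feature(sup_vec)
    stitched = Va + Vc + Vd  # feature stitching module 4: concatenation
    # The real semantic extraction module 5 would run the segmentation vectors
    # and the stitched vector through a bidirectional LSTM plus attention;
    # it is stubbed here as returning the stitched features directly.
    return stitched

semantic_vec = extract_semantics(seg_vecs=[[0.5, 0.5]], sup_vec=[0.5, 0.5])
```

The stitched vector is three times the per-module feature width, reflecting that the three priors contribute independent dimensions to the final semantic representation.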
In fig. 4, the sentence to be understood is converted into word segmentation vectors by preprocessing before being input into the model. Alternatively, a vector conversion module may be integrated into the model shown in fig. 2, in which case the word segmentation result of the sentence to be understood (or any word) may be input into the model directly.
It should be noted that the above flow takes the extraction of the first, second, and third features as an example. Alternatively, at least one of the first, second, and third features may be extracted, and the semantics of the object to be understood determined from the extracted features. Of course, the more features are extracted, the more accurately the resulting semantics can express the object to be understood.
Fig. 5 is a schematic diagram of a semantic extraction device according to an embodiment of the present application, including: the device comprises an acquisition module, a feature extraction module and a semantic determination module.
The acquisition module is used for acquiring the object to be understood. The feature extraction module is used for extracting features of the object to be understood, wherein the features comprise at least one of the following: a first feature characterizing the knowledge point to which the object to be understood belongs, a second feature characterizing historical context information of the object to be understood, and a third feature characterizing interpretation information and/or synonym information of the object to be understood. The semantic determining module is used for determining the semantics of the object to be understood according to the features.
Specifically, the feature extraction module includes at least one of a first feature extraction module, a second feature extraction module, and a third feature extraction module.
Further, the first feature extraction module obtains the first feature from the input word segmentation vector. It is trained in advance using a sample object labeled with a knowledge point, a positive example belonging to the same knowledge point as the sample object, and a negative example belonging to a different knowledge point.
The second feature extraction module obtains the second feature from the input word segmentation vector. It is trained in advance using a sample object and a historical context object of the sample object.
The third feature extraction module obtains the third feature from the input superposition vector. It is trained in advance using the superposition vector of a sample object, a positive example belonging to the same knowledge point as the sample object, and a negative example belonging to a different knowledge point.
The word segmentation vector is a vector generated according to an object to be understood.
Further, the process of obtaining the superposition vector includes: the object to be understood and the interpretation information of the object to be understood constitute a dataset. Words in the dataset are converted into word vectors. And superposing the vector of the object to be understood and the vector of the paraphrasing of the object to be understood to obtain a superposed vector. The superposition vector may be obtained by the third feature extraction module in the above manner, or the semantic extraction device shown in fig. 5 may further include a preprocessing module, and the superposition vector may be obtained by the preprocessing module in the above manner and input into the third feature extraction module.
A specific implementation in which the semantic determining module determines the semantics of the object to be understood according to the features is as follows: a semantic extraction module receives the word segmentation vector and the extracted features and outputs the semantics of the object to be understood. The semantic extraction module is trained in advance using a sample object, features of the sample object, a positive example that belongs to the same knowledge point as the sample object, and a negative example that belongs to a different knowledge point from the sample object.
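The patent does not disclose the internal structure of the semantic extraction module. As a minimal stand-in, the sketch below scores candidate semantic labels with a linear layer over the concatenation of the word segmentation vector and the extracted features; the weights and dimensions are purely illustrative:

```python
import numpy as np

def determine_semantics(seg_vec, features, W, b):
    """Concatenate the word segmentation vector with the extracted
    first/second/third features and pick the highest-scoring label."""
    x = np.concatenate([seg_vec] + list(features))
    logits = x @ W + b          # one score per candidate semantic label
    return int(np.argmax(logits))
```

In a trained model, `W` and `b` would be learned jointly with the feature extraction modules; here they are free parameters supplied by the caller.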
It should be noted that, in the apparatus described in this embodiment, the feature extraction module includes at least one of modules 1, 2, and 3 in the model shown in fig. 2, and the semantic determining module corresponds to modules 4 and 5 in the model shown in fig. 2 (module 4 may be omitted).
The semantic extraction device shown in fig. 5 determines the semantics of the object to be understood from features of different dimensions, and therefore achieves higher semantic understanding accuracy.
The embodiment of the application further discloses a semantic extraction device, comprising a memory and a processor. The memory is used to store one or more programs. The processor is configured to execute the one or more programs to cause the semantic extraction device to implement the semantic extraction method described above.
The embodiment of the application further discloses a computer readable storage medium storing instructions which, when run on a computer, cause the computer to execute the semantic extraction method described above.
The functions described in the methods of the present application, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computing-device-readable storage medium. Based on such understanding, the part of the embodiments of the present application that contributes over the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A semantic extraction method, comprising:
acquiring an object to be understood;
extracting features of the object to be understood, wherein the features comprise at least one of the following: a first feature representing a knowledge point to which the object to be understood belongs, a second feature representing context information of the object to be understood, and a third feature representing interpretation information and/or paraphrase information of the object to be understood;
determining the semantics of the object to be understood according to the features;
the method for extracting the first feature comprises the following steps:
inputting the word segmentation vector into a first feature extraction module to obtain the first feature output by the first feature extraction module;
the first feature extraction module is trained in advance using a sample object marked with a knowledge point, a positive example that belongs to the same knowledge point as the sample object, and a negative example that belongs to a different knowledge point from the sample object;
the method for extracting the second feature comprises the following steps:
inputting the word segmentation vector into a second feature extraction module to obtain the second feature output by the second feature extraction module;
the word segmentation vector is a vector generated according to the object to be understood, and the second feature extraction module is obtained by training a sample object and a historical context object of the sample object in advance;
the method for extracting the third feature comprises the following steps:
inputting the superposition vector into a third feature extraction module to obtain the third feature output by the third feature extraction module;
the superposition vector is a vector obtained by superposing a paraphrase vector and/or an interpretation information vector of the object to be understood with a vector generated according to the object to be understood;
the third feature extraction module is trained in advance using the superposition vector of a sample object, a positive example that belongs to the same knowledge point as the sample object, and a negative example that belongs to a different knowledge point from the sample object.
2. The method of claim 1, wherein the process of obtaining the superposition vector comprises:
constructing a data set by the object to be understood and the interpretation information of the object to be understood;
converting words in the dataset into word vectors;
and superposing the vector of the object to be understood and the vector of the paraphrasing of the object to be understood to obtain the superposition vector.
3. The method according to any of claims 1-2, wherein said determining the semantics of the object to be understood from the features comprises:
inputting the word segmentation vector and the features into a semantic extraction module to obtain the semantics of the object to be understood output by the semantic extraction module;
the semantic extraction module is trained in advance using a sample object, features of the sample object, a positive example that belongs to the same knowledge point as the sample object, and a negative example that belongs to a different knowledge point from the sample object.
4. A semantic extraction device, comprising:
the acquisition module is used for acquiring an object to be understood;
the feature extraction module is configured to extract features of the object to be understood, wherein the features comprise at least one of the following: a first feature representing a knowledge point to which the object to be understood belongs, a second feature representing historical context information of the object to be understood, and a third feature representing interpretation information and/or paraphrase information of the object to be understood;
the semantic determining module is configured to determine the semantics of the object to be understood according to the features;
the feature extraction module includes:
at least one of the first feature extraction module, the second feature extraction module and the third feature extraction module;
the first feature extraction module is configured to obtain the first feature according to the input word segmentation vector; the first feature extraction module is trained in advance using a sample object marked with a knowledge point, a positive example that belongs to the same knowledge point as the sample object, and a negative example that belongs to a different knowledge point from the sample object;
the second feature extraction module is configured to obtain the second feature according to the input word segmentation vector; the second feature extraction module is trained in advance using a sample object and a historical context object of the sample object;
the third feature extraction module is configured to obtain the third feature according to the input superposition vector; the third feature extraction module is trained in advance using the superposition vector of a sample object, a positive example that belongs to the same knowledge point as the sample object, and a negative example that belongs to a different knowledge point from the sample object;
the word segmentation vector is a vector generated according to the object to be understood.
5. A semantic extraction device, comprising:
a memory and a processor;
the memory is used for storing one or more programs;
the processor is configured to execute the one or more programs to cause the semantic extraction device to implement the semantic extraction method of any one of claims 1-3.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores instructions which, when run on a computer, cause the computer to perform the semantic extraction method according to any of claims 1-3.
CN201811602371.1A 2018-12-26 2018-12-26 Semantic extraction method and device Active CN109684640B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811602371.1A CN109684640B (en) 2018-12-26 2018-12-26 Semantic extraction method and device

Publications (2)

Publication Number Publication Date
CN109684640A CN109684640A (en) 2019-04-26
CN109684640B true CN109684640B (en) 2023-05-30

Family

ID=66189717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811602371.1A Active CN109684640B (en) 2018-12-26 2018-12-26 Semantic extraction method and device

Country Status (1)

Country Link
CN (1) CN109684640B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128235A (en) * 2019-12-31 2021-07-16 广东爱因智能数字营销有限公司 Semantic understanding method
CN113160795B (en) * 2021-04-28 2024-03-05 平安科技(深圳)有限公司 Language feature extraction model training method, device, equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897268A (en) * 2017-02-28 2017-06-27 科大讯飞股份有限公司 Text semantic understanding method, device and system
WO2017217661A1 (en) * 2016-06-15 2017-12-21 울산대학교 산학협력단 Word sense embedding apparatus and method using lexical semantic network, and homograph discrimination apparatus and method using lexical semantic network and word embedding
JP2017228272A (en) * 2016-06-17 2017-12-28 パナソニックIpマネジメント株式会社 Semantic generation method, semantic generation device, and program
CN107578106A (en) * 2017-09-18 2018-01-12 中国科学技术大学 A kind of neutral net natural language inference method for merging semanteme of word knowledge
CN107665188A (en) * 2016-07-27 2018-02-06 科大讯飞股份有限公司 A kind of semantic understanding method and device
CN108664464A (en) * 2017-03-27 2018-10-16 ***通信有限公司研究院 A kind of the determination method and determining device of semantic relevancy
CN108806671A (en) * 2018-05-29 2018-11-13 杭州认识科技有限公司 Semantic analysis, device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on sentence semantic similarity calculation based on Word2vec; Li Xiao et al.; Computer Science; 2017-09-15 (Issue 09); full text *


Similar Documents

Publication Publication Date Title
CN111783394B (en) Training method of event extraction model, event extraction method, system and equipment
CN107967261A (en) Interactive question semanteme understanding method in intelligent customer service
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN109740158B (en) Text semantic parsing method and device
CN110134946A (en) A kind of machine reading understanding method for complex data
CN110222184A (en) A kind of emotion information recognition methods of text and relevant apparatus
CN113128237B (en) Semantic representation model construction method for service resources
CN109684640B (en) Semantic extraction method and device
CN113946684A (en) Electric power capital construction knowledge graph construction method
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN113435208A (en) Student model training method and device and electronic equipment
CN115238029A (en) Construction method and device of power failure knowledge graph
CN113392265A (en) Multimedia processing method, device and equipment
CN110334340B (en) Semantic analysis method and device based on rule fusion and readable storage medium
Lu et al. Chinese sentence semantic matching based on multi-level relevance extraction and aggregation for intelligent human–robot interaction
CN117763084A (en) Knowledge base retrieval method based on text compression and related equipment
CN112765330A (en) Text data processing method and device, electronic equipment and storage medium
CN117473054A (en) Knowledge graph-based general intelligent question-answering method and device
CN111859979A (en) Ironic text collaborative recognition method, ironic text collaborative recognition device, ironic text collaborative recognition equipment and computer readable medium
CN116910190A (en) Method, device and equipment for acquiring multi-task perception model and readable storage medium
CN111813927A (en) Sentence similarity calculation method based on topic model and LSTM
WO2023137918A1 (en) Text data analysis method and apparatus, model training method, and computer device
KR102594734B1 (en) Text analysis method using lda topic modeling technique and text analysis apparatus performing the same
CN112818688B (en) Text processing method, device, equipment and storage medium
CN115116444A (en) Processing method, device and equipment for speech recognition text and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant