WO2022057406A1 - A neural-network-based natural language processing method and electronic device - Google Patents
A neural-network-based natural language processing method and electronic device
- Publication number
- WO2022057406A1 (PCT/CN2021/105268)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- matrix
- convolution
- word vector
- input sentence
- vector matrix
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Definitions
- the invention relates to the technical field of artificial intelligence, in particular to a natural language processing method and electronic device based on a neural network.
- Neural networks have been widely used in the field of natural language processing.
- the essence of convolution is the process of extracting features from input data with a kernel function, and the output of convolution is the extracted features (represented by a mapping matrix).
- the input data of the convolution can be in NHWC format, that is, [batch, in_height, in_width, in_channels], including the four dimensions batch, height, width, and channels, where the height dimension represents the length of the input data.
- the height, width, and number of input channels of the convolution input data are all designed as fixed values. For example, if the input data is a user sentence, the height of the input data is set to a fixed maximum value (such as 60) regardless of the sentence's actual length. If the sentence actually contains fewer than 60 words, it is padded with a specific value up to 60 words, and the convolution is still calculated over all 60 words.
- since the length of user sentences (that is, the user's speech content) is usually much shorter than the fixed value, performing the convolution calculation over the fixed length wastes computing resources and time.
- the present invention is proposed to provide a neural network-based natural language processing method and electronic device that overcome the above problems or at least partially solve the above problems.
- An object of the present invention is to provide a neural network-based natural language processing method that improves the computing speed by using variable-length convolutions.
- a further object of the present invention is to further increase the computing speed by fusing the convolution, activation, and pooling steps so that they are performed in the same accelerator.
- a neural network-based natural language processing method including:
- a convolution calculation is performed on the word vector matrix using a convolution kernel to obtain a feature mapping matrix, wherein, in the convolution calculation, the multiplication operations between the word vector matrix and the convolution kernel are executed a corresponding number of times according to the length information of the input sentence;
- the probability values of all the classification labels in the classification label matrix are normalized to obtain the intent recognition result of the input sentence.
- the length information is a mask of Boolean pointer type.
- the multiplication operation between the word vector matrix and the convolution kernel is performed for corresponding times according to the length information of the input sentence, including:
- the convolution kernel is made to slide along the height dimension of the word vector matrix to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel equals the number of words of the input sentence indicated by its length information, and the corresponding number of times equals the difference between that number of words and the size of the convolution kernel, plus 1.
- the multiplication operation between the word vector matrix and the convolution kernel is performed for corresponding times according to the length information of the input sentence, including:
- the convolution kernel is made to slide along the height dimension of the word vector matrix to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel along the height dimension equals the number of words of the input sentence indicated by its length information, plus the size of the convolution kernel, minus 1; the corresponding number of times equals the number of words of the input sentence indicated by its length information.
- the number of convolution kernels is multiple, and the sizes of the convolution kernels differ; the number of feature mapping matrices is multiple, and the multiple feature mapping matrices are in one-to-one correspondence with the multiple convolution kernels; the number of dimensionality reduction feature mapping matrices is multiple, and the multiple dimensionality reduction feature mapping matrices are in one-to-one correspondence with the multiple feature mapping matrices;
- performing the convolution calculation on the word vector matrix using the convolution kernel to obtain a feature mapping matrix includes:
- Activating and pooling the feature map matrix to obtain a dimensionality reduction feature map matrix including:
- the intention recognition based on the dimensionality reduction feature mapping matrix to obtain the classification label matrix of the input sentence including:
- Intention recognition is performed based on the merged feature mapping matrix, and a classification label matrix of the input sentence is obtained.
- before using a convolution kernel to perform convolution calculation on the word vector matrix to obtain a feature mapping matrix, the method further includes:
- a one-dimensional channel dimension is added to the word vector matrix to expand the dimension of the word vector matrix to four dimensions including batch, height, width and channel;
- the length of each dimension of the dimension-expanded word vector matrix is transformed so that the length of each dimension of the transformed word vector matrix does not exceed the corresponding maximum limit, wherein the product of the dimension lengths of the word vector matrix after the transformation equals the product of the dimension lengths before the transformation.
- the feature map matrix is activated, including:
- the feature map matrix is nonlinearly mapped using the ReLU activation function.
- pooling the feature map matrix includes:
- Average pooling or max pooling is performed on the feature map matrix.
- an electronic device including:
- when executed by the processor, the computer program code causes the electronic device to execute the neural-network-based natural language processing method according to any one of the above.
- a computer-readable storage medium is also provided, where the computer-readable storage medium is used to store a computer program, and the computer program is used to implement the above-mentioned neural-network-based natural language processing method.
- a chip for running instructions includes a memory and a processor; the memory stores code and data and is coupled to the processor.
- the processor runs the code in the memory so that the chip is used to execute the above-mentioned neural network-based natural language processing method.
- a program product including instructions which, when run on a computer, cause the computer to execute the above-mentioned neural-network-based natural language processing method.
- a computer program which, when executed by a processor, executes the above-mentioned neural-network-based natural language processing method.
- by introducing length information that reflects the actual length of the input sentence, the neural-network-based natural language processing method proposed in the embodiments of the present invention performs, during the convolution calculation, the corresponding number of multiplication operations between the word vector matrix and the convolution kernel according to that length information. This transforms the prior art's fixed-length convolution, based on a fixed preset sentence length, into a variable-length convolution based on the actual length of the input sentence, which greatly reduces the amount of calculation, significantly improves the computing speed, and reduces the waste of computing resources and time.
- edge padding is performed at the end of the input sentence to fully extract the features of the last word of the input sentence, thereby improving the recognition processing accuracy of the input sentence.
- Fig. 1 shows a typical neural network graph structure diagram of intention recognition in natural language processing in the prior art
- FIG. 2 shows a schematic flowchart of a natural language processing method based on a neural network according to an embodiment of the present invention
- FIG. 3 shows a graph structure diagram of a neural network in natural language processing according to an embodiment of the present invention
- FIG. 4 shows a schematic diagram of the sliding of the convolution kernel when edge padding is applied at the tail of the input sentence according to an embodiment of the present invention.
- Convolution is widely used in neural networks.
- the convolution calculation is essentially a multiply-accumulate process.
- the input data of the convolution is in NHWC format, that is, [batch, in_height, in_width, in_channels], where batch represents the number of processing objects in a batch participating in the convolution calculation (for example, if the processing object is an image, it indicates the number of images in the batch; if the processing object is a sentence, it indicates the number of sentences in the batch), in_height indicates the height of the input data, in_width indicates the width of the input data, and in_channels indicates the number of channels of the input data.
- the convolution kernel of the convolution is in HWCN format, namely [filter_height, filter_width, in_channels, out_channels], where filter_height represents the height of the convolution kernel, filter_width represents the width of the convolution kernel, in_channels represents the number of channels of the input data, and out_channels represents the number of channels of the output data.
- the output data of the convolution is in NHWC format, namely [batch, output_height, output_width, out_channels], where batch represents the number of processing objects in a batch participating in the convolution calculation, output_height represents the height of the output data, output_width represents the width of the output data, and out_channels indicates the number of channels of the output data.
- the change rules of height and width are as follows:
- padding refers to how the excess part is filled when the convolution kernel, sliding along the edge of the input data, extends beyond that edge.
- padding defaults to VALID, with a pad value of 0; the height and width of the convolution output data can then be computed uniformly by the formula: output_size = (input_size - kernel_size) / stride + 1, where:
- input_size is the size of the input data, specifically the height or width of the input data.
- kernel_size is the size of the convolution kernel. Specifically, when input_size is the height of the input data, kernel_size is the height of the convolution kernel; when input_size is the width of the input data, kernel_size is the width of the convolution kernel. stride refers to the convolution step size, that is, the distance the convolution kernel moves at each step.
- padding can also be SAME, in which case the height and width of the convolution output data are the same as those of the input.
- FIG. 1 is a typical neural network graph structure diagram for intention recognition in natural language processing in the prior art.
- the typical neural network uses 2-dimensional matrix convolution, and the convolution function can be defined in the following format: void conv2d(int8*input,int*inputShape,int8*filter,int*filterShape,int8*output,int*outputShape), where input is the input data pointer, inputShape is the input data dimension, filter is the convolution kernel data pointer, filterShape is the convolution kernel dimension, output is the output data pointer, outputShape is the output data dimension, int8 indicates 8-bit integer data, and int indicates integer data.
- the height, width, and number of input channels of the convolution input data are all designed as fixed values, wherein the height of the convolution input data equals the length of the sentence to be recognized, that is, the sentence length is designed as a fixed length (the maximum sentence length).
- taking a fixed sentence length of 60 as an example, no matter what the actual length of the input user sentence is, it is calculated as 60 words (represented by a [1,60] matrix).
- the word vector matrix [1, 60, 8, 4] is obtained as the input data of the convolution, where the height of the word vector matrix is equal to 60 (that is, the fixed length of the sentence).
- the convolution kernels W[3,8,4,128], W[4,8,4,128] and W[5,8,4,128] are used for convolution calculation.
- the padding is VALID and the convolution step size (stride) is 1.
- the above three convolution kernels are used to multiply-accumulate the input word vector matrix respectively, obtaining the following output feature mapping matrices: [1,58,1,128], [1,57,1,128], [1,56,1,128], where the height of the feature mapping matrix (58, 57, and 56 respectively in this example) represents the number of multiplication operations between each convolution kernel and the input word vector matrix.
- the loop control of the convolution calculation takes the following form:
- each feature map matrix is sequentially activated and pooled.
- the dimensions of the output data and the input data in the activation step remain unchanged.
- the above three dimensionality reduction feature mapping matrices [1,1,1,128], [1,1,1,128], [1,1,1,128] are accumulated and combined to obtain the combined feature mapping matrix [1,1,1,384].
- the intent recognition of the sentence is performed based on the dimensionality reduction feature map matrix [1, 1, 1, 384] through the fully connected layer, and the classification label matrix [1, 50] of the sentence is obtained.
- the probability values of all the classification labels in the classification label matrix [1, 50] are normalized by the softmax layer, and the intent recognition result matrix [1, 50] is obtained.
- FIG. 2 shows a schematic flowchart of a natural language processing method based on a neural network according to an embodiment of the present invention.
- the method may at least include the following steps S102 to S112.
- Step S102: receive the input natural sentence as the input sentence, and generate length information of the input sentence according to the number of words in the input sentence.
- Step S104: determine the index of each word in the input sentence, look up the vector value of each word in the word vector table according to its index, and obtain the word vector matrix of the input sentence.
- Step S106: use the convolution kernel to perform convolution calculation on the word vector matrix to obtain a feature mapping matrix, wherein, in the convolution calculation, the multiplication operations between the word vector matrix and the convolution kernel are performed a corresponding number of times according to the length information of the input sentence.
- Step S108: activate and pool the feature mapping matrix to obtain a dimensionality reduction feature mapping matrix.
- Step S110: perform intention recognition based on the dimensionality reduction feature mapping matrix and obtain the classification label matrix of the input sentence, where the classification label matrix includes the probability value of each classification label.
- Step S112: normalize the probability values of all classification labels in the classification label matrix to obtain the intention recognition result of the input sentence.
- FIG. 3 shows a graph structure diagram of a neural network in natural language processing according to an embodiment of the present invention. The steps of the embodiment of the present invention are described below with reference to FIG. 3 .
- the input sentence is a natural sentence of the user, and the length information of the input sentence is generated according to the number of words of the input sentence (ie, the actual length of the input sentence).
- the length information of the input statement may be a boolean pointer (bool) type mask (may be referred to as a length mask).
- the mask is a bool type array, each bool value can be true or false, true means the bit is valid, false means the bit is invalid.
- the length information (specifically, the length mask) generated for the input sentence contains 60 bool values, among which the first 10 are true and the rest are false.
- step S104 the word vector matrix of the input sentence is obtained through the word embedding look-up table.
- each word of the input sentence is replaced by an index (specifically, an index value), and the remaining positions are filled with a specific value (for example, 0).
- 1 means that the batch is 1, that is, one sentence is processed at a time.
- the batch is not 1. For example, if three sentences are processed at a time, the batch is 3, and 3 ⁇ 60 integer values are obtained.
- the vector value of each word is found from the trained word vector table (the word vector table includes the mapping relationship between the word index and the corresponding word vector), and the word vector matrix of the input sentence is obtained.
- the vector value of each word is a 32-bit floating point number.
- the vector values corresponding to the index values of the 10 words actually contained in the input sentence occupy the leading positions, and the remaining positions are still filled with a specific value (for example, 0).
- the first 10 data in the height dimension are data corresponding to the 10 words actually included in the input sentence, and the last 50 data are data supplemented with specific values.
- after obtaining the word vector matrix of the input sentence, the word vector matrix can be convolved to extract features.
- in order to improve the efficiency of the convolution calculation, the word vector matrix may also be dimensionally transformed before the convolution calculation is performed on it.
- natural language input data can be regarded as having only one channel.
- a one-dimensional channel dimension can be added to the word vector matrix to expand it to four dimensions: batch, height, width, and channel. For example, for the word vector matrix [1,60,32] obtained in the previous step, a channel dimension can be added to make it [1,60,32,1].
- the length of each dimension of the input data matrix for the convolution calculation is limited; if a length exceeds its limit, the calculation must be performed in multiple steps, reducing efficiency. Therefore, after the dimension of the word vector matrix is expanded, the length of each dimension of the dimension-expanded word vector matrix can also be transformed according to the maximum limit on each dimension of the convolution input data matrix, so that no dimension of the transformed word vector matrix exceeds its corresponding maximum limit, wherein the product of the dimension lengths of the transformed word vector matrix equals the product of the dimension lengths of the untransformed word vector matrix.
- the word vector matrix [1, 60, 8, 4] after dimension transformation is obtained as the input data of the convolution calculation.
- in step S106, the convolution calculation is performed on the word vector matrix using the convolution kernel to extract features, and a feature mapping matrix is obtained.
- the length information of the input sentence generated in step S102 is introduced into the convolution calculation, so as to perform the multiplication operation between the word vector matrix and the convolution kernel for corresponding times according to the length information of the input sentence.
- the feature mapping matrix calculated in this way contains, at the front of its height dimension, the corresponding number of feature data extracted from the input sentence; the remaining data are filled with a specific value (for example, 0).
- the length information of the input sentence is introduced into the convolution calculation by modifying the existing convolution function (specifically, adding a length variable reflecting the length information of the input sentence on the basis of the existing convolution function).
- the modified convolution function can be defined in the following format: void conv2d(int8*input,int*inputShape,int8*filter,int*filterShape,int8*output,int*outputShape,bool*mask), where input is the input data pointer, inputShape is the input data dimension, filter is the convolution kernel data pointer, filterShape is the convolution kernel dimension, output is the output data pointer, outputShape is the output data dimension, and mask is the length variable, which can be a length mask of Boolean pointer type for the input data.
- the loop control of the convolution calculation becomes the following form:
- variable-length convolution can be used wherever it is needed in the neural network.
- the length information (specifically, the length mask) of the input sentence generated in step S102 is input into the modified convolution function to participate in the convolution calculation.
- the word vector matrix as the input data of the convolution calculation includes the vector values corresponding to the index values of the actual number of words contained in the input sentence, and the other following data are supplemented with specific values.
- the length information of the input sentence is fed into the modified convolution function to control the convolution's internal calculation loop, so that the convolution calculation processes only the data in the word vector matrix whose vector values correspond to the index values of the words actually contained in the input sentence, and does not process the data filled in with the specific value.
- the number of multiplication operations between the word vector matrix and the convolution kernel can be calculated according to the following formula (1): multiplication_number = (slide_size - kernel_size) / stride + 1, where:
- multiplication_number indicates the number of multiplication operations between the word vector matrix and the convolution kernel
- slide_size indicates the sliding stroke of the convolution kernel in the height dimension of the word vector matrix
- kernel_size indicates the size of the convolution kernel (ie, the convolution kernel height)
- stride represents the convolution stride.
- the step of performing the multiplication operation between the word vector matrix and the convolution kernel for a corresponding number of times according to the length information of the input sentence can be implemented as follows:
- the convolution kernel is made to slide along the height dimension of the word vector matrix, with a sliding stroke equal to the number of words of the input sentence, to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel.
- among the input word vector matrix [1,60,8,4] of the convolution calculation, the first 10 data in the height dimension are the data corresponding to the 10 words actually contained in the input sentence, and the last 50 data are the data filled with a specific value.
- a convolution kernel of size 3 is used (W[3,8,4,128] in this example, where the length 3 in the height dimension is the kernel size) with the convolution stride set to 1; the convolution kernel W[3,8,4,128] then slides along the height dimension of the word vector matrix [1,60,8,4] with a sliding stroke equal to 10, performing the corresponding number of multiplication operations between the word vector matrix and the convolution kernel to obtain the feature mapping matrix [1,58,1,128].
- the calculated feature mapping matrix [1,58,1,128] contains, at the front of its height dimension, the first 8 sets of feature data extracted from the input sentence; the last 50 sets of data are filled with the specific value.
- the number of convolutions of the input sentence is greatly reduced.
- words with specific meanings are usually composed of a specific number of characters in a sentence, and different words can also express different meanings through their order and spatial position. For example, in Chinese, words are usually composed of 2 or 3 characters, and idioms of 4 characters; words and idioms can each express a specific meaning. Therefore, when extracting features by convolution, convolution kernels of different sizes can be used for the convolution calculation. In this case, the number of convolution kernels used in step S106 may be multiple, with the sizes of the convolution kernels differing. The number of feature mapping matrices obtained by convolution may correspondingly be multiple, with the multiple feature mapping matrices in one-to-one correspondence with the multiple convolution kernels. Specifically, step S106 is implemented as follows: multiple convolution kernels of different sizes are used to perform convolution calculation on the word vector matrix respectively, obtaining multiple feature mapping matrices.
- the format of the convolution kernel is HWCN, namely [filter_height (height of the convolution kernel), filter_width (width of the convolution kernel), in_channels (number of input channels), out_channels (number of output channels)], where the number of input channels must be the same as the number of input channels of the input data (i.e., the word vector matrix).
- the number of output channels can be set manually; generally a value of 128 to 300 is appropriate.
- the dimension of the input sentence is relatively small, and it is preferable to set the number of output channels of the convolution kernel to 128.
- The word vector matrix [1, 60, 8, 4] after the dimension transformation described above is used as the input data of the convolution calculation, with a padding value of 0 (that is, the edges are not padded) and a convolution stride of 1. The convolution kernels W[3,8,4,128], W[4,8,4,128], and W[5,8,4,128] are used to perform convolution calculations on the word vector matrix [1,60,8,4], and the feature map matrices output by the convolution calculation are [1,58,1,128], [1,57,1,128], and [1,56,1,128] respectively.
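The shape arithmetic above follows the standard formula for a convolution without edge padding: out_height = (in_height - kernel_height) / stride + 1. A minimal Python sketch of this calculation (illustrative only; the shape tuples mirror the worked example, not an implementation of the patented method):

```python
# Output height of a convolution with no edge padding ("valid" convolution):
# out_h = (in_h + 2 * padding - kernel_h) // stride + 1
def conv_out_height(in_h: int, kernel_h: int, stride: int = 1, padding: int = 0) -> int:
    return (in_h + 2 * padding - kernel_h) // stride + 1

# Word vector matrix [1, 60, 8, 4] convolved with kernels of heights 3, 4, 5
# (HWCN kernels [3,8,4,128], [4,8,4,128], [5,8,4,128]) gives feature map
# heights 58, 57 and 56, matching [1,58,1,128], [1,57,1,128], [1,56,1,128].
for kernel_h in (3, 4, 5):
    print((1, conv_out_height(60, kernel_h), 1, 128))
```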
- edge padding is not performed during the convolution calculation.
- Although the calculation operation is simplified and the amount of calculation is reduced, because edge padding is not performed, the features of the last word of the input sentence cannot be fully extracted, which may cause some loss of recognition accuracy. Therefore, in another embodiment, when the convolution kernel is used to perform the convolution calculation on the word vector matrix, edge padding can also be performed at the end of the input sentence to obtain the feature map matrix, so that the features of the last word of the input sentence can be fully extracted, improving the recognition and processing accuracy of the input sentence.
- In this case, the step of performing the multiplication operation between the word vector matrix and the convolution kernel for the corresponding number of times according to the length information of the input sentence can be implemented as follows:
- With edge padding at the tail and a convolution stride of 1, the convolution kernel slides on the height dimension of the word vector matrix to perform the multiplication operation between the word vector matrix and the convolution kernel for the corresponding number of times.
- multiplication_number = ((input_size + kernel_size - 1) - kernel_size) / stride + 1 (3)
- where input_size is the number of words of the input sentence, kernel_size is the size of the convolution kernel, and the sliding stroke of the convolution kernel is equal to input_size + kernel_size - 1.
- Since the stride is 1, equation (3) can be simplified to equation (4):
- multiplication_number = input_size (4)
- That is, the convolution kernel is made to slide, in the height dimension of the word vector matrix, a stroke equal to the number of words of the input sentence plus the size of the convolution kernel minus 1, and the corresponding number of multiplication operations between the word vector matrix and the convolution kernel is equal to the number of words of the input sentence indicated by the length information of the input sentence.
- For example, a convolution kernel of size 3 slides on the height dimension of the word vector matrix of the input sentence: the sliding stroke is 5 (that is, from "Guide" at the beginning of the sentence to "Sea" at the end of the sentence), and the number of multiplication operations is 3.
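A small sketch of equations (3) and (4), using the variable names from the text (input_size is the number of words, kernel_size is the convolution kernel size); this is an illustration of the arithmetic, not the patented implementation:

```python
# Equation (3): with tail padding of (kernel_size - 1) the sliding stroke is
# input_size + kernel_size - 1, so the number of multiplications is
# ((input_size + kernel_size - 1) - kernel_size) // stride + 1.
def multiplication_number(input_size: int, kernel_size: int, stride: int = 1) -> int:
    stroke = input_size + kernel_size - 1           # sliding stroke with tail padding
    return (stroke - kernel_size) // stride + 1

# Equation (4): with stride 1 this reduces to input_size itself.
# Example from the text: a 3-word sentence and a size-3 kernel give a
# sliding stroke of 5 and 3 multiplication operations.
print(multiplication_number(input_size=3, kernel_size=3))  # 3
```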
- the features of the last word of the input sentence can be fully extracted with only a slight increase in the amount of calculation, thereby improving the recognition processing accuracy of the input sentence.
- For example, the convolution kernels [3, 8, 4, 128], [4, 8, 4, 128], and [5, 8, 4, 128] are used respectively to perform convolution calculations on the word vector matrix [1, 60, 8, 4], with edge padding at the tail of the input sentence during the convolution calculation, so that the sliding stroke of each convolution kernel in the height dimension of the word vector matrix is the number of words in the input sentence plus the size of the convolution kernel minus 1. With a stride value of 1, the output feature map matrices are [1,58,1,128], [1,57,1,128], and [1,56,1,128] respectively.
- In practice, the average length of a user sentence is about 14 words. Compared with the prior art's fixed length of, for example, 60 words, the actual length accounts for only about a quarter of the original; thus, the method of the embodiment of the present invention can theoretically save three-quarters of the computing time.
- In step S108, the feature map matrix obtained by convolution is activated and pooled to obtain a dimension reduction feature map matrix. Specifically, this can be divided into two steps: activation and pooling.
- the feature map matrix is activated through the activation function, which brings nonlinear characteristics to the neural network.
- Specifically, the ReLU activation function can be used to perform nonlinear mapping on the feature mapping matrix.
- The characteristics of the ReLU activation function are well known to those skilled in the art and will not be repeated herein.
- the dimensions of the output data and the input data in the activation step remain unchanged.
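As a minimal illustration (NumPy; not part of the patent text), ReLU clips negative values to zero element-wise and leaves the tensor shape unchanged:

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # ReLU: max(0, x) applied element-wise; introduces nonlinearity
    return np.maximum(x, 0.0)

feature_map = np.array([[-1.5, 0.0, 2.3]])
activated = relu(feature_map)                 # negative values become 0.0
print(activated.shape == feature_map.shape)   # True: dimensions unchanged
```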
- The purpose of the pooling step is to reduce the dimensionality of the extracted features in order to compress the data. Pooling, also known as downsampling, takes multiple values as input and outputs one value. Pooling usually includes average pooling, max pooling, etc. Average pooling takes the average of the input values and uses that average as the output in place of the inputs. Max pooling takes the largest of the input values as the output value.
- average pooling or maximum pooling may be performed on the feature map matrix in the pooling step.
- max pooling can be employed.
- Specifically, max pooling is performed on the height dimension of the feature mapping matrix: the maximum value among the values of the height dimension is selected as the output value, to obtain the dimension reduction feature mapping matrix.
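A minimal NumPy sketch of max pooling over the height dimension of a [batch, height, width, channels] feature map (the shapes are taken from the worked example; this is illustrative, not the patented implementation):

```python
import numpy as np

def max_pool_height(feature_map: np.ndarray) -> np.ndarray:
    # Take the maximum over the height axis (axis 1), keeping the axis so the
    # result stays 4-dimensional: [1, 58, 1, 128] -> [1, 1, 1, 128].
    return feature_map.max(axis=1, keepdims=True)

fm = np.random.rand(1, 58, 1, 128)     # e.g. activated output of the size-3 kernel
print(max_pool_height(fm).shape)       # (1, 1, 1, 128)
```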
- In the case where, in step S106, multiple convolution kernels of different sizes are used to perform convolution calculations on the word vector matrix respectively, obtaining multiple feature mapping matrices in one-to-one correspondence with the multiple convolution kernels, then in step S108 the multiple feature mapping matrices are activated and pooled respectively to obtain multiple dimensionality reduction feature mapping matrices, where the multiple dimensionality reduction feature mapping matrices are in one-to-one correspondence with the multiple feature mapping matrices.
- the way of activation and pooling here is the same as described above and will not be repeated.
- step S108 will be described in detail with reference to FIG. 3 .
- The maximum value is selected as the output from the values of the height dimension of each activated feature map matrix [1,58,1,128], [1,57,1,128], [1,56,1,128], to obtain the dimension reduction feature mapping matrices.
- the intent recognition of the input sentence can be performed based on the dimension reduction feature mapping matrix through the fully connected layer, and the classification label matrix of the input sentence can be obtained.
- The fully connected layer is generally used at the end of the neural network to fuse the feature maps obtained by convolution and to output the fused features at the desired dimension. Therefore, the fully connected layer has the functions of feature fusion and dimension transformation.
- The input of the fully connected layer is the dimensionality reduction feature mapping matrix, and the output is the classification label matrix. The dimension of the classification label matrix is the number of classification labels to be identified, and each value in the matrix represents the probability of a classification label; the value is a floating-point number. For example, assuming that the number of intent recognition classification labels is 50, the classification label matrix output in this step is [1, 50], as shown in Figure 3.
- step S110 may be further implemented as follows: firstly, the multiple dimensionality reduction feature mapping matrices are accumulated and merged to obtain a merged feature mapping matrix. Then, intent recognition is performed based on the merged feature mapping matrix, and the classification label matrix of the input sentence is obtained.
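A sketch of this merge-then-classify step (NumPy; the weight values are random placeholders, and the three pooled maps and 50 labels follow the worked example rather than being fixed requirements):

```python
import numpy as np

# Three dimension-reduced feature maps, one per kernel size, flattened to [1, 128].
pooled = [np.random.rand(1, 128) for _ in range(3)]
merged = np.sum(pooled, axis=0)        # accumulate and merge -> shape (1, 128)

# Fully connected layer: feature fusion plus dimension transformation to the
# number of intent classification labels (50 in the example).
W = np.random.rand(128, 50)            # placeholder weights
b = np.zeros(50)                       # placeholder bias
label_matrix = merged @ W + b
print(label_matrix.shape)              # (1, 50)
```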
- the probability values of all the classification labels in the classification label matrix can be normalized by the softmax layer to obtain the intent recognition result of the input sentence.
- The function of the Softmax layer is to normalize the classification label probabilities. It does not change the dimension of the input data, but only changes each floating-point value of the input data, so that the sum of the floating-point probabilities of all classification labels is equal to 1. Therefore, the output data of this step has the same dimension as the classification label matrix. For example, as shown in Figure 3, when the fully connected layer outputs a classification label matrix [1,50], the data output in this step is also a [1,50] matrix.
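A minimal softmax sketch (NumPy; illustrative only) showing that the dimension is preserved and the probabilities sum to 1:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability, then normalize so values sum to 1.
    e = np.exp(x - x.max())
    return e / e.sum()

label_matrix = np.random.rand(1, 50)   # output of the fully connected layer
probs = softmax(label_matrix)
print(probs.shape)                     # (1, 50) - same dimension as the input
print(float(probs.sum()))              # 1.0 (up to floating-point rounding)
```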
- the activation step described above is to process each value in the feature map matrix output by the convolution
- the pooling step is to process the feature map matrix output by the convolution as a whole.
- The inventors found that in the hardware-accelerated processing of neural networks, most of the overhead is caused by data relocation, and at the same time the data buffering in the calculation process has spatial locality; these are bottlenecks that limit the calculation speed. Based on this finding, in a preferred embodiment, the step of using the convolution kernel to perform the convolution calculation on the word vector matrix to obtain the feature map matrix (i.e., step S106) and the step of activating and pooling the feature map matrix to obtain the dimensionality reduction feature map matrix (i.e., step S108) can be performed on the same accelerator.
- step S106 and step S108 may be implemented in one chip. In this way, multiple processing of the same data can be integrated into one accelerator, reducing data relocation and frequent data update buffering, thereby further improving computing speed and reducing computing time.
- In addition, in steps S104 and S106 (i.e., the word vector table lookup and convolution steps), the lengths of the height dimension of the input data matrix and the output data matrix are still determined based on the fixed maximum sentence length. This ensures that the storage space overhead of these input and output data matrices is stable, avoids the problem of frequently allocating storage spaces of different sizes for input sentences of different actual lengths, and is more conducive to the effective use of space.
- an embodiment of the present invention also provides an electronic device.
- The electronic device includes: a processor; and a memory storing computer program code. When the computer program code is executed by the processor, it causes the electronic device to execute the neural network-based natural language processing method described in any one of the above embodiments or a combination thereof.
- Embodiments of the present invention further provide a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are executed on a computer, the computer can execute the above-mentioned neural network-based natural language processing method.
- An embodiment of the present invention further provides a chip for running instructions. The chip includes a memory and a processor; code and data are stored in the memory, the memory is coupled with the processor, and the processor runs the code in the memory to cause the chip to perform the above-mentioned neural network-based natural language processing method.
- An embodiment of the present invention further provides a program product containing instructions. The program product includes a computer program stored in a computer-readable storage medium; at least one processor can read the computer program from the computer-readable storage medium, and when the at least one processor executes the computer program, the above-mentioned neural network-based natural language processing method can be implemented.
- An embodiment of the present invention further provides a computer program, which is used to execute the above-mentioned neural network-based natural language processing method when the computer program is executed by a processor.
- the embodiments of the present invention can achieve the following beneficial effects:
- The natural language processing method based on a neural network proposed by the embodiments of the present invention introduces length information reflecting the actual length of the input sentence, and in the convolution calculation performs the corresponding number of multiplication operations between the word vector matrix and the convolution kernel according to that length information. This transforms the fixed-length convolution of the prior art, based on a fixed preset sentence length, into a variable-length convolution based on the actual length of the input sentence, which greatly reduces the amount of calculation, thereby significantly improving the computing speed and reducing the waste of computing resources and time.
- edge padding is performed at the end of the input sentence to fully extract the features of the last word of the input sentence, thereby improving the recognition processing accuracy of the input sentence.
- each functional unit in each embodiment of the present invention may be physically independent of each other, or two or more functional units may be integrated together, or all functional units may be integrated into one processing unit.
- the above-mentioned integrated functional units may be implemented in the form of hardware, and may also be implemented in the form of software or firmware.
- If the integrated functional unit is implemented in the form of software and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- Based on this understanding, the technical solution of the present invention, or all or part of it, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computing device (such as a personal computer, a server, or a network device, etc.) execute all or part of the methods described in the embodiments of the present invention.
- The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.
- All or part of the steps of the foregoing method embodiments may be accomplished by program instructions in combination with related hardware (such as a personal computer, a server, or a computing device such as a network device). The program instructions may be stored in a computer-readable storage medium, and when the program instructions are executed by the processor of the computing device, the computing device executes all or part of the steps of the methods described in the embodiments of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (13)
- A natural language processing method based on a neural network, characterized by comprising: receiving an input natural sentence as an input sentence, and generating length information of the input sentence according to the number of words in the input sentence; determining the index of each word in the input sentence, and looking up the vector value of each word from a word vector table according to the index of each word, to obtain a word vector matrix of the input sentence; performing a convolution calculation on the word vector matrix using a convolution kernel to obtain a feature mapping matrix, wherein, in the convolution calculation, a corresponding number of multiplication operations between the word vector matrix and the convolution kernel are performed according to the length information of the input sentence; activating and pooling the feature mapping matrix to obtain a dimension reduction feature mapping matrix; performing intent recognition based on the dimension reduction feature mapping matrix to obtain a classification label matrix of the input sentence, the classification label matrix containing the probability value of each classification label; and normalizing the probability values of all classification labels in the classification label matrix to obtain an intent recognition result of the input sentence.
- The natural language processing method according to claim 1, characterized in that the length information is a mask of Boolean pointer type.
- The natural language processing method according to claim 1, characterized in that, in the convolution calculation, performing the corresponding number of multiplication operations between the word vector matrix and the convolution kernel according to the length information of the input sentence comprises: with no edge padding and a convolution stride of 1, making the convolution kernel slide on the height dimension of the word vector matrix to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel is equal to the number of words of the input sentence indicated by the length information of the input sentence, and the corresponding number is equal to the difference between the number of words of the input sentence indicated by the length information of the input sentence and the size of the convolution kernel, plus 1.
- The natural language processing method according to claim 1, characterized in that, in the convolution calculation, performing the corresponding number of multiplication operations between the word vector matrix and the convolution kernel according to the length information of the input sentence comprises: with edge padding at the tail of the input sentence and a convolution stride of 1, making the convolution kernel slide on the height dimension of the word vector matrix to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel on the height dimension of the word vector matrix is equal to the number of words of the input sentence indicated by the length information of the input sentence plus the size of the convolution kernel minus 1, and the corresponding number is equal to the number of words of the input sentence indicated by the length information of the input sentence.
- The natural language processing method according to claim 1, characterized in that the step of performing the convolution calculation on the word vector matrix using the convolution kernel to obtain the feature mapping matrix and the step of activating and pooling the feature mapping matrix to obtain the dimension reduction feature mapping matrix are executed on the same accelerator.
- The natural language processing method according to claim 1, characterized in that the number of convolution kernels is multiple, and the sizes of the convolution kernels are different; the number of feature mapping matrices is multiple, and the multiple feature mapping matrices are in one-to-one correspondence with the multiple convolution kernels; the number of dimension reduction feature mapping matrices is multiple, and the multiple dimension reduction feature mapping matrices are in one-to-one correspondence with the multiple feature mapping matrices; performing the convolution calculation on the word vector matrix using the convolution kernel to obtain the feature mapping matrix comprises: performing convolution calculations on the word vector matrix respectively using the multiple convolution kernels of different sizes to obtain the multiple feature mapping matrices; activating and pooling the feature mapping matrix to obtain the dimension reduction feature mapping matrix comprises: activating and pooling the multiple feature mapping matrices respectively to obtain the multiple dimension reduction feature mapping matrices; and performing intent recognition based on the dimension reduction feature mapping matrix to obtain the classification label matrix of the input sentence comprises: accumulating and merging the multiple dimension reduction feature mapping matrices to obtain a merged feature mapping matrix, and performing intent recognition based on the merged feature mapping matrix to obtain the classification label matrix of the input sentence.
- The natural language processing method according to claim 1, characterized in that, before performing the convolution calculation on the word vector matrix using the convolution kernel to obtain the feature mapping matrix, the method further comprises: adding a channel dimension to the word vector matrix, so as to expand the dimensions of the word vector matrix to four dimensions including batch, height, width, and channel; and transforming the length of each dimension of the dimension-expanded word vector matrix according to the maximum limit of the length of each dimension of the input data matrix of the convolution calculation, so that the length of each dimension of the transformed word vector matrix does not exceed the corresponding maximum limit, wherein the product of the lengths of the dimensions of the transformed word vector matrix is equal to the product of the lengths of the dimensions of the word vector matrix before the transformation.
- The natural language processing method according to claim 1, characterized in that activating the feature mapping matrix comprises: performing nonlinear mapping on the feature mapping matrix using a ReLU activation function.
- The natural language processing method according to claim 1, characterized in that pooling the feature mapping matrix comprises: performing average pooling or max pooling on the feature mapping matrix.
- An electronic device, characterized by comprising: a processor; and a memory storing computer program code; when the computer program code is executed by the processor, the electronic device is caused to execute the neural network-based natural language processing method according to any one of claims 1 to 9.
- A chip for running instructions, characterized in that the chip includes a memory and a processor; code and data are stored in the memory, the memory is coupled with the processor, and the processor runs the code in the memory so that the chip executes the neural network-based natural language processing method according to any one of claims 1 to 9.
- A program product containing instructions, characterized in that, when the program product runs on a computer, the computer is caused to execute the neural network-based natural language processing method according to any one of claims 1 to 9.
- A computer program, characterized in that, when the computer program is executed by a processor, it is used to execute the neural network-based natural language processing method according to any one of claims 1 to 9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010982596.5 | 2020-09-17 | ||
CN202010982596.5A CN112069837A (zh) | 2020-09-17 | 2020-09-17 | 一种基于神经网络的自然语言处理方法和电子设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022057406A1 true WO2022057406A1 (zh) | 2022-03-24 |
Family
ID=73682014
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/105268 WO2022057406A1 (zh) | 2020-09-17 | 2021-07-08 | 一种基于神经网络的自然语言处理方法和电子设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112069837A (zh) |
WO (1) | WO2022057406A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112069837A (zh) * | 2020-09-17 | 2020-12-11 | 湖北亿咖通科技有限公司 | 一种基于神经网络的自然语言处理方法和电子设备 |
CN114386425B (zh) * | 2022-03-24 | 2022-06-10 | 天津思睿信息技术有限公司 | 用于对自然语言文本内容进行处理的大数据体系建立方法 |
CN117574136B (zh) * | 2024-01-16 | 2024-05-10 | 浙江大学海南研究院 | 一种基于多元高斯函数空间变换的卷积神经网络计算方法 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107506350A (zh) * | 2017-08-16 | 2017-12-22 | 京东方科技集团股份有限公司 | 一种识别信息的方法和设备 |
US20190050728A1 (en) * | 2017-08-09 | 2019-02-14 | Penta Security Systems Inc. | Method and apparatus for machine learning |
CN109684626A (zh) * | 2018-11-16 | 2019-04-26 | 深思考人工智能机器人科技(北京)有限公司 | 语义识别方法、模型、存储介质和装置 |
CN110263139A (zh) * | 2019-06-10 | 2019-09-20 | 湖北亿咖通科技有限公司 | 车辆、车机设备及其基于神经网络的文本意图识别方法 |
CN110569500A (zh) * | 2019-07-23 | 2019-12-13 | 平安国际智慧城市科技股份有限公司 | 文本语义识别方法、装置、计算机设备和存储介质 |
CN112069837A (zh) * | 2020-09-17 | 2020-12-11 | 湖北亿咖通科技有限公司 | 一种基于神经网络的自然语言处理方法和电子设备 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202044A (zh) * | 2016-07-07 | 2016-12-07 | 武汉理工大学 | 一种基于深度神经网络的实体关系抽取方法 |
-
2020
- 2020-09-17 CN CN202010982596.5A patent/CN112069837A/zh active Pending
-
2021
- 2021-07-08 WO PCT/CN2021/105268 patent/WO2022057406A1/zh active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190050728A1 (en) * | 2017-08-09 | 2019-02-14 | Penta Security Systems Inc. | Method and apparatus for machine learning |
CN107506350A (zh) * | 2017-08-16 | 2017-12-22 | 京东方科技集团股份有限公司 | 一种识别信息的方法和设备 |
CN109684626A (zh) * | 2018-11-16 | 2019-04-26 | 深思考人工智能机器人科技(北京)有限公司 | 语义识别方法、模型、存储介质和装置 |
CN110263139A (zh) * | 2019-06-10 | 2019-09-20 | 湖北亿咖通科技有限公司 | 车辆、车机设备及其基于神经网络的文本意图识别方法 |
CN110569500A (zh) * | 2019-07-23 | 2019-12-13 | 平安国际智慧城市科技股份有限公司 | 文本语义识别方法、装置、计算机设备和存储介质 |
CN112069837A (zh) * | 2020-09-17 | 2020-12-11 | 湖北亿咖通科技有限公司 | 一种基于神经网络的自然语言处理方法和电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN112069837A (zh) | 2020-12-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022057406A1 (zh) | 一种基于神经网络的自然语言处理方法和电子设备 | |
CN112292816B (zh) | 处理核心数据压缩和存储*** | |
JP7020312B2 (ja) | 画像特徴学習装置、画像特徴学習方法、画像特徴抽出装置、画像特徴抽出方法、及びプログラム | |
US20170061279A1 (en) | Updating an artificial neural network using flexible fixed point representation | |
JP5235666B2 (ja) | 選択されたセグメントのビット平面表現を用いた連想マトリックス法、システムおよびコンピュータプログラム製品 | |
KR20190107033A (ko) | 단어 벡터 처리 방법 및 장치 | |
CN108874765B (zh) | 词向量处理方法及装置 | |
CN110799957A (zh) | 具有元数据致动的条件图执行的处理核心 | |
CN109918507B (zh) | 一种基于TextCNN改进的文本分类方法 | |
CN111782826A (zh) | 知识图谱的信息处理方法、装置、设备及存储介质 | |
CN113220865B (zh) | 一种文本相似词汇检索方法、***、介质及电子设备 | |
WO2023207059A1 (zh) | 一种视觉问答任务处理方法、***、电子设备及存储介质 | |
CN109583586B (zh) | 一种语音识别或图像识别中的卷积核处理方法及装置 | |
CN111240746A (zh) | 一种浮点数据反量化及量化的方法和设备 | |
CN107423269B (zh) | 词向量处理方法及装置 | |
CN112598129A (zh) | 基于ReRAM神经网络加速器的可调硬件感知的剪枝和映射框架 | |
CN114677548A (zh) | 基于阻变存储器的神经网络图像分类***及方法 | |
CN114781380A (zh) | 一种融合多粒度信息的中文命名实体识别方法、设备和介质 | |
JP7163515B2 (ja) | ニューラルネットワークのトレーニング方法、ビデオ認識方法及び装置 | |
CN109544651B (zh) | 用于图像对比的数据压缩方法、图像对比方法及装置 | |
CN111091001B (zh) | 一种词语的词向量的生成方法、装置及设备 | |
WO2021035598A1 (zh) | 数据处理方法及设备 | |
CN116187416A (zh) | 一种基于层剪枝灵敏度的迭代式重训练方法及一种图像处理器 | |
KR102605709B1 (ko) | Nested 와 Overlapped Named Entity 인식을 위한 피라미드 Layered 어텐션 모델 | |
CN115905546A (zh) | 基于阻变存储器的图卷积网络文献识别装置与方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21868235 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21868235 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.06.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21868235 Country of ref document: EP Kind code of ref document: A1 |