WO2022057406A1 - A neural network-based natural language processing method and electronic device - Google Patents

A neural network-based natural language processing method and electronic device

Info

Publication number
WO2022057406A1
Authority
WO
WIPO (PCT)
Prior art keywords: matrix, convolution, word vector, input sentence, vector matrix
Application number: PCT/CN2021/105268
Other languages: English (en), French (fr)
Inventors: 黄海荣, 李林峰, 王靖宇
Original Assignee: 湖北亿咖通科技有限公司
Application filed by 湖北亿咖通科技有限公司
Publication of WO2022057406A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G06F40/35: Discourse or dialogue representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data; Database structures therefor; File system structures therefor
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G06F16/316: Indexing structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data; Database structures therefor; File system structures therefor
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Definitions

  • The present invention relates to the technical field of artificial intelligence, and in particular to a neural network-based natural language processing method and electronic device.
  • Neural networks have been widely used in the field of natural language processing.
  • the essence of convolution is the process of extracting features from input data with a kernel function, and the output of convolution is the extracted features (represented by a feature map matrix).
  • the input data of the convolution can be in NHWC format, that is, [batch, in_height, in_width, in_channels], comprising the four dimensions of batch, height, width and channels, where the height dimension represents the length of the input data.
  • In the prior art, the height, width and number of input channels of the convolution input data are all designed to be fixed values. For example, if the input data is a user sentence, the height of the input data is set to a fixed maximum value (such as 60) regardless of the actual length of the input user sentence. If the actual number of words in the user sentence is less than 60, a specific value is used to pad the sentence up to 60 words, and the convolution is still calculated over all 60 words.
  • The actual length of user sentences (that is, the user's speech content) is usually much shorter than the fixed maximum length; performing the convolution calculation over the fixed length therefore wastes computing resources and time.
  • In view of the above, the present invention is proposed to provide a neural network-based natural language processing method and electronic device that overcome the above problems or at least partially solve them.
  • An object of the present invention is to provide a neural network-based natural language processing method that improves the computing speed by using variable-length convolutions.
  • a further object of the present invention is to further increase the computational speed by fusing the convolution, activation and pooling steps to be performed in the same accelerator.
  • a neural network-based natural language processing method including:
  • a convolution calculation is performed on the word vector matrix by using a convolution kernel to obtain a feature map matrix, wherein, in the convolution calculation, the multiplication operations between the word vector matrix and the convolution kernel are performed a corresponding number of times according to the length information of the input sentence;
  • the probability values of all the classification labels in the classification label matrix are normalized to obtain the intent recognition result of the input sentence.
  • the length information is a Boolean (bool) type mask.
  • the multiplication operation between the word vector matrix and the convolution kernel is performed for corresponding times according to the length information of the input sentence, including:
  • the convolution kernel is made to slide along the height dimension of the word vector matrix to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel is equal to the number of words of the input sentence indicated by the length information of the input sentence, and the corresponding number of times is equal to the difference between that number of words and the size of the convolution kernel, plus 1.
  • the multiplication operation between the word vector matrix and the convolution kernel is performed for corresponding times according to the length information of the input sentence, including:
  • the convolution kernel is made to slide along the height dimension of the word vector matrix to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel along the height dimension of the word vector matrix is equal to the number of words of the input sentence indicated by the length information of the input sentence plus the size of the convolution kernel minus 1, and the corresponding number of times is equal to the number of words of the input sentence indicated by the length information of the input sentence.
  • the number of the convolution kernels is multiple, and the sizes of the convolution kernels are different; the number of the feature map matrices is multiple, and the multiple feature map matrices are in one-to-one correspondence with the multiple convolution kernels; the number of the dimensionality reduction feature map matrices is multiple, and the multiple dimensionality reduction feature map matrices are in one-to-one correspondence with the multiple feature map matrices;
  • performing convolution calculation on the word vector matrix using the convolution kernels to obtain the feature map matrices includes:
  • Activating and pooling the feature map matrix to obtain a dimensionality reduction feature map matrix including:
  • the intention recognition based on the dimensionality reduction feature mapping matrix to obtain the classification label matrix of the input sentence including:
  • Intention recognition is performed based on the merged feature mapping matrix, and a classification label matrix of the input sentence is obtained.
  • before a convolution kernel is used to perform convolution calculation on the word vector matrix to obtain a feature map matrix, the method further includes:
  • a one-dimensional channel dimension is added to the word vector matrix to expand the dimension of the word vector matrix to four dimensions including batch, height, width and channel;
  • the lengths of the dimensions of the dimension-expanded word vector matrix are transformed so that the length of each dimension of the transformed word vector matrix does not exceed the corresponding maximum limit, wherein the product of the lengths of the dimensions of the word vector matrix after transformation is equal to the product of the lengths of the dimensions of the word vector matrix before the transformation.
  • the feature map matrix is activated, including:
  • the feature map matrix is nonlinearly mapped using the ReLU activation function.
  • pooling the feature map matrix includes:
  • Average pooling or max pooling is performed on the feature map matrix.
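  • As an illustrative sketch (not the patent's accelerator implementation), average and max pooling over a feature map matrix in NHWC layout collapse the height and width dimensions, reducing e.g. [1, 58, 1, 128] to [1, 1, 1, 128]:

```python
import numpy as np

# Illustrative pooling of a feature map matrix [batch, height, width, channels]
# down to [batch, 1, 1, channels] by collapsing the height/width dimensions.
feature_map = np.random.rand(1, 58, 1, 128).astype(np.float32)

max_pooled = feature_map.max(axis=(1, 2), keepdims=True)   # max pooling
avg_pooled = feature_map.mean(axis=(1, 2), keepdims=True)  # average pooling

print(max_pooled.shape)  # (1, 1, 1, 128)
```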
  • an electronic device including a processor and a memory storing computer program code, wherein, when executed by the processor, the computer program code causes the electronic device to execute the neural network-based natural language processing method according to any one of the above.
  • a computer-readable storage medium is also provided, where the computer-readable storage medium is used to store a computer program, and the computer program is used to implement the above-mentioned neural network-based natural language processing method.
  • a chip for running instructions includes a memory and a processor; the memory stores code and data and is coupled to the processor.
  • the processor runs the code in the memory so that the chip is used to execute the above-mentioned neural network-based natural language processing method.
  • a program product including instructions, which, when the program product runs on a computer, cause the computer to execute the above-mentioned neural network-based natural language processing method.
  • a computer program which, when executed by a processor, executes the above-mentioned neural network-based natural language processing method.
  • In the neural network-based natural language processing method proposed by the embodiments of the present invention, length information reflecting the actual length of the input sentence is introduced, and in the convolution calculation the multiplication operations between the word vector matrix and the convolution kernel are performed a corresponding number of times according to that length information. This transforms the fixed-length convolution of the prior art, based on a fixed preset sentence length, into a variable-length convolution based on the actual length of the input sentence, which greatly reduces the amount of calculation, significantly improves the computing speed, and reduces the waste of computing resources and time.
  • edge padding is performed at the end of the input sentence to fully extract the features of the last word of the input sentence, thereby improving the recognition processing accuracy of the input sentence.
  • Fig. 1 shows a typical neural network graph structure diagram of intention recognition in natural language processing in the prior art
  • FIG. 2 shows a schematic flowchart of a natural language processing method based on a neural network according to an embodiment of the present invention
  • FIG. 3 shows a graph structure diagram of a neural network in natural language processing according to an embodiment of the present invention
  • FIG. 4 shows a schematic diagram of the sliding of the convolution kernel when the tail of the input sentence is edge-padded according to an embodiment of the present invention.
  • Convolution is widely used in neural networks.
  • the convolution calculation is essentially a multiply-accumulate process.
  • the input data of the convolution is in NHWC format, that is, [batch, in_height, in_width, in_channels], where batch represents the number of processing objects in a batch participating in the convolution calculation (for example, if the processing object is an image, it indicates the number of images in the batch; if the processing object is a sentence, it indicates the number of sentences in the batch), in_height indicates the height of the input data, in_width indicates the width of the input data, and in_channels indicates the number of channels of the input data.
  • the convolution kernel of the convolution is in HWCN format, namely [filter_height, filter_width, in_channels, out_channels], where filter_height represents the height of the convolution kernel, filter_width represents the width of the convolution kernel, in_channels represents the number of channels of the input data, and out_channels represents the number of channels of the output data.
  • the output data of the convolution is in NHWC format, namely [batch, output_height, output_width, out_channels], where batch represents the number of processing objects in a batch participating in the convolution calculation, output_height represents the height of the output data, output_width represents the width of the output data, and out_channels represents the number of channels of the output data.
  • the change rules of height and width are as follows:
  • Padding determines the value used for the excess part when the convolution kernel, sliding at the edge of the input data, extends beyond that edge; the value depends on the padding type.
  • padding defaults to VALID (that is, valid) with a value of 0; the calculation of the height and width of the convolution output data can then be uniformly expressed by the following formula: output_size = (input_size - kernel_size) / stride + 1
  • input_size is the size of the input data, specifically the height or width of the input data.
  • kernel_size is the size of the convolution kernel: when input_size is the height of the input data, kernel_size is the height of the convolution kernel; when input_size is the width of the input data, kernel_size is the width of the convolution kernel. stride refers to the convolution step size, that is, the distance the convolution kernel moves at each step.
  • Padding can also be SAME, in which case the height and width of the convolution output data are the same as those of the input.
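  • The VALID-padding output-size rule described above can be checked numerically; this small Python sketch (illustrative only) reproduces the heights 58, 57 and 56 that appear later for kernels of size 3, 4 and 5 on a height-60 input.

```python
# VALID padding: output_size = (input_size - kernel_size) / stride + 1
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1) -> int:
    return (input_size - kernel_size) // stride + 1

print([conv_output_size(60, k) for k in (3, 4, 5)])  # [58, 57, 56]
```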
  • FIG. 1 is a typical neural network graph structure diagram for intention recognition in natural language processing in the prior art.
  • the typical neural network uses 2-dimensional matrix convolution, and the convolution function can be defined in the following format: void conv2d(int8* input, int* inputShape, int8* filter, int* filterShape, int8* output, int* outputShape), where input is the input data pointer, inputShape is the input data dimension, filter is the convolution kernel data pointer, filterShape is the convolution kernel dimension, output is the output data pointer, outputShape is the output data dimension, int8 indicates that the data type is 8-bit integer data, and int indicates that the data type is integer data.
  • the height, width and number of input channels of the convolution input data are all designed to be fixed values, wherein the height of the convolution input data is equal to the length of the sentence to be recognized, which is designed as a fixed length (the maximum sentence length).
  • Taking a fixed sentence length of 60 as an example: no matter what the actual length of the input user sentence is, it is calculated as 60 words (represented by a [1, 60] matrix).
  • the word vector matrix [1, 60, 8, 4] is obtained as the input data of the convolution, where the height of the word vector matrix is equal to 60 (that is, the fixed length of the sentence).
  • the convolution kernels W[3,8,4,128], W[4,8,4,128] and W[5,8,4,128] are used for convolution calculation.
  • the padding is VALID and the convolution stride value is 1.
  • the above three convolution kernels are used to multiply and accumulate the input word vector matrix respectively to obtain the following output feature mapping matrix: [1,58,1,128], [1,57,1,128], [1, 56, 1, 128], where the height of the feature map matrix (specifically, 58, 57, and 56 in this example) represents the number of multiplication operations between each convolution kernel and the input word vector matrix.
  • the loop control of the convolution calculation takes the following form:
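  • The fixed-length loop listing referred to above is not reproduced in this extraction; the following Python/NumPy sketch (an illustration, not the patent's actual C implementation) shows the essential behavior: the kernel is slid over the full fixed height regardless of the sentence's actual length, so padded positions are multiplied as well.

```python
import numpy as np

# Illustrative fixed-length convolution loop: the kernel always slides
# over the full fixed height (60), even for short sentences.
def conv_fixed(word_vectors, kernel, stride=1):
    # word_vectors: [height, width, channels]; kernel: [kh, width, channels]
    height, kh = word_vectors.shape[0], kernel.shape[0]
    out = []
    for h in range(0, height - kh + 1, stride):     # 60 - kh + 1 steps, always
        window = word_vectors[h:h + kh]
        out.append(float(np.sum(window * kernel)))  # multiply-accumulate
    return np.array(out, dtype=np.float32)

x = np.zeros((60, 8, 4), dtype=np.float32)  # padded sentence (all zeros here)
w = np.ones((3, 8, 4), dtype=np.float32)    # kernel of height 3
y = conv_fixed(x, w)
print(y.shape)  # (58,)
```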
  • each feature map matrix is sequentially activated and pooled.
  • the dimensions of the output data and the input data in the activation step remain unchanged.
  • the above three dimensionality reduction feature map matrices [1, 1, 1, 128], [1, 1, 1, 128] and [1, 1, 1, 128] are concatenated to obtain the combined feature map matrix [1, 1, 1, 384].
  • the intent recognition of the sentence is performed based on the combined feature map matrix [1, 1, 1, 384] through the fully connected layer, and the classification label matrix [1, 50] of the sentence is obtained.
  • the probability values of all the classification labels in the classification label matrix [1, 50] are normalized by the softmax layer, and the intent recognition result matrix [1, 50] is obtained.
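  • The softmax normalization at the end of this pipeline can be sketched as follows (illustrative Python; the label count of 50 matches the example above):

```python
import numpy as np

# Illustrative softmax normalization of the classification label matrix
# [1, 50]: the resulting probabilities over the 50 intent labels sum to 1.
def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

labels = np.random.rand(1, 50).astype(np.float32)
probs = softmax(labels)
print(abs(float(probs.sum()) - 1.0) < 1e-4)  # True
```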
  • FIG. 2 shows a schematic flowchart of a natural language processing method based on a neural network according to an embodiment of the present invention.
  • the method may at least include the following steps S102 to S112.
  • Step S102 receiving the input natural sentence as an input sentence, and generating length information of the input sentence according to the number of words in the input sentence.
  • Step S104 Determine the index of each word in the input sentence, find the vector value of each word from the word vector table according to the index of each word, and obtain a word vector matrix of the input sentence.
  • Step S106: use the convolution kernel to perform convolution calculation on the word vector matrix to obtain a feature map matrix, wherein, in the convolution calculation, the multiplication operations between the word vector matrix and the convolution kernel are performed a corresponding number of times according to the length information of the input sentence.
  • Step S108: the feature map matrix is activated and pooled to obtain a dimensionality reduction feature map matrix.
  • Step S110 performing intention recognition based on the dimensionality reduction feature mapping matrix, and obtaining a classification label matrix of the input sentence, where the classification label matrix includes the probability value of each classification label.
  • Step S112 normalize the probability values of all the classification labels in the classification label matrix to obtain the intent recognition result of the input sentence.
  • In the neural network-based natural language processing method proposed by the embodiment of the present invention, length information reflecting the actual length of the input sentence is introduced, and in the convolution calculation the multiplication operations between the word vector matrix and the convolution kernel are performed a corresponding number of times according to that length information. This transforms the fixed-length convolution of the prior art, based on a fixed preset sentence length, into a variable-length convolution based on the actual length of the input sentence, which greatly reduces the amount of calculation, thereby significantly improving the computing speed and reducing the waste of computing resources and time.
  • FIG. 3 shows a graph structure diagram of a neural network in natural language processing according to an embodiment of the present invention. The steps of the embodiment of the present invention are described below with reference to FIG. 3 .
  • the input sentence is a natural sentence of the user, and the length information of the input sentence is generated according to the number of words of the input sentence (ie, the actual length of the input sentence).
  • the length information of the input statement may be a Boolean (bool) type mask, which may be referred to as a length mask.
  • the mask is a bool type array, each bool value can be true or false, true means the bit is valid, false means the bit is invalid.
  • For example, for a 10-word input sentence with a fixed maximum length of 60, the generated length information (specifically, the length mask) of the input sentence contains 60 bool values, of which the first 10 are true and the rest are false.
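  • Generating such a length mask can be sketched in a few lines of Python (illustrative only; the bool array layout follows the description above):

```python
# Bool length mask: True marks a real word, False marks a padded position.
MAX_LEN = 60  # fixed maximum sentence length from the example

def length_mask(num_words: int, max_len: int = MAX_LEN):
    return [i < num_words for i in range(max_len)]

mask = length_mask(10)
print(sum(mask), len(mask))  # 10 60
```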
  • In step S104, the word vector matrix of the input sentence is obtained through a word-embedding table lookup.
  • each character of the input sentence is replaced by an index (specifically, an index value), and the remaining positions are then padded with a specific value (for example, 0), yielding 1 x 60 integer values.
  • 1 means that the batch is 1, that is, one sentence is processed at a time.
  • If the batch is not 1, for example when three sentences are processed at a time, the batch is 3 and 3 x 60 integer values are obtained.
  • the vector value of each word is found from the trained word vector table (the word vector table includes the mapping relationship between the word index and the corresponding word vector), and the word vector matrix of the input sentence is obtained.
  • the vector value of each word is a 32-bit floating point number.
  • the vector values corresponding to the index values of the 10 words actually contained in the input sentence come first, and the following positions are still padded with a specific value (for example, 0).
  • the first 10 data in the height dimension are data corresponding to the 10 words actually included in the input sentence, and the last 50 data are data supplemented with specific values.
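  • The table lookup in step S104 can be sketched with NumPy integer indexing (illustrative; the vocabulary size and the choice of 0 as the padding index are assumptions of this sketch):

```python
import numpy as np

# Illustrative word-embedding lookup: word indices (padded with 0 up to
# the fixed length 60) are mapped to 32-dimensional vectors, giving a
# [1, 60, 32] word vector matrix.
vocab_size, embed_dim, max_len = 1000, 32, 60
word_vector_table = np.random.rand(vocab_size, embed_dim).astype(np.float32)
word_vector_table[0] = 0.0  # index 0 reserved for the padding value

indices = np.zeros((1, max_len), dtype=np.int64)
indices[0, :10] = np.arange(1, 11)   # 10 real words, the rest padded with 0

word_vector_matrix = word_vector_table[indices]
print(word_vector_matrix.shape)  # (1, 60, 32)
```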
  • After the word vector matrix of the input sentence is obtained, it can be convolved to extract features.
  • In order to improve the efficiency of the convolution calculation, the word vector matrix may also be dimensionally transformed before the convolution calculation is performed on it.
  • the channel of natural language input data can be regarded as having only one channel.
  • a one-dimensional channel dimension can be added to the word vector matrix to expand the dimension of the word vector matrix to four dimensions including batch, height, width, and channel. For example, for the word vector matrix [1,60,32] obtained in the previous step, the one-dimensional channel dimension can be increased to make it [1,60,32,1].
  • In some accelerators, the length of each dimension of the input data matrix for the convolution calculation is limited; if a dimension exceeds its limit, the calculation must be performed in multiple steps, reducing efficiency. Therefore, after the dimensions of the word vector matrix are expanded, the length of each dimension of the dimension-expanded word vector matrix can also be transformed according to the maximum limits on the dimension lengths of the convolution input data matrix, so that the length of each dimension of the transformed word vector matrix does not exceed the corresponding maximum limit, wherein the product of the lengths of the dimensions of the transformed word vector matrix is equal to the product of the lengths of the dimensions of the untransformed word vector matrix.
  • the word vector matrix [1, 60, 8, 4] after dimension transformation is obtained as the input data of the convolution calculation.
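  • The dimension expansion and transformation described above amount to adding an axis and reshaping, as this illustrative NumPy sketch shows; the product of the dimension lengths (1*60*32*1 = 1*60*8*4 = 1920) is preserved.

```python
import numpy as np

# [1, 60, 32] -> add a channel dimension -> [1, 60, 32, 1]
# -> reshape the width/channel lengths -> [1, 60, 8, 4]
m = np.random.rand(1, 60, 32).astype(np.float32)
m4 = m[..., np.newaxis]         # expand to four dimensions (NHWC)
m_t = m4.reshape(1, 60, 8, 4)   # transform dimension lengths

print(m4.shape, m_t.shape)  # (1, 60, 32, 1) (1, 60, 8, 4)
```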
  • In step S106, the convolution calculation is performed on the word vector matrix using the convolution kernel to extract features, and a feature map matrix is obtained.
  • the length information of the input sentence generated in step S102 is introduced into the convolution calculation, so as to perform the multiplication operation between the word vector matrix and the convolution kernel for corresponding times according to the length information of the input sentence.
  • the feature map matrix calculated in this way contains, at the front of its height dimension, the feature data extracted from the input sentence the corresponding number of times, while the remaining data behind it are padded with a specific value (for example, 0).
  • the length information of the input sentence is introduced into the convolution calculation by modifying the existing convolution function (specifically, adding a length variable reflecting the length information of the input sentence on the basis of the existing convolution function).
  • the modified convolution function can be defined in the following format: void conv2d(int8* input, int* inputShape, int8* filter, int* filterShape, int8* output, int* outputShape, bool* mask), where input is the input data pointer, inputShape is the input data dimension, filter is the convolution kernel data pointer, filterShape is the convolution kernel dimension, output is the output data pointer, outputShape is the output data dimension, and mask is the length variable, which can be a Boolean-pointer-type length mask for the input data.
  • the loop control of the convolution calculation becomes the following form:
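  • The mask-controlled loop listing referred to above is likewise not reproduced in this extraction; as a hedged Python/NumPy sketch (the patent's own function is C-style), the length mask limits the sliding stroke to the sentence's actual words, so the padded tail is never multiplied.

```python
import numpy as np

# Illustrative variable-length convolution loop: the number of True
# entries in the bool length mask bounds the sliding stroke, so only
# (num_words - kh)/stride + 1 multiplications are performed.
def conv_variable(word_vectors, kernel, mask, stride=1):
    kh = kernel.shape[0]
    num_words = sum(mask)                                 # actual length
    out = np.zeros(word_vectors.shape[0] - kh + 1, dtype=np.float32)
    for h in range(0, num_words - kh + 1, stride):
        out[h] = np.sum(word_vectors[h:h + kh] * kernel)  # multiply-accumulate
    return out

x = np.ones((60, 8, 4), dtype=np.float32)
w = np.ones((3, 8, 4), dtype=np.float32)
mask = [i < 10 for i in range(60)]   # 10-word sentence, max length 60
y = conv_variable(x, w, mask)
print(np.count_nonzero(y))  # 8: only (10 - 3) + 1 windows are computed
```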
  • variable-length convolution can be used wherever convolution is needed in the neural network.
  • the length information (specifically, the length mask) of the input sentence generated in step S102 is input into the modified convolution function to participate in the convolution calculation.
  • the word vector matrix serving as the input data of the convolution calculation contains, at the front, the vector values corresponding to the index values of the words actually contained in the input sentence, while the following data are padded with specific values.
  • the length information of the input sentence is input into the modified convolution function to control the inner calculation loop of the convolution, so that during the convolution calculation only the vector-value data corresponding to the index values of the words actually contained in the input sentence is processed, while the specific-value data padded into the word vector matrix is not processed.
  • the number of multiplication operations between the word vector matrix and the convolution kernel can be calculated according to the following formula (1): multiplication_number = (slide_size - kernel_size) / stride + 1    (1)
  • multiplication_number indicates the number of multiplication operations between the word vector matrix and the convolution kernel
  • slide_size indicates the sliding stroke of the convolution kernel in the height dimension of the word vector matrix
  • kernel_size indicates the size of the convolution kernel (ie, the convolution kernel height)
  • stride represents the convolution stride.
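  • Formula (1) can be checked numerically (illustrative sketch): with a sliding stroke of 10, a kernel of size 3 and stride 1, eight multiplications are performed, matching the eight valid feature sets in the example that follows.

```python
# Formula (1): multiplication_number = (slide_size - kernel_size)/stride + 1
def multiplication_number(slide_size: int, kernel_size: int, stride: int = 1) -> int:
    return (slide_size - kernel_size) // stride + 1

print(multiplication_number(10, 3))  # 8
```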
  • the step of performing the multiplication operation between the word vector matrix and the convolution kernel for a corresponding number of times according to the length information of the input sentence can be implemented as follows:
  • the convolution kernel is made to slide along the height dimension of the word vector matrix, with a sliding stroke equal to the number of words of the input sentence indicated by its length information, performing the corresponding number of multiplication operations between the word vector matrix and the convolution kernel.
  • the first 10 data in the height dimension of the input word vector matrix [1, 60, 8, 4] are the data corresponding to the 10 words actually contained in the input sentence, and the last 50 data are the data padded with a specific value.
  • a convolution kernel of size 3 is used (W[3,8,4,128] in this example, where the length 3 in the height dimension is the size of the convolution kernel) with a convolution stride of 1; the convolution kernel W[3,8,4,128] then slides along the height dimension of the word vector matrix [1,60,8,4] with a sliding stroke equal to 10, performing the corresponding number of multiplications between the word vector matrix and the convolution kernel to obtain the feature map matrix [1, 58, 1, 128].
  • the calculated feature map matrix [1, 58, 1, 128] contains, at the front of its height dimension, the first 8 sets of feature data extracted from the input sentence, while the last 50 sets of data are padded with specific values.
  • the number of convolutions of the input sentence is greatly reduced.
  • Words with specific meanings are usually composed of a specific number of characters in a sentence, and different words can also express different meanings through their order and spatial position. For example, in Chinese, words are usually composed of 2 or 3 characters and idioms of 4 characters, and words and idioms each represent a specific meaning. Therefore, when extracting features by convolution, convolution kernels of different sizes can be used for the convolution calculation. In this case, the number of convolution kernels used in step S106 may be multiple, with the convolution kernels differing in size; the number of feature map matrices obtained by convolution may also be multiple, with the multiple feature map matrices in one-to-one correspondence with the multiple convolution kernels. Specifically, step S106 is correspondingly implemented as: performing convolution calculations on the word vector matrix with multiple convolution kernels of different sizes respectively, to obtain multiple feature map matrices.
  • the format of the convolution kernel is HWCN, namely [filter_height (the height of the convolution kernel), filter_width (the width of the convolution kernel), in_channels (the number of input channels), out_channels (the number of output channels)], where the number of input channels must be the same as that of the input data (i.e., the word vector matrix), while the number of output channels can be set manually, with 128 to 300 generally being appropriate.
  • since the dimension of the input sentence is relatively small, the number of output channels of the convolution kernel is preferably set to 128.
  • the word vector matrix [1, 60, 8, 4] after the dimension transformation described above is used as the input data of the convolution calculation
  • the padding value is 0 (that is, no edge padding) and the convolution stride value is 1; the convolution kernels W[3,8,4,128], W[4,8,4,128] and W[5,8,4,128] are used to perform convolution calculation on the word vector matrix [1,60,8,4], and the feature map matrices output by the convolution calculation are [1,58,1,128], [1,57,1,128] and [1,56,1,128] respectively.
  • edge padding is not performed during the convolution calculation.
  • Although this simplifies the calculation operation and reduces the amount of computation, the features of the last word of the input sentence cannot be fully extracted when no edge padding is performed, which may cause some loss of recognition accuracy. Therefore, in another embodiment, when the convolution kernel is used to perform the convolution calculation on the word vector matrix, edge padding may also be performed at the tail of the input sentence to obtain the feature map matrix, so that the features of the last word of the input sentence can be fully extracted, improving the recognition and processing accuracy of the input sentence.
  • the step of performing the multiplication operation between the word vector matrix and the convolution kernel for corresponding times according to the length information of the input sentence can be implemented as follows:
  • the tail is filled with edges and the convolution step size is 1, so that the convolution kernel slides on the height dimension of the word vector matrix to perform the multiplication operation between the word vector matrix and the convolution kernel for the corresponding number of times.
  • multiplication_number = ((input_number + kernel_size - 1) - kernel_size)/stride + 1
  • the sliding stroke of the convolution kernel is equal to input_number + kernel_size - 1.
  • equation (3) can be simplified to equation (4):
  • the convolution kernel is made to slide, along the height dimension of the word vector matrix, a stroke equal to the number of words of the input sentence plus the size of the convolution kernel minus 1; the corresponding number of multiplication operations between the word vector matrix and the convolution kernel is then equal to the number of words of the input sentence indicated by its length information.
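The two multiplication-count formulas can be sketched as plain functions. This is a hedged illustration; the function names are ours, and the patent only states the formulas themselves.

```c
/* Number of multiplications between the word vector matrix and a
 * convolution kernel, assuming stride 1. */

/* no edge padding: equation (2), input_number - kernel_size + 1 */
int mult_count_no_pad(int input_number, int kernel_size) {
    return input_number - kernel_size + 1;
}

/* tail edge padding: equations (3)/(4) */
int mult_count_tail_pad(int input_number, int kernel_size) {
    int slide_size = input_number + kernel_size - 1;  /* padded stroke */
    return (slide_size - kernel_size) / 1 + 1;        /* == input_number */
}
```

For the 10-word example with a size-3 kernel this gives 8 multiplications without padding and 10 with tail padding.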
  • a convolution kernel of size 3 slides along the height dimension of the word vector matrix of the input sentence
  • the sliding stroke is 5 (that is, from the first character "导" at the start of the sentence to the last character "海" at its end)
  • the number of multiplication operations is (5-3)+1 = 3.
  • the features of the last word of the input sentence can be fully extracted with only a slight increase in the amount of calculation, thereby improving the recognition processing accuracy of the input sentence.
  • the convolution kernels [3,8,4,128], [4,8,4,128] and [5,8,4,128] are used respectively to perform the convolution calculation on the word vector matrix [1,60,8,4], with edge padding at the tail of the input sentence during the convolution, so that the sliding stroke of each convolution kernel along the height dimension of the word vector matrix equals the number of words in the input sentence plus the size of the convolution kernel minus 1; with a stride value of 1, the output feature map matrices are [1,58,1,128], [1,57,1,128] and [1,56,1,128] respectively.
  • since the average length of a user sentence is about 14 words, only about a quarter of the fixed length of, for example, 60 words in the prior art, the method of the embodiments of the present invention can theoretically save three quarters of the computing time.
  • in step S108, the feature map matrix obtained by the convolution is activated and pooled to obtain a dimensionality-reduced feature map matrix; specifically, this can be divided into two steps: activation and pooling.
  • the feature map matrix is activated through the activation function, which brings nonlinear characteristics to the neural network.
  • the ReLU activation function can be used to perform the nonlinear mapping on the feature map matrix.
  • the characteristics of the ReLU activation function are well known to those skilled in the art and are not repeated here.
  • the dimensions of the output data and the input data in the activation step remain unchanged.
  • the purpose of the pooling step is to reduce the dimensionality of the extracted features in order to compress the data. Pooling, also known as downsampling, takes multiple values as input and outputs one value. Pooling usually includes average pooling, max pooling, etc. Average pooling takes the average of the input numbers and uses that average, in place of the inputs, as the output; max pooling takes the largest of the input numbers as the output value.
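As a minimal sketch (our own illustration, not code from the patent), average and max pooling over one column of values can be written as:

```c
/* Pool n input values down to a single output value. */
float avg_pool(const float *x, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += x[i];
    return sum / (float)n;            /* the average replaces the n inputs */
}

float max_pool(const float *x, int n) {
    float best = x[0];
    for (int i = 1; i < n; i++)
        if (x[i] > best) best = x[i]; /* the largest input becomes the output */
    return best;
}
```

In this document, pooling is applied over the height dimension of each feature map, collapsing it to length 1.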
  • average pooling or maximum pooling may be performed on the feature map matrix in the pooling step.
  • max pooling can be employed.
  • the maximum pooling is performed on the height dimension of the feature mapping matrix, and the maximum value is selected as the output value among the values of its height dimension to obtain the dimension-reducing feature mapping matrix.
  • when, in step S106, a plurality of convolution kernels of different sizes are used to perform convolution calculations on the word vector matrix, yielding a plurality of feature map matrices in one-to-one correspondence with the kernels, then in step S108 the plurality of feature map matrices are activated and pooled respectively, yielding a plurality of dimensionality-reduced feature map matrices in one-to-one correspondence with the feature map matrices.
  • the way of activation and pooling here is the same as described above and will not be repeated.
  • step S108 will be described in detail with reference to FIG. 3 .
  • the maximum value is selected as the output from the values along the height dimension of each activated feature map matrix [1,58,1,128], [1,57,1,128], [1,56,1,128], and the dimensionality-reduced feature map matrices [1,1,1,128], [1,1,1,128], [1,1,1,128] are obtained.
  • the intent recognition of the input sentence can be performed based on the dimension reduction feature mapping matrix through the fully connected layer, and the classification label matrix of the input sentence can be obtained.
  • the fully connected layer is generally used at the end of the neural network to fuse the feature maps obtained by convolution, and output the fused feature maps to the desired dimension, so that each output node has a feature map. Therefore, the fully connected layer has the function of feature fusion and dimension transformation.
  • the input of the fully connected layer is a dimensionality reduction feature mapping matrix
  • the output is a classification label matrix
  • the dimension of the classification label matrix is the number of intent-recognition classification labels
  • each classification label is represented by a numerical value giving the probability of that label; the value is a floating-point number. For example, assuming that the number of intent-recognition classification labels is 50, the classification label matrix output in this step is [1,50], as shown in Figure 3.
  • step S110 may be further implemented as follows: firstly, the multiple dimensionality reduction feature mapping matrices are accumulated and merged to obtain a merged feature mapping matrix. Then, intent recognition is performed based on the merged feature mapping matrix, and the classification label matrix of the input sentence is obtained.
  • the probability values of all the classification labels in the classification label matrix can be normalized by the softmax layer to obtain the intent recognition result of the input sentence.
  • the function of the softmax layer is to normalize the classification label probabilities. It does not change the dimension of the input data; it only changes each floating-point value of the input data, so that the floating-point probabilities of all classification labels sum to 1. The output of this step therefore has the same dimension as the classification label matrix. For example, as shown in Figure 3, when the fully connected layer outputs the classification label matrix [1,50], the data output in this step is also a [1,50] matrix.
  • the activation step described above is to process each value in the feature map matrix output by the convolution
  • the pooling step is to process the feature map matrix output by the convolution as a whole.
  • the inventors found that in hardware-accelerated neural network processing, most of the overhead is caused by data relocation, and the data buffering during calculation also exhibits spatial locality; both are bottlenecks limiting computation speed. Based on this finding, in a preferred embodiment, the step of performing the convolution calculation on the word vector matrix with the convolution kernel to obtain the feature map matrix (i.e., step S106) and the step of activating and pooling the feature map matrix to obtain the dimensionality-reduced feature map matrix (i.e., step S108) can be executed on the same accelerator.
  • step S106 and step S108 may be implemented in one chip. In this way, multiple processing of the same data can be integrated into one accelerator, reducing data relocation and frequent data update buffering, thereby further improving computing speed and reducing computing time.
  • in steps S104 and S106 (i.e., the embedding table lookup and convolution steps), the lengths of the height dimension of the input data matrices and output data matrices are still determined based on the fixed maximum sentence length. This ensures that the storage space overhead of these matrices is stable, avoids frequently allocating storage spaces of different sizes for input sentences of different actual lengths, and is more conducive to the effective use of space.
  • an embodiment of the present invention also provides an electronic device.
  • the electronic device includes a processor and a memory storing computer program code; when the computer program code is executed by the processor, it causes the electronic device to execute the neural-network-based natural language processing method described in any one of the above embodiments or a combination thereof.
  • Embodiments of the present invention further provide a computer-readable storage medium, where computer instructions are stored in the computer-readable storage medium, and when the computer instructions are executed on a computer, the computer can execute the above-mentioned neural network-based natural language processing method.
  • An embodiment of the present invention further provides a chip for running instructions; the chip includes a memory and a processor, code and data are stored in the memory, the memory is coupled with the processor, and the processor runs the code in the memory so that the chip performs the above-mentioned neural-network-based natural language processing method.
  • An embodiment of the present invention further provides a program product containing instructions; the program product includes a computer program stored in a computer-readable storage medium, at least one processor can read the computer program from the storage medium, and when the at least one processor executes the computer program, the above-mentioned neural-network-based natural language processing method can be implemented.
  • An embodiment of the present invention further provides a computer program, which is used to execute the above-mentioned neural network-based natural language processing method when the computer program is executed by a processor.
  • the embodiments of the present invention can achieve the following beneficial effects:
  • in the neural-network-based natural language processing method proposed by the embodiments of the present invention, length information reflecting the actual length of the input sentence is introduced, and in the convolution calculation the multiplication operations between the word vector matrix and the convolution kernel are performed a corresponding number of times according to that length information. This transforms the prior art's fixed-length convolution, based on a fixed preset sentence length, into a variable-length convolution based on the actual length of the input sentence, greatly reducing the amount of calculation, significantly increasing the computing speed and reducing the waste of computing resources and time.
  • edge padding is performed at the end of the input sentence to fully extract the features of the last word of the input sentence, thereby improving the recognition processing accuracy of the input sentence.
  • each functional unit in each embodiment of the present invention may be physically independent of each other, or two or more functional units may be integrated together, or all functional units may be integrated into one processing unit.
  • the above-mentioned integrated functional units may be implemented in the form of hardware, and may also be implemented in the form of software or firmware.
  • the integrated functional unit is implemented in the form of software and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present invention, in essence or in whole or in part, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computing device (such as a personal computer, a server, or a network device) execute all or part of the steps of the methods described in the embodiments of the present invention when running the instructions.
  • the aforementioned storage medium includes various media that can store program code: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
  • alternatively, all or part of the steps of the foregoing method embodiments may be accomplished by program-instruction-related hardware (a computing device such as a personal computer, a server, or a network device); the program instructions may be stored in a computer-readable storage medium, and when executed by the processor of the computing device, the computing device executes all or part of the steps of the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

A neural-network-based natural language processing method and an electronic device. The method introduces length information of the input sentence reflecting its actual length, and during the convolution calculation performs the multiplication operations between the word vector matrix and the convolution kernel a corresponding number of times according to that length information, transforming the prior art's fixed-length convolution based on a fixed preset sentence length into a variable-length convolution based on the actual length of the input sentence. This greatly reduces the amount of calculation, significantly increasing the computing speed and reducing the waste of computing resources and time.

Description

A neural-network-based natural language processing method and electronic device
This application claims priority to Chinese patent application No. 202010982596.5, filed with the China National Intellectual Property Administration on September 17, 2020 and entitled "A neural-network-based natural language processing method and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of artificial intelligence, and in particular to a neural-network-based natural language processing method and an electronic device.
Background
Neural networks (especially convolutional neural networks) are widely used in natural language processing. The essence of convolution is a process of extracting features from input data with a kernel function; the output of the convolution is the extracted features, represented as a feature map matrix. The input data of a convolution can be in NHWC format, i.e., [batch, in_height, in_width, in_channels], with four dimensions: batch, height, width and channels, where the height dimension represents the length of the input data.
In the prior art, during inference and training of a convolutional neural network, the height, width and number of input channels of the convolution input data are all designed as fixed values. For example, if the input data is a user sentence, the height of the input data is set to a fixed maximum (e.g., 60) regardless of the actual length of the input sentence; if the sentence actually contains fewer than 60 words, it is padded to 60 words with a special value, and the convolution is still computed over 60 words. However, in natural language processing, the length of a user sentence (i.e., what the user says) is not fixed and is usually far smaller than the set maximum; in that case, still performing the convolution at the set maximum wastes computing resources and time.
Summary
In view of the above problems, the present invention is proposed to provide a neural-network-based natural language processing method and an electronic device that overcome, or at least partially solve, the above problems.
One object of the present invention is to provide a neural-network-based natural language processing method that increases the computing speed by employing variable-length convolution.
A further object of the present invention is to further increase the computing speed by fusing the convolution, activation and pooling steps for execution on the same accelerator.
In particular, according to one aspect of the embodiments of the present invention, a neural-network-based natural language processing method is provided, comprising:
receiving an input natural sentence as an input sentence, and generating length information of the input sentence according to the number of words in the input sentence;
determining the index of each word in the input sentence, and looking up the vector value of each word in a word vector table according to its index, to obtain a word vector matrix of the input sentence;
performing a convolution calculation on the word vector matrix with a convolution kernel to obtain a feature map matrix, wherein, in the convolution calculation, a corresponding number of multiplication operations between the word vector matrix and the convolution kernel are performed according to the length information of the input sentence;
activating and pooling the feature map matrix to obtain a dimensionality-reduced feature map matrix;
performing intent recognition based on the dimensionality-reduced feature map matrix to obtain a classification label matrix of the input sentence, the classification label matrix containing a probability value for each classification label;
normalizing the probability values of all classification labels in the classification label matrix to obtain an intent recognition result for the input sentence.
Optionally, the length information is a mask of Boolean pointer type.
Optionally, in the convolution calculation, performing a corresponding number of multiplication operations between the word vector matrix and the convolution kernel according to the length information of the input sentence comprises:
sliding the convolution kernel along the height dimension of the word vector matrix, with no edge padding and a convolution stride of 1, so as to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel equals the number of words of the input sentence indicated by its length information, and the corresponding number equals the difference between that number of words and the size of the convolution kernel, plus 1.
Optionally, in the convolution calculation, performing a corresponding number of multiplication operations between the word vector matrix and the convolution kernel according to the length information of the input sentence comprises:
sliding the convolution kernel along the height dimension of the word vector matrix, with edge padding at the tail of the input sentence and a convolution stride of 1, so as to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel along the height dimension equals the number of words of the input sentence indicated by its length information plus the size of the convolution kernel minus 1, and the corresponding number equals that number of words.
Optionally, the step of performing the convolution calculation on the word vector matrix with the convolution kernel to obtain the feature map matrix and the step of activating and pooling the feature map matrix to obtain the dimensionality-reduced feature map matrix are executed on the same accelerator.
Optionally, there are a plurality of convolution kernels of different sizes; a plurality of feature map matrices in one-to-one correspondence with the plurality of convolution kernels; and a plurality of dimensionality-reduced feature map matrices in one-to-one correspondence with the plurality of feature map matrices;
performing the convolution calculation on the word vector matrix with a convolution kernel to obtain a feature map matrix comprises:
performing convolution calculations on the word vector matrix with the plurality of convolution kernels of different sizes respectively, to obtain the plurality of feature map matrices;
activating and pooling the feature map matrix to obtain a dimensionality-reduced feature map matrix comprises:
activating and pooling the plurality of feature map matrices respectively, to obtain the plurality of dimensionality-reduced feature map matrices;
performing intent recognition based on the dimensionality-reduced feature map matrix to obtain the classification label matrix of the input sentence comprises:
accumulating and merging the plurality of dimensionality-reduced feature map matrices to obtain a merged feature map matrix;
performing intent recognition based on the merged feature map matrix to obtain the classification label matrix of the input sentence.
Optionally, before performing the convolution calculation on the word vector matrix with a convolution kernel to obtain a feature map matrix, the method further comprises:
adding a one-dimensional channel dimension to the word vector matrix, so as to extend its dimensions to the four dimensions of batch, height, width and channel;
transforming the lengths of the dimensions of the dimension-extended word vector matrix according to the maximum limits on the dimension lengths of the input data matrix of the convolution calculation, so that no dimension of the transformed word vector matrix exceeds its corresponding maximum limit, wherein the product of the dimension lengths of the transformed word vector matrix equals the product of the dimension lengths of the word vector matrix before the transformation.
Optionally, activating the feature map matrix comprises:
performing a nonlinear mapping on the feature map matrix using the ReLU activation function.
Optionally, pooling the feature map matrix comprises:
performing average pooling or max pooling on the feature map matrix.
According to another aspect of the embodiments of the present invention, an electronic device is also provided, comprising:
a processor; and
a memory storing computer program code;
which, when run by the processor, causes the electronic device to execute the neural-network-based natural language processing method according to any one of the above.
According to a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, for storing a computer program that implements the above neural-network-based natural language processing method.
According to still another aspect of the embodiments of the present invention, a chip for running instructions is also provided; the chip comprises a memory and a processor, the memory stores code and data and is coupled with the processor, and the processor runs the code in the memory so that the chip executes the above neural-network-based natural language processing method.
According to yet another aspect of the embodiments of the present invention, a program product containing instructions is also provided, which, when run on a computer, causes the computer to execute the above neural-network-based natural language processing method.
According to yet another aspect of the embodiments of the present invention, a computer program is also provided, which, when executed by a processor, executes the above neural-network-based natural language processing method.
In the neural-network-based natural language processing method proposed by the embodiments of the present invention, length information reflecting the actual length of the input sentence is introduced, and in the convolution calculation the multiplication operations between the word vector matrix and the convolution kernel are performed a corresponding number of times according to that length information. The prior art's fixed-length convolution based on a fixed preset sentence length is thus transformed into a variable-length convolution based on the actual length of the input sentence, greatly reducing the amount of calculation, significantly increasing the computing speed and reducing the waste of computing resources and time.
Further, when performing the convolution calculation, edge padding is performed at the tail of the input sentence so that the features of its last word are fully extracted, improving the recognition and processing accuracy for the input sentence.
Still further, by fusing the convolution, activation and pooling steps for execution on the same accelerator, data relocation and frequent data update buffering can be reduced, further increasing the computing speed and reducing computation time.
The above is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the contents of the specification, and that the above and other objects, features and advantages of the present invention may be more apparent, specific embodiments of the present invention are set forth below.
From the following detailed description of specific embodiments of the present invention in conjunction with the accompanying drawings, the above and other objects, advantages and features of the present invention will become clearer to those skilled in the art.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Fig. 1 shows a typical neural network graph structure for intent recognition in natural language processing in the prior art;
Fig. 2 shows a schematic flowchart of a neural-network-based natural language processing method according to an embodiment of the present invention;
Fig. 3 shows a neural network graph structure in natural language processing according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of the sliding of the convolution kernel when edge padding is performed at the tail of the input sentence, according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope conveyed completely to those skilled in the art.
Convolution is widely used in neural networks. A convolution calculation is essentially a multiply-accumulate process. Two-dimensional matrix convolution is the most widely used convolution in machine learning; its formula is: output = W * input + bias, where output is the convolution output data, input is the convolution input data, W is the convolution kernel and bias is a preset bias value. The convolution input data is in NHWC format, i.e., [batch, in_height, in_width, in_channels], where batch is the number of objects processed per convolution batch (e.g., the number of images if the objects are images, or the number of sentences if they are sentences), in_height is the height of the input data, in_width its width and in_channels its number of channels. The convolution kernel is in HWCN format, i.e., [filter_height, filter_width, in_channels, out_channels], where filter_height and filter_width are the kernel's height and width, in_channels is the number of input channels and out_channels the number of output channels. The convolution output data is in NHWC format, i.e., [batch, output_height, output_width, out_channels], where batch is the number of objects processed per batch, output_height is the height of the output data, output_width its width and out_channels its number of channels. In the convolution calculation, the height and width change according to the following rule:
[Equation image in the original, giving the rule for the output height and width of the convolution.]
Padding refers to how values beyond the edge of the input data are filled when the convolution kernel slides past that edge. By default, padding is VALID (value 0, no filling), in which case the height and width of the convolution output data are both given by:
(input_size - kernel_size)/stride + 1
where input_size is the input data size, specifically the height or width of the input data; kernel_size is the kernel size (the kernel's height when input_size is the input height, its width when input_size is the input width); and stride is the convolution step, i.e., how far the kernel moves each time.
Padding can also be SAME, in which case the height and width of the convolution output are the same as those of the input.
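The VALID-padding output-size formula above can be sketched as a one-line helper (a hedged illustration; the function name is ours):

```c
/* Output height/width of a convolution with padding VALID:
 * (input_size - kernel_size)/stride + 1 */
int conv_output_size(int input_size, int kernel_size, int stride) {
    return (input_size - kernel_size) / stride + 1;
}
```

This matches the shapes in Fig. 1: kernels of height 3, 4 and 5 over an input height of 60 with stride 1 yield output heights 58, 57 and 56.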
Fig. 1 is a typical prior-art neural network graph structure for intent recognition in natural language processing. This typical network uses 2-D matrix convolution, and the convolution function can be defined in the following format: void conv2d(int8 *input, int *inputShape, int8 *filter, int *filterShape, int8 *output, int *outputShape), where input is the input data pointer, inputShape the input data dimensions, filter the kernel data pointer, filterShape the kernel dimensions, output the output data pointer and outputShape the output data dimensions; int8 denotes 8-bit integer data and int denotes integer data.
In the prior art, in intent recognition for neural-network-based natural language processing, the height, width and input channel count of the convolution input data are all designed as fixed values, the input height equaling the length of the sentence to be recognized; that is, the sentence length is designed as a fixed length (the maximum sentence length). For example, as shown in Fig. 1, with the sentence length fixed at 60, whatever the actual length of the input user sentence, it is computed as 60 words (represented as a [1,60] matrix). After word-embedding table lookup and dimension transformation, a word vector matrix [1,60,8,4] is obtained as the convolution input, its height equal to 60 (the fixed sentence length). In the convolution, kernels W[3,8,4,128], W[4,8,4,128] and W[5,8,4,128] are used; with padding VALID and stride 1, multiply-accumulating each of these three kernels with the input word vector matrix yields the following output feature map matrices: [1,58,1,128], [1,57,1,128], [1,56,1,128], whose heights (here 58, 57, 56) are the numbers of multiplication operations between each kernel and the input word vector matrix. In this scheme, the loop control of the convolution calculation takes the following form:
for (int i = 0; i < 60; i++) {
    // convolution inner computation
}
After the feature map matrix for each kernel is obtained, each is activated and pooled in turn. The activation step leaves the dimensions of the data unchanged: activating the matrices [1,58,1,128], [1,57,1,128] and [1,56,1,128] yields activated feature map matrices of the same shapes. Each activated matrix is then pooled, specifically over the multiple values of its height dimension, yielding the dimensionality-reduced feature map matrices [1,1,1,128], [1,1,1,128], [1,1,1,128].
These three dimensionality-reduced matrices are then accumulated and merged into a merged feature map matrix [1,1,1,384]. Next, a fully connected layer performs intent recognition on the sentence based on [1,1,1,384], producing the sentence's classification label matrix [1,50]. Finally, a softmax layer normalizes the probability values of all classification labels in [1,50], giving the intent recognition result matrix [1,50].
It can be seen that in the prior art, whatever the actual length of the input user sentence, the convolution computes it as a fixed length (60 words in this example), with a fixed number of multiplication operations between the input word vector matrix and the convolution kernel. In real life, however, user sentences are not of fixed length; their actual length is often smaller, even far smaller, than the designed fixed length. When the actual length of a user sentence is smaller than the fixed length, great amounts of computing resources and time are therefore wasted.
To solve the above technical problems, an embodiment of the present invention proposes a neural-network-based natural language processing method. Fig. 2 shows a schematic flowchart of such a method according to an embodiment of the present invention. Referring to Fig. 2, the method may include at least the following steps S102 to S112.
Step S102: receive an input natural sentence as the input sentence, and generate length information of the input sentence according to the number of words it contains.
Step S104: determine the index of each word in the input sentence, look up the vector value of each word in the word vector table according to its index, and obtain the word vector matrix of the input sentence.
Step S106: perform a convolution calculation on the word vector matrix with a convolution kernel to obtain a feature map matrix, wherein, in the convolution calculation, a corresponding number of multiplication operations between the word vector matrix and the kernel are performed according to the input sentence's length information.
Step S108: activate and pool the feature map matrix to obtain a dimensionality-reduced feature map matrix.
Step S110: perform intent recognition based on the dimensionality-reduced feature map matrix to obtain the input sentence's classification label matrix, which contains the probability value of each classification label.
Step S112: normalize the probability values of all classification labels in the classification label matrix to obtain the input sentence's intent recognition result.
In the method proposed by this embodiment, length information reflecting the actual length of the input sentence is introduced, and the multiplications between the word vector matrix and the convolution kernel are performed a corresponding number of times according to that length information, transforming the prior art's fixed-length convolution based on a fixed preset sentence length into a variable-length convolution based on the actual length of the input sentence. This greatly reduces the amount of calculation, significantly increasing the computing speed and reducing the waste of computing resources and time.
Fig. 3 shows a neural network graph structure in natural language processing according to an embodiment of the present invention. The steps of the embodiment are described below with reference to Fig. 3.
In step S102 above, the input sentence is a natural sentence from the user, and its length information is generated from its number of words (i.e., the actual length of the input sentence).
Specifically, the length information of the input sentence may be a Boolean-pointer (bool) type mask (call it a length mask). A mask is a bool array; each bool value may be true or false, true meaning the position is valid and false meaning it is invalid. For example, if the maximum sentence length set in the neural network model is 60 words and the input sentence actually contains 10 words, the generated length information (specifically, the length mask) contains 60 bool values, of which the first 10 are true and the rest are false.
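A minimal sketch of building such a length mask (our illustration; the patent specifies only the bool-array semantics, not this helper):

```c
#include <stdbool.h>

/* First actual_len entries are true (valid words); the rest,
 * up to max_len (e.g. 60), are false. */
void build_length_mask(bool *mask, int max_len, int actual_len) {
    for (int i = 0; i < max_len; i++)
        mask[i] = (i < actual_len);
}
```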
In step S104 above, the word vector matrix of the input sentence is obtained by word-embedding table lookup.
Specifically, again taking a maximum sentence length of 60 words and an actual input length of 10 words as an example, each word of the input sentence is replaced by its index (specifically, an index value) and the sentence is padded to 60 words with a special value (e.g., 0), giving 1×60 integer values (a [1,60] matrix), where 1 is the batch, i.e., one sentence processed at a time. Of course, those skilled in the art will understand that if the neural network model processes several sentences at once, the batch is not 1: processing 3 sentences at a time gives a batch of 3 and 3×60 integer values.
Then, according to each word's index, its vector value is looked up in the trained word vector table (which maps word indices to the corresponding word vectors), giving the input sentence's word vector matrix. For example, if each word is represented by a 32-dimensional vector, each word's vector value consists of 32-bit floating-point numbers. For an input sentence containing 10 words, the index values in the [1,60] matrix obtained above yield a [1,60,32] word vector matrix, whose leading entries are the vector values corresponding to the index values of the sentence's actual 10 words, the rest still padded with the special value (e.g., 0). In terms of the matrix's dimensions, the first 10 entries along the height dimension correspond to the sentence's 10 actual words and the remaining 50 are padding.
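The table lookup itself amounts to selecting one row of the embedding table per word index. The sketch below is our own illustration with illustrative names; the patent does not give this code.

```c
/* Word-embedding lookup: each word index selects one row of `dim`
 * floats from the trained word vector table. */
void embedding_lookup(const float *table, const int *indices,
                      int num_words, int dim, float *out) {
    for (int w = 0; w < num_words; w++)
        for (int d = 0; d < dim; d++)
            out[w * dim + d] = table[indices[w] * dim + d];
}
```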
After the input sentence's word vector matrix is obtained, it can be convolved to extract features. In one embodiment, to improve the efficiency of the convolution calculation, a dimension transformation may first be applied to the word vector matrix before the convolution.
Since the input data matrix of a 2-D convolution function has the four dimensions batch, height, width and channel, while natural language has no channel information, natural language input data can be treated as having a single channel. Accordingly, a one-dimensional channel dimension can be added to the word vector matrix, extending its dimensions to the four dimensions of batch, height, width and channel. For example, the word vector matrix [1,60,32] obtained in the previous step can be given an extra channel dimension, becoming [1,60,32,1].
Further, hardware-accelerated convolution generally limits the length of each dimension of the input data matrix; if some dimension exceeds the maximum length the hardware supports, the computation must be split into steps, reducing efficiency. Therefore, after the dimension extension, the lengths of the dimensions of the extended word vector matrix can be transformed according to the maximum limits on the dimension lengths of the convolution's input data matrix, so that no dimension of the transformed matrix exceeds its corresponding maximum, with the product of the dimension lengths after the transformation equal to the product before it. For example, suppose the convolution calculation (specifically, the convolution accelerator hardware) requires the third dimension (width) of the input matrix to be at most 16, while the extended matrix [1,60,32,1] has a third-dimension length of 32, exceeding 16; the matrix can then be transformed to [1,60,8,4]. Since the product of the dimension lengths after the transformation, 1×60×8×4 = 1920, equals the product before it, 1×60×32×1 = 1920, convolving with [1,60,8,4] is equivalent to convolving with [1,60,32,1]. The dimension-transformed word vector matrix [1,60,8,4] is thus used as the convolution input data.
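The dimension transform only relabels the same contiguous buffer, so its validity check is that the products of the dimension lengths match. A sketch of that check (our illustration):

```c
/* Element count of an NHWC shape; a transform such as
 * [1,60,32,1] -> [1,60,8,4] is valid when the counts are equal. */
int element_count(const int shape[4]) {
    return shape[0] * shape[1] * shape[2] * shape[3];
}
```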
In step S106 above, a convolution kernel is used to convolve the word vector matrix to extract features and obtain a feature map matrix. During the convolution calculation, the length information generated in step S102 is introduced into the calculation, so that the multiplication operations between the word vector matrix and the kernel are performed a number of times corresponding to the input sentence's length information. In the feature map matrix computed this way, the leading entries along the height dimension are the feature data extracted from the input sentence (as many groups as there were multiplications); the remaining entries are filled with a special value (e.g., 0).
In the present invention, the length information of the input sentence is introduced into the convolution by modifying the existing convolution function (specifically, adding a length variable reflecting the input sentence's length information). For 2-D convolution, the modified convolution function can be defined in the following format: void conv2d(int8 *input, int *inputShape, int8 *filter, int *filterShape, int8 *output, int *outputShape, bool *mask), where input is the input data pointer, inputShape the input data dimensions, filter the kernel data pointer, filterShape the kernel dimensions, output the output data pointer, outputShape the output data dimensions, and mask the length variable, specifically a Boolean-pointer-type length mask of the input data. The loop control of the convolution calculation then becomes:
int i = 0;
while (mask[i++] == true) {
    // convolution inner computation
}
Thus, the number of loop iterations of the multiplication between the word vector matrix and the convolution kernel is no longer fixed but depends on the input sentence's length information. By adding a length variable to the custom convolution function, the prior art's fixed-length convolution becomes a length-variable convolution (call it a variable-length convolution), which can be used anywhere in the neural network where it is needed.
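The mask-controlled loop above, extracted into a counting sketch. This illustrates the control flow only (the convolution body is elided), and the bounded form below is our own variant of the patent's loop:

```c
#include <stdbool.h>

/* Iterate only while the length mask is true, as in the modified
 * conv2d loop, instead of a fixed 60 iterations. Returns the number
 * of inner-computation iterations actually performed. */
int masked_iterations(const bool *mask, int max_len) {
    int i = 0;
    while (i < max_len && mask[i]) {
        /* convolution inner computation would run here */
        i++;
    }
    return i;
}
```

With a 10-word sentence in a 60-slot mask, the loop body runs 10 times rather than 60.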
In this step, the length information (specifically, the length mask) generated in step S102 is fed into the modified convolution function to participate in the convolution calculation. As described above, the word vector matrix used as the convolution input contains, at its leading positions, the vector values corresponding to the index values of the sentence's actual words, with the later data padded with a special value. When extracting features by convolution, only the vector values corresponding to the actual words of the input sentence need to be processed to effectively extract its features; there is no need, as in the prior art, to process all the data in the matrix including the padding values. In the present invention, feeding the length information into the modified convolution function to control the convolution's inner loop ensures that only the vector-value data corresponding to the sentence's actual words are processed, and the padded special values are not.
From the principle of convolution, in the convolution calculation the number of multiplication operations between the word vector matrix and the kernel can be computed from equation (1):
multiplication_number = (slide_size - kernel_size)/stride + 1    (1)
where multiplication_number is the number of multiplications between the word vector matrix and the kernel, slide_size the kernel's sliding stroke along the height dimension of the word vector matrix, kernel_size the kernel size (i.e., the kernel height), and stride the convolution step.
In one implementation, in the convolution calculation of step S106, performing the corresponding number of multiplications according to the input sentence's length information can be done as follows:
with no edge padding and a convolution stride of 1, the kernel slides along the height dimension of the word vector matrix so as to perform the corresponding number of multiplications between the matrix and the kernel. In this scheme, the kernel's sliding stroke equals the number of words indicated by the sentence's length information, so substituting slide_size = input_number (where input_number is the number of words of the input sentence) and stride = 1 into (1) gives equation (2):
multiplication_number = input_number - kernel_size + 1    (2)
That is, with no edge padding and stride 1, sliding the kernel along the height dimension of the word vector matrix, the corresponding number of multiplications equals the number of words of the input sentence indicated by its length information minus the kernel size, plus 1.
For example, when the input sentence's actual length is 10 (i.e., it contains 10 words), the first 10 entries along the height dimension of the convolution input matrix [1,60,8,4] correspond to the sentence's 10 actual words and the remaining 50 are padding. Suppose a kernel of size 3 is used (here W[3,8,4,128], whose height-dimension length 3 is its size) with a convolution stride of 1; the kernel W[3,8,4,128] slides along the height dimension of [1,60,8,4] with a stroke of 10, performing the corresponding number of multiplications and yielding the feature map matrix [1,58,1,128]. The number of multiplications between the matrix and the kernel is (10-3)+1 = 8; correspondingly, the first 8 groups along the height dimension of the computed [1,58,1,128] are features extracted from the input sentence, and the remaining 50 groups are padded with a special value. Compared with the prior art's (60-3)/1+1 = 58 multiplications for a fixed-length (say 60) convolution with the same kernel, the number of convolutions over the input sentence with a given kernel is greatly reduced.
In a sentence, words with specific meanings are usually composed of a particular number of characters, and the order and spatial position of different characters/words can also convey different meanings. For example, in Chinese, words commonly consist of 2 or 3 characters and idioms of 4 characters, each carrying a specific meaning. Therefore, when extracting features by convolution, kernels of different sizes may be used for the convolution calculations. In this case, step S106 may use several kernels of different sizes, the feature map matrices obtained also being several and in one-to-one correspondence with the kernels. Specifically, step S106 is implemented accordingly as: performing convolution calculations on the word vector matrix with several kernels of different sizes respectively, obtaining several feature map matrices.
The kernel format is HWCN, i.e., [filter_height (kernel height), filter_width (kernel width), in_channels (input channels), out_channels (output channels)], where the number of input channels must match that of the input data (the word vector matrix), while the number of output channels can be set manually, a value of 128 to 300 generally being appropriate. For natural language processing, where input sentences have relatively few dimensions, it is preferable to set the kernel's output channel count to 128.
For Chinese sentences, features are preferably extracted once every 3 characters, every 4 characters and every 5 characters, i.e., with kernels of sizes 3, 4 and 5 respectively. For example, as shown in Fig. 3, with the dimension-transformed word vector matrix [1,60,8,4] as the convolution input, padding 0 (no edge padding) and stride 1, convolving with kernels W[3,8,4,128], W[4,8,4,128] and W[5,8,4,128] yields the feature map matrices [1,58,1,128], [1,57,1,128] and [1,56,1,128] respectively. In the convolution, (10-3)+1 = 8 multiplications are performed between W[3,8,4,128] and the word vector matrix [1,60,8,4], (10-4)+1 = 7 between W[4,8,4,128] and the matrix, and (10-5)+1 = 6 between W[5,8,4,128] and the matrix.
In the above implementation, no edge padding is performed during the convolution. Although this simplifies the computation and reduces its amount, the features of the last word of the input sentence cannot be fully extracted because there is no edge padding, causing some loss of recognition accuracy. Therefore, in another implementation, when convolving the word vector matrix with a kernel, edge padding can also be performed at the tail of the input sentence before obtaining the feature map matrix, so that the features of the last word of the input sentence are fully extracted and the recognition and processing accuracy of input sentences is improved.
In this case, when performing the convolution calculation on the word vector matrix, performing the corresponding number of multiplications according to the input sentence's length information can be done as follows: with edge padding at the tail of the input sentence and a convolution stride of 1, the kernel slides along the height dimension of the word vector matrix so as to perform the corresponding number of multiplications between the matrix and the kernel. Compared with the no-padding scheme, the kernel's sliding stroke along the height dimension is increased by a specified length equal to the kernel size minus 1; that is, in this case the stroke equals the number of words of the input sentence indicated by its length information plus the kernel size minus 1, i.e., slide_size = input_number + kernel_size - 1. Equation (1) can thus be converted into equation (3):
multiplication_number = ((input_number + kernel_size - 1) - kernel_size)/stride + 1    (3)
Here, the kernel's sliding stroke equals input_number + kernel_size - 1.
With a convolution stride of stride = 1, equation (3) simplifies to equation (4):
multiplication_number = input_number    (4)
That is, with tail edge padding and stride 1, the kernel slides along the height dimension of the word vector matrix a stroke of the sentence's number of words plus the kernel size minus 1, and the corresponding number of multiplications between the matrix and the kernel equals the number of words of the input sentence indicated by its length information.
The sliding of the kernel along the height dimension of the word vector matrix under the two schemes is illustrated with the example of Fig. 4. For the 5-word input sentence "导航去上海" ("Navigate to Shanghai"), with no edge padding and stride 1, a kernel of size 3 slides along the height dimension of the sentence's word vector matrix with a stroke of 5 (i.e., from the first character "导" to the last character "海"), and the kernel need slide only (5-3)+1 = 3 times, so there are 3 multiplications between the matrix and the kernel. With edge padding at the tail of the input sentence and stride 1, the kernel's stroke along the height dimension is increased to 5+3-1 = 7 (i.e., from the first character "导" to the 2 padded units after the last character "海"), so the kernel slides (7-3)+1 = 5 times; there are thus 5 multiplications between the matrix and the kernel, fully extracting the features of the last character "海".
By performing edge padding at the tail of the input sentence during the convolution calculation, the features of the sentence's last word can be fully extracted at only a slight increase in computation, improving the recognition and processing accuracy of input sentences.
Again taking the dimension-transformed word vector matrix [1,60,8,4] as the convolution input, as shown in Fig. 3, the kernels [3,8,4,128], [4,8,4,128] and [5,8,4,128] are used respectively to convolve [1,60,8,4], with edge padding at the tail of the input sentence during the convolution, so that each kernel's sliding stroke along the height dimension equals the sentence's number of words plus that kernel's size minus 1; with stride 1, the output feature map matrices are [1,58,1,128], [1,57,1,128] and [1,56,1,128] respectively. In the convolution, 10 multiplications are performed between each of W[3,8,4,128], W[4,8,4,128] and W[5,8,4,128] and the word vector matrix [1,60,8,4].
Considering that in practice the average user sentence is about 14 words, only about a quarter of a fixed length of, e.g., 60 words as in the prior art, the method of the embodiments of the present invention can theoretically save three quarters of the computation time.
In step S108 above, the feature map matrix obtained by the convolution is activated and pooled to obtain a dimensionality-reduced feature map matrix. Specifically, this can be divided into two steps: activation and pooling.
In the activation step, the feature map matrix is activated through an activation function, giving the neural network nonlinear characteristics. Specifically, the ReLU activation function can be used to perform a nonlinear mapping on the feature map matrix; the characteristics of the ReLU function are well known to those skilled in the art and are not repeated here. The activation step leaves the dimensions of the output data the same as those of the input data.
The purpose of the pooling step is to reduce the dimensionality of the extracted features so as to compress the data. Pooling, also called downsampling, takes multiple values as input and outputs one value. Common kinds include average pooling and max pooling: average pooling takes the average of the input numbers and uses it, in place of the inputs, as the output; max pooling takes the largest of the input numbers as the output value.
In this embodiment, either average pooling or max pooling may be applied to the feature map matrix in the pooling step; max pooling is preferred. Specifically, max pooling is applied over the height dimension of the feature map matrix, selecting the maximum among the values of the height dimension as the output value, yielding the dimensionality-reduced feature map matrix.
Further, when step S106 convolves the word vector matrix with several kernels of different sizes, yielding feature map matrices in one-to-one correspondence with the kernels, step S108 activates and pools each of those matrices, yielding dimensionality-reduced feature map matrices in one-to-one correspondence with them. Activation and pooling here are performed as described above and are not repeated.
Step S108 is illustrated below with a concrete example, referring to Fig. 3.
As shown in Fig. 3, with convolution outputs [1,58,1,128], [1,57,1,128] and [1,56,1,128], the ReLU activation function is applied to each, yielding activated feature map matrices of the same shapes. Max pooling is then applied to each activated matrix: the maximum among the values of the height dimension of each of [1,58,1,128], [1,57,1,128] and [1,56,1,128] is selected as the output, yielding the dimensionality-reduced feature map matrices [1,1,1,128], [1,1,1,128], [1,1,1,128].
In step S110 above, intent recognition of the input sentence can be performed through a fully connected layer based on the dimensionality-reduced feature map matrix, yielding the sentence's classification label matrix.
A fully connected layer is generally used at the end of a neural network to fuse the feature maps obtained by convolution and output the fused feature maps at the desired dimension, so that every output node carries a feature map; the fully connected layer thus performs feature fusion and dimension transformation. In the embodiments of the present invention, the input of the fully connected layer is the dimensionality-reduced feature map matrix and its output the classification label matrix, whose dimension is the number of intent-recognition classification labels, each label represented by a floating-point numerical value giving the probability of that label. For example, with 50 intent-recognition classification labels, the classification label matrix output in this step is [1,50], as shown in Fig. 3.
In one embodiment, when step S108 yields several dimensionality-reduced feature map matrices, step S110 may further be implemented as follows: first, the several dimensionality-reduced matrices are accumulated and merged into a merged feature map matrix; then intent recognition is performed based on the merged matrix, yielding the input sentence's classification label matrix. In the example of Fig. 3, the matrices [1,1,1,128], [1,1,1,128], [1,1,1,128] are accumulated and merged into the merged feature map matrix [1,1,1,384], which is fed to the fully connected layer for intent recognition, producing the output classification label matrix [1,50].
In step S112 above, a softmax layer can normalize the probability values of all classification labels in the classification label matrix, yielding the input sentence's intent recognition result. The softmax layer normalizes the classification label probabilities: it does not change the dimension of the input data but only each of its floating-point values, so that the floating-point probabilities of all labels sum to 1; the output of this step therefore has the same dimension as the classification label matrix. For example, as shown in Fig. 3, when the fully connected layer outputs the classification label matrix [1,50], this step also outputs a [1,50] matrix.
In addition, as those skilled in the art will understand, the activation step described above processes every value in the feature map matrix output by the convolution, while the pooling step processes that matrix as a whole. The inventors found that in hardware-accelerated neural network processing, most of the overhead is caused by data relocation, and the data buffering during computation also exhibits spatial locality; both are bottlenecks limiting computation speed. Based on this finding, in a preferred embodiment, the step of convolving the word vector matrix with the kernel to obtain the feature map matrix (step S106) and the step of activating and pooling it to obtain the dimensionality-reduced matrix (step S108) can be executed on the same accelerator; specifically, steps S106 and S108 can be implemented in one chip. In this way, multiple operations on the same data are fused on one accelerator, reducing data relocation and frequent data update buffering and thereby further increasing the computing speed and reducing computation time.
Also, in the present invention, in steps S104 and S106 (i.e., the embedding table lookup and convolution steps), the lengths of the height dimension of the input data matrices and output data matrices are still determined based on the fixed maximum sentence length. This ensures that the storage space overhead of these matrices is stable, avoids frequently allocating storage spaces of different sizes for input sentences of different actual lengths, and is more conducive to the effective use of space.
Based on the same inventive concept, an embodiment of the present invention also provides an electronic device, comprising:
a processor; and
a memory storing computer program code;
which, when run by the processor, causes the electronic device to execute the neural-network-based natural language processing method of any one of the above embodiments or a combination thereof.
An embodiment of the present invention also provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to execute the above neural-network-based natural language processing method.
An embodiment of the present invention also provides a chip for running instructions; the chip comprises a memory and a processor, the memory stores code and data and is coupled with the processor, and the processor runs the code in the memory so that the chip executes the above neural-network-based natural language processing method.
An embodiment of the present invention also provides a program product containing instructions; the program product comprises a computer program stored in a computer-readable storage medium, at least one processor can read the computer program from the medium, and executing it implements the above neural-network-based natural language processing method.
An embodiment of the present invention also provides a computer program which, when executed by a processor, executes the above neural-network-based natural language processing method.
According to any one of the above optional embodiments or a combination of several of them, the embodiments of the present invention can achieve the following beneficial effects:
In the neural-network-based natural language processing method proposed by the embodiments of the present invention, length information reflecting the actual length of the input sentence is introduced, and in the convolution calculation the multiplication operations between the word vector matrix and the convolution kernel are performed a corresponding number of times according to that length information, transforming the prior art's fixed-length convolution based on a fixed preset sentence length into a variable-length convolution based on the actual length of the input sentence. This greatly reduces the amount of calculation, significantly increasing the computing speed and reducing the waste of computing resources and time.
Further, when performing the convolution calculation, edge padding is performed at the tail of the input sentence to fully extract the features of its last word, improving the recognition and processing accuracy of input sentences.
Still further, by fusing the convolution, activation and pooling steps for execution on the same accelerator, data relocation and frequent data update buffering can be reduced, further increasing the computing speed and reducing computation time.
Those skilled in the art can clearly understand that for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; for brevity they are not repeated here.
In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all functional units may be integrated into one processing unit. The integrated functional units may be implemented in hardware, or in software or firmware.
Those of ordinary skill in the art will understand that if the integrated functional units are implemented in software and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computing device (e.g., a personal computer, a server, or a network device) execute all or part of the steps of the methods of the embodiments of the present invention when running the instructions. The aforementioned storage medium includes various media that can store program code: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
Alternatively, all or part of the steps of the foregoing method embodiments may be accomplished by program-instruction-related hardware (a computing device such as a personal computer, a server, or a network device); the program instructions may be stored in a computer-readable storage medium, and when executed by the computing device's processor, the computing device performs all or part of the steps of the methods of the embodiments of the present invention.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features replaced with equivalents, within the spirit and principles of the present invention; such modifications and replacements do not cause the corresponding technical solutions to depart from the protection scope of the present invention.

Claims (13)

  1. A neural-network-based natural language processing method, characterized by comprising:
    receiving an input natural sentence as an input sentence, and generating length information of the input sentence according to the number of words in the input sentence;
    determining the index of each word in the input sentence, and looking up the vector value of each word in a word vector table according to its index, to obtain a word vector matrix of the input sentence;
    performing a convolution calculation on the word vector matrix with a convolution kernel to obtain a feature map matrix, wherein, in the convolution calculation, a corresponding number of multiplication operations between the word vector matrix and the convolution kernel are performed according to the length information of the input sentence;
    activating and pooling the feature map matrix to obtain a dimensionality-reduced feature map matrix;
    performing intent recognition based on the dimensionality-reduced feature map matrix to obtain a classification label matrix of the input sentence, the classification label matrix containing a probability value for each classification label;
    normalizing the probability values of all classification labels in the classification label matrix to obtain an intent recognition result for the input sentence.
  2. The natural language processing method according to claim 1, characterized in that the length information is a mask of Boolean pointer type.
  3. The natural language processing method according to claim 1, characterized in that, in the convolution calculation, performing a corresponding number of multiplication operations between the word vector matrix and the convolution kernel according to the length information of the input sentence comprises:
    sliding the convolution kernel along the height dimension of the word vector matrix, with no edge padding and a convolution stride of 1, so as to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel equals the number of words of the input sentence indicated by the length information, and the corresponding number equals the difference between that number of words and the size of the convolution kernel, plus 1.
  4. The natural language processing method according to claim 1, characterized in that, in the convolution calculation, performing a corresponding number of multiplication operations between the word vector matrix and the convolution kernel according to the length information of the input sentence comprises:
    sliding the convolution kernel along the height dimension of the word vector matrix, with edge padding at the tail of the input sentence and a convolution stride of 1, so as to perform the corresponding number of multiplication operations between the word vector matrix and the convolution kernel, wherein the sliding stroke of the convolution kernel along the height dimension of the word vector matrix equals the number of words of the input sentence indicated by the length information plus the size of the convolution kernel minus 1, and the corresponding number equals the number of words of the input sentence indicated by the length information.
  5. 根据权利要求1所述的自然语言处理方法,其特征在于,
    在同一个加速器上执行所述采用卷积核对所述字向量矩阵进行卷积计算,得到特征映射矩阵的步骤以及所述对所述特征映射矩阵进行激活和池化,得到降维特征映射矩阵的步骤。
  6. 根据权利要求1所述的自然语言处理方法,其特征在于,所述卷积核的数量为多个,且各所述卷积核的尺寸不同;所述特征映射矩阵的数量为多个,且多个所述特征映射矩阵与多个所述卷积核一一对应;所述降维特征映射矩阵的数量为多个,且多个所述降维特征映射矩阵与多个所述特征映射矩阵一一对应;
    所述采用卷积核对所述字向量矩阵进行卷积计算,得到特征映射矩阵,包括:
    采用不同尺寸的多个所述卷积核对所述字向量矩阵分别进行卷积计算,得到多个所述特征映射矩阵;
    对所述特征映射矩阵进行激活和池化,得到降维特征映射矩阵,包括:
    对多个所述特征映射矩阵分别进行激活和池化,得到多个所述降维特征映射矩阵;
    所述基于所述降维特征映射矩阵进行意图识别,得到所述输入语句的分类标签矩阵,包括:
    对多个所述降维特征映射矩阵进行累加合并,得到合并特征映射矩阵;
    基于所述合并特征映射矩阵进行意图识别,得到所述输入语句的分类标签矩阵。
  7. 根据权利要求1所述的自然语言处理方法,其特征在于,在采用卷积 核对所述字向量矩阵进行卷积计算,得到特征映射矩阵之前,还包括:
    对所述字向量矩阵增加一维通道维度,以将所述字向量矩阵的维度扩展为包括批次、高度、宽度和通道的四个维度;
    根据所述卷积计算的输入数据矩阵的各维度的长度的最大限值,对维度扩展后的所述字向量矩阵的各维度的长度进行变换,以使变换后的所述字向量矩阵的各维度的长度均不超过对应的最大限值,其中,变换后的所述字向量矩阵的各维度的长度的乘积等于变换前的所述字向量矩阵的各维度的长度的乘积。
  8. 根据权利要求1所述的自然语言处理方法,其特征在于,对所述特征映射矩阵进行激活,包括:
    利用relu激活函数对所述特征映射矩阵进行非线性映射。
  9. 根据权利要求1所述的自然语言处理方法,其特征在于,对所述特征映射矩阵进行池化,包括:
    对所述特征映射矩阵进行平均池化或最大池化。
  10. 一种电子设备,其特征在于,包括:
    处理器;以及
    存储有计算机程序代码的存储器;
    当所述计算机程序代码被所述处理器运行时,导致所述电子设备执行根据权利要求1-9中任一项所述的基于神经网络的自然语言处理方法。
  11. 一种运行指令的芯片,其特征在于,所述芯片包括存储器、处理器,所述存储器中存储代码和数据,所述存储器与所述处理器耦合,所述处理器运行所述存储器中的代码使得所述芯片用于执行权利要求1-9中任一项所述的基于神经网络的自然语言处理方法。
  12. 一种包含指令的程序产品,其特征在于,当所述程序产品在计算机上运行时,使得所述计算机执行权利要求1-9中任一项所述的基于神经网络的自然语言处理方法。
  13. 一种计算机程序,其特征在于,当所述计算机程序被处理器执行时,用于执行权利要求1-9中任一项所述的基于神经网络的自然语言处理方法。
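For orientation, the pipeline of claim 1 can be walked through end to end in a toy NumPy sketch; everything below — the four-character vocabulary, the random weights, the single 2×8 kernel, and the use of softmax for the final normalization step — is a hypothetical stand-in, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"打": 0, "开": 1, "空": 2, "调": 3}      # toy character index table
embed = rng.random((len(vocab), 8))               # word vector table, 8-dim vectors
kern = rng.random((2, 8))                         # one size-2 convolution kernel
dense = rng.random((1, 3))                        # weights for 3 hypothetical intent classes

sentence = "打开空调"
L = len(sentence)                                 # length information (character count)
x = embed[[vocab[c] for c in sentence]]           # word vector matrix, L x 8

# Variable-length convolution: L - 2 + 1 window multiplications, no padding.
conv = np.array([np.sum(x[i:i + 2] * kern) for i in range(L - 1)])
feat = np.maximum(conv, 0.0).max(keepdims=True)   # relu activation + max pooling
logits = feat @ dense                             # intent recognition -> label scores
probs = np.exp(logits) / np.exp(logits).sum()     # normalization (softmax assumed)
assert np.isclose(probs.sum(), 1.0)               # one probability per intent class
```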
PCT/CN2021/105268 2020-09-17 2021-07-08 Neural-network-based natural language processing method and electronic device WO2022057406A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010982596.5 2020-09-17
CN202010982596.5A CN112069837A (zh) Neural-network-based natural language processing method and electronic device

Publications (1)

Publication Number Publication Date
WO2022057406A1 true WO2022057406A1 (zh) 2022-03-24

Family

ID=73682014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/105268 WO2022057406A1 (zh) 2020-09-17 2021-07-08 Neural-network-based natural language processing method and electronic device

Country Status (2)

Country Link
CN (1) CN112069837A (zh)
WO (1) WO2022057406A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069837A * 2020-09-17 2020-12-11 湖北亿咖通科技有限公司 Neural-network-based natural language processing method and electronic device
CN114386425B * 2022-03-24 2022-06-10 天津思睿信息技术有限公司 Big data system establishing method for processing natural language text content
CN117574136B * 2024-01-16 2024-05-10 浙江大学海南研究院 Convolutional neural network computing method based on multivariate Gaussian function spatial transformation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506350A * 2017-08-16 2017-12-22 京东方科技集团股份有限公司 Method and device for identifying information
US20190050728A1 * 2017-08-09 2019-02-14 Penta Security Systems Inc. Method and apparatus for machine learning
CN109684626A * 2018-11-16 2019-04-26 深思考人工智能机器人科技(北京)有限公司 Semantic recognition method, model, storage medium, and apparatus
CN110263139A * 2019-06-10 2019-09-20 湖北亿咖通科技有限公司 Vehicle, in-vehicle device, and neural-network-based text intent recognition method therefor
CN110569500A * 2019-07-23 2019-12-13 平安国际智慧城市科技股份有限公司 Text semantic recognition method and apparatus, computer device, and storage medium
CN112069837A * 2020-09-17 2020-12-11 湖北亿咖通科技有限公司 Neural-network-based natural language processing method and electronic device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202044A * 2016-07-07 2016-12-07 武汉理工大学 Entity relation extraction method based on a deep neural network


Also Published As

Publication number Publication date
CN112069837A (zh) 2020-12-11

Similar Documents

Publication Publication Date Title
WO2022057406A1 (zh) Neural-network-based natural language processing method and electronic device
CN112292816B (zh) Processing core data compression and storage system
JP7020312B2 (ja) Image feature learning device, image feature learning method, image feature extraction device, image feature extraction method, and program
US20170061279A1 (en) Updating an artificial neural network using flexible fixed point representation
JP5235666B2 (ja) Associative matrix method, system and computer program product using bit-plane representations of selected segments
KR20190107033A (ko) Word vector processing method and apparatus
CN108874765B (zh) Word vector processing method and apparatus
CN110799957A (zh) Processing core with metadata-actuated conditional graph execution
CN109918507B (zh) Text classification method based on an improved TextCNN
CN111782826A (zh) Information processing method, apparatus, device, and storage medium for a knowledge graph
CN113220865B (zh) Text similar-vocabulary retrieval method, system, medium, and electronic device
WO2023207059A1 (zh) Visual question answering task processing method and system, electronic device, and storage medium
CN109583586B (zh) Convolution kernel processing method and apparatus for speech recognition or image recognition
CN111240746A (zh) Method and device for dequantizing and quantizing floating-point data
CN107423269B (zh) Word vector processing method and apparatus
CN112598129A (zh) Tunable hardware-aware pruning and mapping framework based on a ReRAM neural network accelerator
CN114677548A (zh) Neural network image classification system and method based on resistive random-access memory
CN114781380A (zh) Chinese named entity recognition method, device, and medium fusing multi-granularity information
JP7163515B2 (ja) Neural network training method, video recognition method, and apparatus
CN109544651B (zh) Data compression method for image comparison, image comparison method, and apparatus
CN111091001B (zh) Method, apparatus, and device for generating word vectors of words
WO2021035598A1 (zh) Data processing method and device
CN116187416A (zh) Iterative retraining method based on layer pruning sensitivity, and image processor
KR102605709B1 (ko) Pyramid layered attention model for recognition of nested and overlapped named entities
CN115905546A (zh) Graph convolutional network document recognition apparatus and method based on resistive random-access memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21868235

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21868235

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.06.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21868235

Country of ref document: EP

Kind code of ref document: A1