WO2020140403A1 - Text classification method and apparatus, computer device and storage medium - Google Patents

Text classification method and apparatus, computer device and storage medium

Info

Publication number
WO2020140403A1
Authority
WO
WIPO (PCT)
Prior art keywords
word vector
word
attention
text classification
features
Prior art date
Application number
PCT/CN2019/092531
Other languages
English (en)
French (fr)
Inventor
金戈
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020140403A1 publication Critical patent/WO2020140403A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present application relates to the technical field of text classification, and in particular, to a text classification method, device, computer device, and computer-readable storage medium.
  • The traditional text classification model based on a convolutional neural network is TextCNN (Text Convolutional Neural Network). It generally includes an input layer, a word embedding layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer, and the corpus is processed layer by layer to classify the text. However, when the corpus data used for text classification is relatively large, the TextCNN model classifies text with low efficiency.
  • Embodiments of the present application provide a text classification method, device, computer device, and computer-readable storage medium, which can solve the problem of low text classification efficiency in the conventional technology.
  • In a first aspect, an embodiment of the present application provides a text classification method. The method includes: obtaining a corpus for text classification, and segmenting the corpus in a preset manner to obtain Chinese word segments; performing word embedding on the Chinese word segments to convert them into word vectors; performing feature extraction on the word vectors using a convolutional neural network combined with an attention function to obtain word vector features of the word vectors; connecting the word vector features in a fully connected manner to obtain output data; and classifying the output data with a classifier to obtain a text classification result.
  • In a second aspect, an embodiment of the present application further provides a text classification apparatus. The apparatus includes: an acquisition unit for obtaining a corpus for text classification and segmenting the corpus in a preset manner to obtain Chinese word segments; a conversion unit for performing word embedding on the Chinese word segments to convert them into word vectors; an extraction unit for performing feature extraction on the word vectors using a convolutional neural network combined with an attention function to obtain word vector features of the word vectors; a connection unit for connecting the word vector features in a fully connected manner to obtain output data; and a classification unit for classifying the output data with a classifier to obtain a text classification result.
  • an embodiment of the present application further provides a computer device, which includes a memory and a processor, a computer program is stored on the memory, and the text classification method is implemented when the processor executes the computer program.
  • an embodiment of the present application further provides a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, causes the processor to perform the text classification method.
  • FIG. 1 is a schematic diagram of an application scenario of a text classification method provided by an embodiment of this application.
  • FIG. 2 is a schematic flowchart of a text classification method provided by an embodiment of this application.
  • FIG. 3 is a schematic diagram of word vectors in a text classification method provided by an embodiment of this application.
  • FIG. 4 is another schematic flowchart of a text classification method provided by an embodiment of this application;
  • FIG. 5 is a schematic diagram of a model corresponding to the text classification method in FIG. 4;
  • FIG. 6 is a schematic diagram of another model corresponding to the text classification method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a third model corresponding to a text classification method provided by an embodiment of this application.
  • FIG. 8 is a schematic block diagram of a text classification device provided by an embodiment of this application.
  • FIG. 9 is another schematic block diagram of a text classification device provided by an embodiment of this application.
  • FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an application scenario of a text classification method provided by an embodiment of the present application.
  • the application scenarios include:
  • the application program is installed on the terminal shown in FIG. 1, and the R&D personnel implement the steps of performing the text classification method through the terminal.
  • the terminal may be an electronic device such as a notebook computer, tablet computer, or desktop computer.
  • The terminal application environment shown in FIG. 1 can also be replaced with computer equipment such as a server.
  • the server may be a server cluster or a cloud server.
  • the server cluster can use a distributed system, and the server of the distributed system can include a master server and a slave server, so that the master server uses the obtained corpus to perform the steps of the text classification method.
  • The working process of each subject in FIG. 1 is as follows: the terminal obtains a corpus for text classification and segments it in a preset manner to obtain Chinese word segments; performs word embedding on the Chinese word segments to convert them into word vectors; performs feature extraction on the word vectors using a convolutional neural network combined with an attention function to obtain word vector features; connects the word vector features in a fully connected manner to obtain output data; and classifies the output data with a classifier to obtain a text classification result.
  • FIG. 1 only illustrates a desktop computer as a terminal.
  • the type of the terminal is not limited to that shown in FIG. 1.
  • the terminal may also be an electronic device such as a mobile phone, notebook computer, or tablet computer.
  • the application scenario of the above text classification method is only used to explain the technical solution of the present application, and is not used to limit the technical solution of the present application.
  • Please refer to FIG. 2, which is a schematic flowchart of a text classification method provided by an embodiment of the present application. The text classification method is applied to the terminal in FIG. 1 to complete all or part of the functions of the text classification method. As shown in FIG. 2, the method includes the following steps S210-S250:
  • S210 Obtain a corpus for text classification, and segment the corpus in a preset manner to obtain a Chinese segmentation.
  • the text classification in the embodiment of the present application refers to the classification of text based on a convolutional neural network classification model.
  • The text classification model based on a convolutional neural network (Text Convolutional Neural Network, abbreviated as TextCNN) is also called the TextCNN network structure or TextCNN network model. TextCNN is a convolutional neural network used for text classification, that is, it uses a convolutional neural network to classify text.
  • Word segmentation here refers to the segmentation of Chinese text (Chinese Word Segmentation), that is, cutting a sequence of Chinese characters into individual words; segmentation is the process of recombining consecutive character sequences into word sequences according to certain specifications. To classify Chinese text, the Chinese text needs to be segmented first.
  • For Chinese word segmentation, there are many open-source tools, such as the most commonly used Jieba segmenter, as well as the word segmenter and the Pangu segmenter. Segmentation also includes further processing, such as removing some high-frequency and low-frequency words and removing some meaningless symbols.
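  • As an illustration of this preprocessing step, the following is a minimal sketch that uses the Jieba tool named above to segment a sentence and strip symbols; the regular expression and the small stop-word set are illustrative assumptions, not the patent's actual configuration.

```python
# -*- coding: utf-8 -*-
# Segment a Chinese sentence with Jieba, then drop symbols and a few stop words.
import re
import jieba

STOP_WORDS = {"的", "了", "呢", "吧"}  # illustrative stop-word set only

def segment(text):
    # Keep Chinese characters, letters and digits; replace other symbols with spaces.
    text = re.sub(r"[^\u4e00-\u9fa5A-Za-z0-9]", " ", text)
    words = jieba.lcut(text)
    return [w for w in words if w.strip() and w not in STOP_WORDS]

print(segment("我喜欢吃苹果。"))   # typically ['我', '喜欢', '吃', '苹果']
```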
  • the text to be classified needs to be pre-processed to segment the obtained corpus to be classified, thereby obtaining Chinese word segmentation, and further converting the obtained Chinese word segmentation into a word vector.
  • the terminal obtains a corpus for text classification, and performs word segmentation in a preset manner to obtain a Chinese word segmentation.
  • The corpus may be obtained by crawling a preset corpus from a designated website on the web, and the crawling rules may be set in advance according to actual needs. For example, the crawling rule may target the corpus of a certain web page, or the corpus related to a certain crawled subject.
  • the corpus may also be a corpus provided through a corpus database, such as user data accumulated by a website.
  • The application scenario of the embodiment of the present application is text classification; for example, the text may be news headlines to be classified. The model input is the text's word vector information, and the output is the text classification result.
  • Word embedding (English: Word Embedding) is a type of word representation in which words with similar meanings have similar representations; it is a general term for methods that map vocabulary to real-valued vectors. The structural layer where word embedding is performed is called the word embedding layer, or simply the embedding layer (English: Embedding layer).
  • Word embedding is a class of techniques in which each individual word is represented as a real-valued vector in a predefined vector space, with every word mapped to one vector. Please refer to FIG. 3, which is a schematic diagram of word vectors in a text classification method provided by an embodiment of the present application.
  • As shown in FIG. 3, suppose a text contains the words "cat", "dog" and "love", and these words are mapped into a vector space, with "cat" mapped to (0.1, 0.2, 0.3), "dog" mapped to (0.2, 0.2, 0.4), and "love" mapped to (-0.4, -0.5, -0.2) (the data is only illustrative). Mapping a text X{x1, x2, ..., xn} to a multi-dimensional vector space Y{y1, y2, ..., yn} in this way is called word embedding. Turning each word into a vector makes computation convenient: to a machine, "cat", "dog" and "love" are otherwise just binary strings that cannot be compared, but once they are word vectors, the similarity between words can be obtained by computing the cosine of the angle between the vectors; for example, in FIG. 3, because cos α < cos β, "cat" is more similar to "dog", while "cat" and "love" differ greatly. The angle between two vectors can be calculated as cos θ = (vector a · vector b) / (|vector a| × |vector b|), where "·" denotes the dot product; in Python, numpy can be used to compute the angle between vectors.
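  • A quick check of the cosine formula above, using the illustrative "cat"/"dog"/"love" vectors from the example and the numpy library mentioned in the description (a minimal sketch):

```python
import numpy as np

def cos_angle(a, b):
    # cos(theta) = (vector a . vector b) / (|a| * |b|)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

cat  = [0.1, 0.2, 0.3]      # illustrative word vectors from the example above
dog  = [0.2, 0.2, 0.4]
love = [-0.4, -0.5, -0.2]

print(cos_angle(cat, dog))    # close to 1: "cat" and "dog" are similar
print(cos_angle(cat, love))   # negative: "cat" and "love" differ greatly
```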
  • Specifically, the text corpus is segmented to obtain Chinese word segments, which are then converted into pre-trained word vectors; that is, the input natural language is segmented and encoded into word vectors, in preparation for the pre-trained word vectors.
  • In specific implementations, pre-trained word vectors can be used, or a set of word vectors can be trained directly during TextCNN training; however, using pre-trained word vectors is more than 100 times faster than training a set of word vectors during TextCNN training.
  • If pre-trained word vectors are used, there are two approaches: the Static method and the No-static method. The Static method means that the word vector parameters are no longer adjusted during TextCNN training, whereas the No-static method adjusts the word vector parameters during training; therefore the No-static method gives better results than the Static method.
  • Further, the embedding layer may be adjusted according to a preset number of batches (English: Batch) of the Chinese word segments. That is, instead of adjusting the Embedding layer in every batch, the embedding layer is adjusted once every 100 batches of Chinese word segments, which reduces training time while still fine-tuning the word vectors.
  • Further, a preset word vector dictionary is used to perform word embedding on the Chinese word segments to convert them into word vectors; that is, a trained preset word vector dictionary is used for the embedding. Compared with embedding the Chinese word segments directly, this improves the efficiency of converting the Chinese word segments into word vectors. In other words, the trained preset word vector dictionary can be used to embed the corpus corresponding to the Chinese word segments so as to convert the corpus into word vectors.
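  • The lookup itself can be pictured as follows; in this minimal sketch the "preset word vector dictionary" is just a Python dict of 300-dimensional vectors and unknown words fall back to zeros (both are illustrative assumptions):

```python
import numpy as np

EMBED_DIM = 300  # word vector dimension used in the embodiment

# Illustrative "preset word vector dictionary": word segment -> 300-d vector.
word_vector_dict = {w: np.random.rand(EMBED_DIM) for w in ["我", "喜欢", "吃", "苹果"]}

def embed(segments):
    # Unknown word segments fall back to an all-zero vector in this sketch.
    return np.stack([word_vector_dict.get(w, np.zeros(EMBED_DIM)) for w in segments])

vectors = embed(["我", "喜欢", "吃", "苹果"])
print(vectors.shape)  # (4, 300): one 300-dimensional vector per word segment
```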
  • In one embodiment, the word vectors may be Word2Vec pre-trained word vectors, i.e., each word has a corresponding vector representation. Such vector representations express vocabulary information in numerical form, and the word vector dimension may be 300.
  • Word2vec (English: Word to vector) is a software tool for training word vectors and is used to generate word vector models; automatic training of word vectors can be implemented with the Gensim library in Python.
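  • A minimal sketch of training such vectors with the Gensim library mentioned above, assuming Gensim 4.x (where the dimension argument is named vector_size) and a tiny toy corpus of pre-segmented sentences:

```python
from gensim.models import Word2Vec

# Toy corpus of already-segmented sentences (lists of Chinese word segments).
sentences = [["我", "喜欢", "吃", "苹果"],
             ["我", "喜欢", "文本", "分类"]]

# 300-dimensional vectors, as in the embodiment above (Gensim 4.x argument names).
model = Word2Vec(sentences, vector_size=300, window=5, min_count=1, sg=1, epochs=50)

print(model.wv["苹果"].shape)                 # (300,): the vector for one word
print(model.wv.most_similar("苹果", topn=2))  # nearest words in the toy corpus
```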
  • A Convolutional Neural Network (CNN) is a class of feedforward neural networks (Feedforward Neural Networks) that contain convolution or related computations and have a deep structure, and it is one of the representative algorithms of deep learning (Deep Learning). Because a convolutional neural network can perform shift-invariant classification (English: Shift-Invariant Classification), it is also called a "Shift-Invariant Artificial Neural Network" (SIANN).
  • Attention, also known as an attention structure or attention mechanism (English: Attention Mechanism), is mainly used in a convolutional neural network to decide which part of the input the network should pay attention to and to allocate limited information-processing resources to the important parts, so that the network processes the input data in a more targeted way and the efficiency with which it processes the input data is improved.
  • Specifically, the terminal adds an attention structure to the TextCNN model, which makes the model's processing of data more targeted and improves the training efficiency and text classification efficiency of the TextCNN model, thereby performing feature extraction on the word vectors using a convolutional neural network combined with an attention function to obtain the word vector features of the word vectors.
  • For example, two attention mechanisms are added to the TextCNN model. One is the Word-wise attention structure, i.e., the word attention structure, which acts on the word embedding layer: based on the word vector input, the corresponding attention values are obtained by training, and attention is added to the word vectors output by the word embedding layer so that the convolutional neural network processes the word vectors in a more targeted way. The other is the Filter-wise attention structure, i.e., the channel attention structure, which acts after the convolutional layer: based on the output of the convolution channels, the corresponding attention values are obtained by training, so that when the pooled output data is fully connected, the objects of the full connection are treated in a more targeted way. By improving the targeting of the text classification process, the efficiency of the entire text classification is improved.
  • full connection refers to connecting all the features and sending the output data to the classifier, such as the Softmax classifier.
  • In the TextCNN model, full connection means that the neurons of the output layer are connected to every neuron of the input layer; that is, each node of the fully connected layer in the convolutional neural network is connected to all nodes of the previous layer, and it is used to synthesize the features extracted by the preceding layers.
  • The fully connected layer (English: Fully connected layer) is a layer in the TextCNN model. Specifically, in the TextCNN structure, after the layer-by-layer processing that precedes it, for example after multiple convolutional and pooling layers, one or more fully connected layers are generally attached, and each neuron in a fully connected layer is fully connected with all the neurons in the previous layer. The fully connected layer can integrate the class-discriminative local information in the convolutional or pooling layers; it integrates the highly abstracted features produced by the preceding multiple convolutions and pooling operations, then normalizes them to output a probability for each classification case and sends the result to the classifier placed after the fully connected layer, so that the classifier (Classifier) can classify according to the probabilities obtained from the full connection. In practice, to improve the learning ability of the network, multiple fully connected layers can be stacked.
  • Specifically, the classifier in the TextCNN model on the terminal receives the fully connected output data sent by the fully connected layer and classifies the output data to finally obtain the text classification result, where the classifier may be a Softmax classifier.
  • When implementing text classification, the embodiment of the present application obtains a corpus for text classification and segments the corpus in a preset manner to obtain Chinese word segments, performs word embedding on the Chinese word segments to convert them into word vectors, performs feature extraction on the word vectors using a convolutional neural network combined with an attention function to obtain word vector features, connects the word vector features in a fully connected manner to obtain output data, and classifies the output data with a classifier to obtain the text classification result. Attention is thus added to the convolutional-neural-network-based text classification model to make text processing more targeted, which can effectively improve the training efficiency and text classification efficiency of the text classification model.
  • FIG. 4 is another schematic flowchart of the text classification method provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of a model corresponding to the text classification method in FIG. 4. As shown in FIGS. 4 and 5, the method includes the following steps S410-S490:
  • S410 Obtain a corpus for text classification, and segment the corpus in a preset manner to obtain a Chinese segmentation
  • Specifically, step S410 is the same as step S210 in FIG. 2, and step S420 is the same as step S220 in FIG. 2; steps S210 and S220 of FIG. 2 are incorporated here by reference and will not be repeated.
  • the input layer in FIG. 5 is used to obtain text corpus for classification, such as classifying news headlines.
  • the text classification model shown in FIG. 5 inputs text word vector information and outputs text classification results.
  • the word embedding layer in FIG. 5 is used to embed the input natural language corpus to encode the natural language corpus into a word vector.
  • Attention is also known as the attention mechanism, attention model, or attention structure (English: Attention Model). The attention model in natural language processing borrows the concept of human attention. Generally speaking, visual attention is a brain signal processing mechanism unique to human vision: human vision quickly scans the global image to obtain the target area that needs focus, also known as the focus of attention, and then devotes more attention resources to this area to obtain more detailed information about the target of interest while suppressing other useless information. Human visual attention greatly improves the efficiency and accuracy of visual information processing.
  • the attention in the embodiments of the present application is essentially similar to the selective visual attention of human beings, and the core goal is to select information that is more critical to the current task goal from among many kinds of information.
  • Further, the attention model can be expressed as a function, for example y = f(x); f(x) may be a linear relation, for example y = wx + b, where y is the output, x is the input, and w and b are the parameters of the linear relation between x and y, which can be adjusted during training.
  • Specifically, a first attention function is added on the basis of the convolutional neural network, and the first attention function is used to assign attention weights to the word vectors to obtain adjusted word vectors. The first attention function is mainly used to distribute weights over the word vectors, assigning an attention weight to each word vector so as to highlight the word vectors that need attention. Since the first attention function distributes the weights of the word vectors, it can also be called word attention (English: Word-wise), Word-wise attention, or the Word-wise mechanism.
  • Word-wise attention is used to extract word attention information, that is, to assign attention weights according to the input vocabulary in an automatic learning process: the attention weights of the words are determined through automatic learning and output to the next operation.
  • The layer where Word-wise attention is located is called the Word-wise attention structure layer, or simply the Word-wise attention layer. The Word-wise attention layer assigns attention weights according to the input vocabulary and outputs the weighted word vectors to the next operation.
  • Further, please refer to FIG. 5. As shown in FIG. 5, the Word-wise attention structure layer is added between the word embedding layer and the convolutional layer; the input of the Word-wise attention structure layer comes from the word embedding layer. The Word-wise attention structure layer can build two fully connected layers and output the attention weights with the Softmax function; this output is used to adjust the output of the word embedding layer, and the data output by the word embedding layer after the attention-weight adjustment is input to the convolutional layer.
  • For example, for the sentence "I like to eat apples", different weights are assigned to "I", "like" and "apples" through the self-learning of the neural network; how the weights are adjusted is also something the neural network adapts and learns by itself during training, so that dynamic adjustment is achieved. Both the convolutional neural network and the attention structure are built with the Tensorflow library in Python.
  • In one embodiment, a Word-wise attention structure layer can be added in front of the first convolutional layer so that the refined information of the Word-wise structure layer is input to the convolutional layer. For example, Word-wise attention takes the word vector input of the word embedding layer, builds one or two fully connected hidden layers, and produces its output through the Softmax function.
  • Further, the output of the word attention structure layer consists of attention weights; the word vectors output by the word embedding layer need to be dot-multiplied with the output of the attention structure layer to complete the attention-weight adjustment of the word vectors, and the adjusted word vector output is then input to the convolutional layer for subsequent operations.
  • The Softmax function, or normalized exponential function, is a generalization of the logistic function: it "compresses" a K-dimensional vector z of arbitrary real values into another K-dimensional real vector σ(z) in which every element lies in the range (0, 1) and all elements sum to 1; the Softmax function is effectively the gradient-log normalization of a finite discrete probability distribution.
  • The dot product, also called the inner product or scalar product of vectors, yields a number: vector a · vector b = |a||b|cos<a, b>, where cos<a, b> is the cosine of the angle between a and b. Expressing the vectors in coordinates (for three-dimensional vectors), if a = (a1, b1, c1) and b = (a2, b2, c2), then a · b = a1·a2 + b1·b2 + c1·c2.
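  • The Word-wise attention described above can be pictured with the following minimal TensorFlow/Keras sketch: two fully connected layers score each word, the Softmax function turns the scores into per-word attention weights, and the word embedding output is multiplied by those weights before it enters the convolutional layer. The class name and hidden size are illustrative assumptions, not the patent's exact configuration.

```python
import tensorflow as tf

class WordWiseAttention(tf.keras.layers.Layer):
    """Assigns a Softmax attention weight to every word position."""

    def __init__(self, hidden_units=64, **kwargs):
        super().__init__(**kwargs)
        self.fc1 = tf.keras.layers.Dense(hidden_units, activation="relu")
        self.fc2 = tf.keras.layers.Dense(1)            # one score per word

    def call(self, embeddings):                        # (batch, T, d) from the embedding layer
        scores = self.fc2(self.fc1(embeddings))        # (batch, T, 1)
        weights = tf.nn.softmax(scores, axis=1)        # Softmax over the T word positions
        return embeddings * weights                    # dot-multiplied (weighted) word vectors

# Usage: adjusted = WordWiseAttention()(embedding_output); "adjusted" then feeds the convolutional layer.
```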
  • The convolutional layer (English: Convolutional layer) mainly uses a sampler to collect key data content from the input data.
  • the biggest feature of the convolutional layer is local perception and weight sharing, so as to realize the extraction of different features of text through convolution.
  • convolutional layers, pooling layers and fully connected layers all belong to convolutional neural networks and are three different types of hidden layers.
  • the convolutional layer may include multiple layers of convolutional kernels .
  • the convolutional layer contains 128 channels of convolution kernels with heights of 1, 3, and 5, that is, 128 channels of convolution kernels with heights of 1, 3, and 5 rows.
  • the output of the convolutional layer will be input to the subsequent activation layer and pooling layer.
  • the convolutional layer mainly uses convolution to extract different N-gram features.
  • The input sentence or text is transformed into a two-dimensional matrix after passing through the word embedding layer. Assuming the length of the text is |T| and the size of the word vectors is |d|, the size of this two-dimensional matrix is |T| * |d|, and the convolution operates on this |T| * |d| matrix. The size of a convolution kernel is generally set to n * |d|, where n is the length of the kernel and |d| is its width; this width is the same as the word vector dimension, which means the convolution slides only along the text sequence. n can take several values, such as 2, 3, 4, 5, etc. For a |T| * |d| text, if the kernel size is chosen as 2 * |d|, the result after convolution is a vector of size (|T| - 2 + 1) * 1. In the TextCNN network, multiple different types of kernels need to be used at the same time, and there can be multiple kernels of each size; if the kernel sizes used are 2, 3, 4, 5 * |d| and there are 128 kernels of each size, the convolutional network has 4 * 128 convolution kernels in total.
  • N-Gram is an algorithm based on a statistical language model. Its basic idea is to slide a window of size N over the content of the text in units of bytes, forming a sequence of byte fragments of length N; each byte fragment is called a gram. The occurrence frequency of all grams is counted and filtered according to a preset threshold to form a key gram list, which is the vector feature space of the text, and each gram in the list is one dimension of the feature vector.
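  • The convolution just described can be sketched as follows with Conv1D, which slides only along the text sequence: kernels of length 2, 3, 4 and 5 (each implicitly spanning the full word-vector width |d|) with 128 channels each. The text length T = 50 is an assumed value; activation and pooling are applied by later layers, as described below.

```python
import tensorflow as tf

T, d = 50, 300                         # assumed text length and word vector size
inputs = tf.keras.Input(shape=(T, d))  # |T| x |d| matrix from the word embedding layer

conv_outputs = []
for n in (2, 3, 4, 5):                 # kernel lengths n; the width |d| is implicit in Conv1D
    c = tf.keras.layers.Conv1D(filters=128, kernel_size=n)(inputs)
    conv_outputs.append(c)             # each branch has shape (batch, |T| - n + 1, 128)

print([c.shape.as_list() for c in conv_outputs])
# [[None, 49, 128], [None, 48, 128], [None, 47, 128], [None, 46, 128]]
```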
  • Furthermore, in the process of training the convolutional neural network, the loss function of the convolutional neural network is the cross-entropy, the training method is ADAM, and the learning rate is 0.001. ADAM (English: Adaptive Moment Estimation) is adaptive moment estimation. When training the neural network, the learning rate (English: Learning Rate) needs to be set to control how fast the parameters are updated, i.e., to control the learning progress of the model.
  • Neural network training is achieved through the Tensorflow library in Python.
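  • A minimal sketch of this training setup with the Tensorflow library: the ADAM optimizer with a learning rate of 0.001 and a cross-entropy loss. The stand-in model and random data are purely illustrative; in the embodiment the model is the TextCNN described here.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model; in the embodiment this is the TextCNN model described above.
inputs = tf.keras.Input(shape=(300,))
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(4, activation="softmax")(x)   # 4 example classes
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # ADAM, learning rate 0.001
    loss="categorical_crossentropy",                           # cross-entropy loss
    metrics=["accuracy"],
)

x_train = np.random.rand(32, 300).astype("float32")           # random illustrative data
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 4, size=32), num_classes=4)
model.fit(x_train, y_train, batch_size=8, epochs=2)
```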
  • the activation function is used to add nonlinear factors.
  • commonly used activation functions include: Sigmoid function, Tanh function and ReLU function.
  • At present, most convolutional neural networks basically adopt the ReLU function. The layer that uses an activation function to add nonlinear factors is called the activation layer in the TextCNN model.
  • the image is mainly processed by convolution, that is, each pixel is given a weight, and this operation is linear. But for the training samples used, it is not necessarily linearly separable. To solve this problem, linear changes or the introduction of nonlinear factors can be used to solve problems that cannot be solved by the linear model. Therefore, the activation function is used in the TextCNN model.
  • Therefore, the word vector features are activated; that is, the activation function is used to adjust the word vector features of the adjusted word vectors so as to add a nonlinear factor to the word vector features extracted by the convolutional layer, thereby improving the accuracy of word vector processing.
  • Pooling (English: Pooling) refers to compressing the input features in a convolutional neural network to extract the main features and reduce the amount of data; pooling operations are commonly used in convolutional neural networks.
  • the layer used for pooling is called the pooling layer in the TextCNN model, and the English name is the Pooling layer.
  • the pooling layer is used to reduce the feature vectors output by the convolutional layer, while improving the results to effectively control overfitting.
  • the pooling layer is often behind the convolutional layer, and the feature vector output by the convolutional layer is reduced by pooling, and the result is improved to effectively control overfitting.
  • the most common pooling operations are Mean Pooling and Max Pooling.
  • the form of the pooling layer can be Max Pooling, which is the maximum pooling layer.
  • Max pooling takes the point with the largest value in the local receptive field. Max Pooling can reduce the shift of the estimated mean caused by parameter errors in the convolutional layer and retains more texture information.
  • A maximum pooling layer selects the maximum value from a block of features.
  • Like the convolutional layer, the pooling layer is parameterized by the window (block) size and stride; for example, sliding a 2×2 window with a stride of 2 over a 10×10 feature matrix and taking the maximum of the four values in each window yields a 5×5 feature matrix.
  • the pooling layer reduces the dimension of representation by retaining only the most prominent information.
  • the output of the pooling layer will be connected to the fully connected layer to achieve global feature extraction, and the final output will be completed with the Softmax function.
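  • The 10×10 / 2×2 / stride-2 example above can be reproduced as follows; the second part of the sketch shows the global max pooling over the sequence dimension that is typically used for each channel in TextCNN (the array shapes are illustrative):

```python
import numpy as np
import tensorflow as tf

# 10x10 feature matrix, 2x2 window, stride 2  ->  5x5 feature matrix.
features = tf.constant(np.random.rand(1, 10, 10, 1), dtype=tf.float32)
pooled = tf.nn.max_pool2d(features, ksize=2, strides=2, padding="VALID")
print(pooled.shape)  # (1, 5, 5, 1)

# In TextCNN, each convolution channel is usually reduced by global max pooling
# over the sequence dimension, keeping one value per channel:
conv_out = tf.constant(np.random.rand(1, 49, 128), dtype=tf.float32)  # (batch, |T|-n+1, 128)
print(tf.reduce_max(conv_out, axis=1).shape)  # (1, 128)
```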
  • the second attention function is used to adjust the output of the pooling layer of each channel, assign attention weights to each channel, and input the data output from the pooling layer with the adjusted weights to the fully connected layer for subsequent calculation. Since the second attention function is used to adjust the output of the pooling layer of each channel, it is also called channel attention, which is called Filter-wise in English, and can also be called Filter-wise attention, or Filter-wise attention structure layer.
  • the layer that uses channel attention to adjust the output of the pooling layer of each channel is called the channel attention layer in the TextCNN model, and it can also be called the filter-wise attention layer.
  • the Filter-wise attention layer is used for the output of the convolution channel, and the corresponding attention value is obtained by training.
  • The Filter-wise attention layer is added after the convolutional layer; specifically, it can be added between the pooling layer and the fully connected layer.
  • The input of this part comes from the output of the pooling layer, and the attention weights can be output with the Softmax function by building two fully connected layers.
  • Convolutional neural networks and the establishment of attention can be achieved through the Tensorflow library in Python.
  • the Filter-wise mechanism has a similar structure to the Word-wise mechanism, but acts on different parts of the model, assigns weights to different objects, and plays different roles.
  • the Filter-wise mechanism refines channel attention information. Since the convolutional neural network contains multiple channels, the Filter-wise mechanism can significantly improve model training efficiency.
  • the Filter-wise mechanism can be added to the final output of the channel, the attention weight is calculated through the fully connected hidden layer and the Softmax function, and the calculation result and the channel output are dot-multiplied to obtain the channel output after the weight adjustment.
  • the second attention function is used here to assign attention weight to the pooled word vector feature to obtain the first word vector feature, and then perform subsequent calculations.
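  • A minimal TensorFlow/Keras sketch of the Filter-wise (channel) attention just described: the pooled per-channel output passes through two fully connected layers, the Softmax function produces one weight per channel, and the pooled output is dot-multiplied by those weights before the fully connected layer. The class name and hidden size are illustrative assumptions.

```python
import tensorflow as tf

class FilterWiseAttention(tf.keras.layers.Layer):
    """Assigns a Softmax attention weight to every convolution channel."""

    def __init__(self, hidden_units=64, **kwargs):
        super().__init__(**kwargs)
        self.hidden_units = hidden_units

    def build(self, input_shape):
        channels = int(input_shape[-1])
        self.fc1 = tf.keras.layers.Dense(self.hidden_units, activation="relu")
        self.fc2 = tf.keras.layers.Dense(channels)     # one score per channel

    def call(self, pooled):                            # (batch, channels) from the pooling layer
        weights = tf.nn.softmax(self.fc2(self.fc1(pooled)), axis=-1)  # Softmax over channels
        return pooled * weights                        # weight-adjusted channel output

# Usage: adjusted = FilterWiseAttention()(pooled_output); "adjusted" then feeds the fully connected layer.
```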
  • the first word vector features are connected in a fully connected manner to obtain output data.
  • In the TextCNN model, that is, the highly abstracted features produced by the preceding multiple convolutions are integrated through the fully connected layer and can then be normalized so that a probability is output for each classification case.
  • The subsequent classifier can classify according to the probabilities obtained from the full connection.
  • The classifier may be a Classifier; for example, a fully-connected layer is appended after the Max-pooling layer and the output of that layer is used as the output result. In practice, to improve the learning ability of the network, multiple fully connected layers can be stacked.
  • S490 Classify the output data via a classifier to obtain a text classification result.
  • Specifically, step S490 is the same as step S250 in FIG. 2; step S250 of FIG. 2 is incorporated here by reference and will not be repeated.
  • the output data is classified by a classifier to obtain a text classification result, and the text classification result is output through an output layer, and the output layer is used to output the text classification result.
  • Word-wise attention acts on the word vector output from the word embedding layer.
  • the corresponding attention value of the word vector is obtained by training.
  • Filter-wise attention is applied to the output of the convolutional layer.
  • Based on the output data of the convolution channels, the corresponding attention values of the output data are obtained by training. Word-wise attention and Filter-wise attention are trained together with the other parts of the TextCNN model, without requiring additional training computations.
  • the cross entropy of TextCNN classification results is used as the loss function
  • ADAM is used as the optimization method for training.
  • the embodiment of the present application can effectively improve the training efficiency of the TextCNN model by adding word attention for words and channel attention for channels in the TextCNN model.
  • the TextCNN model of the embodiment of the present application is applied to text classification, such as news headline classification.
  • TextCNN model input is text word vector information and output is text classification results.
  • In practice, adding the Word-wise and Filter-wise mechanisms can improve the training efficiency of the TextCNN model, and the Filter-wise mechanism in particular has a significant effect.
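  • To tie the pieces together, the following is a compact sketch of the FIG. 5 architecture (input layer, word embedding, Word-wise attention, convolution, activation, pooling, Filter-wise attention, fully connected layer, Softmax output), reusing the WordWiseAttention and FilterWiseAttention layers sketched earlier; the vocabulary size, text length and class count are assumed values, not the patent's configuration.

```python
import tensorflow as tf
# Assumes the WordWiseAttention and FilterWiseAttention classes from the earlier sketches are defined.

VOCAB, T, D, CLASSES = 20000, 50, 300, 10    # assumed vocabulary size, text length, dims, classes

inputs = tf.keras.Input(shape=(T,), dtype="int32")
emb = tf.keras.layers.Embedding(VOCAB, D)(inputs)   # word embedding layer (in practice initialised
                                                    # with the pre-trained Word2Vec vectors above)
emb = WordWiseAttention()(emb)                      # word attention (earlier sketch)

pooled = []
for n in (2, 3, 4, 5):                              # convolution + activation + pooling
    c = tf.keras.layers.Conv1D(128, n, activation="relu")(emb)
    pooled.append(tf.keras.layers.GlobalMaxPooling1D()(c))
features = tf.keras.layers.Concatenate()(pooled)    # (batch, 4 * 128) channel outputs

features = FilterWiseAttention()(features)          # channel attention (earlier sketch)
hidden = tf.keras.layers.Dense(256, activation="relu")(features)        # fully connected layer
outputs = tf.keras.layers.Dense(CLASSES, activation="softmax")(hidden)  # Softmax classifier

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy")      # cross-entropy loss, ADAM, lr = 0.001
```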
  • the step of using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain word vector features of the word vector includes:
  • using a convolutional neural network to perform feature extraction on the word vectors to obtain second word vector features of the word vectors; and using a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vectors.
  • FIG. 6 is a schematic diagram of a model corresponding to the text classification method provided in this embodiment.
  • As shown in FIG. 6, in the text classification process, the corpus for text classification is obtained through the input layer and segmented in a preset manner to obtain Chinese word segments; the Chinese word segments are word-embedded through the word embedding layer to convert them into word vectors; a convolutional neural network performs feature extraction on the word vectors through the convolutional layer to obtain the second word vector features of the word vectors; the Filter-wise attention structure layer uses a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vectors; the word vector features are then connected through a fully connected layer to obtain the output data; finally, the output data is classified by the classifier through the output layer to obtain the text classification result.
  • the second attention function is used to adjust the output of the pooling layer of each channel, so it is also called channel attention, which is Filter-wise in English, and can also be called Filter-wise attention.
  • the layer that uses channel attention to adjust the output of the pooling layer of each channel is called the channel attention layer in the TextCNN model, which can also be called the filter-wise attention layer, or the filter-wise attention structure layer.
  • In one embodiment, before the step of using the second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vectors, the method further includes: activating the second word vector features with an activation function; and pooling the activated second word vector features.
  • FIG. 7 is a schematic diagram of a model corresponding to the text classification method provided in this embodiment.
  • As shown in FIG. 7, in the text classification process, the corpus for text classification is obtained through the input layer and segmented in a preset manner to obtain Chinese word segments; the Chinese word segments are word-embedded through the word embedding layer to convert them into word vectors; a convolutional neural network performs feature extraction on the word vectors through the convolutional layer to obtain the second word vector features of the word vectors; the activation layer uses an activation function to add nonlinearity to the second word vector features output by the convolutional layer so as to improve the accuracy of word vector processing; the pooling layer then reduces the nonlinearly adjusted second word vector features output by the activation layer; the Filter-wise attention structure layer uses a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vectors; the word vector features are then connected through a fully connected layer to obtain the output data; finally, the output data is classified by the classifier through the output layer to obtain the text classification result.
  • the second attention function is used to adjust the output of the pooling layer of each channel, so it is also called channel attention.
  • channel attention In English, it is Filter-wise, and it can also be called Filter-wise attention.
  • the layer that uses channel attention to adjust the output of the pooling layer of each channel is called the channel attention layer in the TextCNN model, which can also be called the filter-wise attention layer, or the filter-wise attention structure layer.
  • FIG. 8 is a schematic block diagram of a text classification apparatus provided by an embodiment of the present application.
  • an embodiment of the present application further provides a text classification device.
  • the text classification device includes a unit for performing the above text classification method, and the device may be configured in a computer device such as a terminal or a server.
  • the text classification device 800 includes an acquisition unit 801, a conversion unit 802, an extraction unit 803, a connection unit 804, and a classification unit 805.
  • the obtaining unit 801 is configured to obtain a corpus for text classification, and segment the corpus in a preset manner to obtain a Chinese word segmentation;
  • a conversion unit 802 configured to embed the Chinese word segmentation to convert the Chinese word segmentation into a word vector
  • An extraction unit 803 is used to perform feature extraction on the word vector using a convolutional neural network combined with an attention function to obtain word vector features of the word vector;
  • a connecting unit 804 configured to connect the word vector features in a fully connected manner to obtain output data
  • a classification unit 805 is used to classify the output data via a classifier to obtain a text classification result.
  • FIG. 9 is another schematic block diagram of a text classification device provided by an embodiment of the present application.
  • the extraction unit 803 includes:
  • a first allocation subunit 8031 configured to use the first attention function to assign attention weight to the word vector to obtain an adjusted word vector
  • An extraction subunit 8032 configured to perform feature extraction on the adjusted word vector using a convolutional neural network to obtain word vector features
  • An activation subunit 8033 configured to activate the word vector feature using an activation function
  • the pooling subunit 8034 is used to pool the activated word vector features.
  • a second allocation subunit 8035 configured to use the second attention function to assign attention weights to the word vector features to obtain the first word vector features
  • the connecting unit 804 is configured to connect the first word vector features in a fully connected manner to obtain output data.
  • the extraction unit 803 includes:
  • An extraction subunit 8032 configured to perform feature extraction on the word vector using a convolutional neural network to obtain the second word vector feature of the word vector;
  • An activation subunit 8033 configured to activate the second word vector feature using an activation function
  • the pooling subunit 8034 is configured to pool the activated second word vector feature.
  • the second allocation subunit 8035 is configured to use the second attention function to assign attention weight to the second word vector feature to obtain the word vector feature of the word vector.
  • The division and connection of the units in the above text classification device are only for illustration. In other embodiments, the text classification device may be divided into different units as needed, or the units in the text classification device may adopt a different connection order and manner, so as to complete all or part of the functions of the above text classification device.
  • the above text classification apparatus may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 10.
  • FIG. 10 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 1000 may be a computer device such as a desktop computer or a server, or may be a component or part in other devices.
  • the computer device 1000 includes a processor 1002, a memory, and a network interface 1005 connected through a system bus 1001, where the memory may include a non-volatile storage medium 1003 and an internal memory 1004.
  • the non-volatile storage medium 1003 can store an operating system 10031 and a computer program 10032.
  • When the computer program 10032 is executed, it may cause the processor 1002 to perform the above text classification method.
  • the processor 1002 is used to provide computing and control capabilities to support the operation of the entire computer device 1000.
  • The internal memory 1004 provides an environment for running the computer program 10032 stored in the non-volatile storage medium 1003; when the computer program 10032 is executed by the processor 1002, it may cause the processor 1002 to perform the above text classification method.
  • the network interface 1005 is used for network communication with other devices.
  • the structure shown in FIG. 10 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 1000 to which the solution of the present application is applied.
  • the specific computer device 1000 may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 10, and details are not described herein again.
  • the processor 1002 is used to run the computer program 10032 stored in the memory to implement the text classification method of the embodiment of the present application.
  • the processor 1002 may be a central processing unit (Central Processing Unit, CPU), and the processor 1002 may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), Application specific integrated circuit (Application Specific Integrated Circuit, ASIC), ready-made programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may be any conventional processor.
  • the embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program that, when executed by the processor, causes the processor to perform the steps of the text classification method described in the above embodiments.
  • the computer-readable storage medium may be various computer-readable storage media that can store computer programs, such as a U disk, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A text classification method and apparatus, a computer device, and a computer-readable storage medium, belonging to the technical field of text classification. The method obtains a corpus for text classification and segments the corpus in a preset manner to obtain Chinese word segments (S210); performs word embedding on the Chinese word segments to convert them into word vectors (S220); performs feature extraction on the word vectors using a convolutional neural network combined with an attention function to obtain word vector features of the word vectors (S230); connects the word vector features in a fully connected manner to obtain output data (S240); and classifies the output data with a classifier to obtain a text classification result (S250).

Description

文本分类方法、装置、计算机设备及存储介质
本申请要求于2019年1月4日提交中国专利局、申请号为201910007705.9、申请名称为“文本分类方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及文本分类技术领域,尤其涉及一种文本分类方法、装置、计算机设备及计算机可读存储介质。
背景技术
传统基于卷积神经网络的文本分类模型,也就是TextCNN,英文为Text Convolutional Neural Network,一般包括输入层、词嵌入层、卷积层、池化层、连接层及输出层,通过各层对文本语料进行逐层处理以实现对文本分类。但有时由于进行文本分类的语料数据比较大而使得TextCNN模型进行文本分类效率较低。
发明内容
本申请实施例提供了一种文本分类方法、装置、计算机设备及计算机可读存储介质,能够解决传统技术中文本分类效率比较低的问题。
第一方面,本申请实施例提供了一种文本分类方法,所述方法包括:获取进行文本分类的语料,并将所述语料通过预设方式进行分词以得到中文分词;将所述中文分词进行词嵌入以将所述中文分词转化为词向量;使用卷积神经网络结合注意力函数对所述词向量进行特征提取以得到所述词向量的词向量特征;通过全连接的方式连接所述词向量特征以得到输出数据;经分类器对所述输出数据进行分类以得到文本分类结果。
第二方面,本申请实施例还提供了一种文本分类装置,其中,所述装置包括:获取单元,用于获取进行文本分类的语料,并将所述语料通过预设方式进 行分词以得到中文分词;转化单元,用于将所述中文分词进行词嵌入以将所述中文分词转化为词向量;提取单元,用于使用卷积神经网络结合注意力函数对所述词向量进行特征提取以得到所述词向量的词向量特征;连接单元,用于通过全连接的方式连接所述词向量特征以得到输出数据;分类单元,用于经分类器对所述输出数据进行分类以得到文本分类结果。
第三方面,本申请实施例还提供了一种计算机设备,其包括存储器及处理器,所述存储器上存储有计算机程序,所述处理器执行所述计算机程序时实现所述文本分类方法。
第四方面,本申请实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时使所述处理器执行所述文本分类方法。
附图说明
为了更清楚地说明本申请实施例技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的文本分类方法的应用场景示意图;
图2为本申请实施例提供的文本分类方法的流程示意图;
图3为本申请实施例提供的文本分类方法中词向量示意图;
图4为本申请实施例提供的文本分类方法的另一个流程示意图;
图5为图4中的文本分类方法对应的模型示意图;
图6为本申请实施例提供的文本分类方法对应的另一个模型示意图;
图7为本申请实施例提供的文本分类方法对应的第三个模型示意图;
图8为本申请实施例提供的文本分类装置的示意性框图;
图9为本申请实施例提供的文本分类装置的另一个示意性框图;以及
图10为本申请实施例提供的计算机设备的示意性框图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
请参阅图1,图1为本申请实施例提供的文本分类方法的应用场景示意图。所述应用场景包括:
(1)终端。图1所示终端上安装有应用程序,研发人员通过终端实现执行文本分类方法的步骤,所述终端可以为笔记本电脑、平板电脑或者台式电脑等电子设备,图1中所示的终端应用环境也可以更换为服务器等计算机设备。若图1中的应用环境为服务器,服务器可以为服务器集群或者云服务器。服务器集群又可以采用分布式***,分布式***的服务器又可以包括主服务器和从服务器,以使主服务器使用获得的语料执行文本分类方法的步骤。
图1中的各个主体工作过程如下:终端获取进行文本分类的语料,并将所述语料通过预设方式进行分词以得到中文分词,将所述中文分词进行词嵌入以将所述中文分词转化为词向量,使用卷积神经网络结合注意力函数对所述词向量进行特征提取以得到所述词向量的词向量特征,通过全连接的方式连接所述词向量特征以得到输出数据;经分类器对所述输出数据进行分类以得到文本分类结果。
需要说明的是,图1中仅仅示意出台式电脑作为终端,在实际操作过程中,终端的类型不限于图1中所示,所述终端还可以为手机、笔记本电脑或者平板电脑等电子设备,上述文本分类方法的应用场景仅仅用于说明本申请技术方案,并不用于限定本申请技术方案。
图2为本申请实施例提供的文本分类方法的示意性流程图。该文本分类方法应用于图1中的终端中以完成文本分类方法的全部或者部分功能。
请参阅图2,图2是本申请实施例提供的文本分类方法的流程示意图。如图2所示,该方法包括以下步骤S210-S250:
S210、获取进行文本分类的语料,并将所述语料通过预设方式进行分词以 得到中文分词。
其中,本申请实施例中的文本分类是指基于卷积神经网络分类模型对文本进行的分类。基于卷积神经网络的文本分类模型,英文为Text Convolutional Neural Network,简写为TextCNN,称为TextCNN网络结构或者TextCNN网络模型,TextCNN是用来做文本分类的卷积神经网络,也就是利用卷积神经网络对文本进行分类。
分词,是指对中文文本进行分词,英文为Chinese Word Segmentation,指的是将一个汉字序列切分成一个个单独的词,分词就是将连续的字序列按照一定的规范重新组合成词序列的过程。对中文文本分类需要先对中文文本分词,对中文文本分词,有很多开源的中文分词工具,例如最常用的Jieba分词,还有word分词及盘古分词等。分词还包括做进一步的处理,去除掉一些高频词汇和低频词汇,去掉一些无意义的符号等。具体地,在使用TextCNN模型将文本进行分类前,需要将待分类文本进行预处理以将获得的待分类语料进行分词,从而获得中文分词,将获得的中文分词进一步转化为词向量。
具体地,终端获取进行文本分类的语料,并将所述语料通过预设方式进行分词以得到中文分词,所述语料可以是通过爬取网络上指定网站上的预设语料,爬取规则可以根据实际需要预先设置,比如,爬取规则为某一网页的语料,也可以是爬取的某一主体的相关语料。所述语料还可以是通过语料数据库提供的语料,比如某一网站积累的用户数据等。本申请实施例的应用场景为文本分类,比如文本还可以包括新闻标题分类,模型输入为文本词向量信息,输出为文本分类结果。
S220、将所述中文分词进行词嵌入以将所述中文分词转化为词向量。
其中,词嵌入,英文为Word Embedding,是一种词的类型表示,具有相似意义的词具有相似的表示,是将词汇映射到实数向量的方法总称,词嵌入所在的结构层称为词嵌入层,或者简称为嵌入层,英文为Embedding layer。词嵌入是一类技术,是指单个词在预定义的向量空间中被表示为实数向量,每个单词都映射到一个向量。请参阅图3,图3为本申请实施例提供的文本分类方法中词向量示意图。如图3所示,假如在一个文本中包含“猫”“狗”及“爱情”等若 干单词,而这若干单词映射到向量空间中,“猫”对应的向量为(0.1,0.2,0.3),“狗”对应的向量为(0.2,0.2,0.4),“爱情”对应的映射为(-0.4,-0.5,-0.2)(本数据仅为示意)。像这种将文本X{x1,x2,x3,x4,x5……xn}映射到多维向量空间Y{y1,y2,y3,y4,y5……yn},这个映射过程就叫做词嵌入。之所以希望把每个单词都变成一个向量,目的还是为了方便计算,比如“猫”,“狗”,“爱情”三个词。对于我们人而言,可以知道“猫”和“狗”表示的都是动物,而“爱情”是表示的一种情感,但是对于机器而言,这三个词都是用0和1表示成二进制的字符串而已,无法对其进行计算。而通过词嵌入这种方式将单词转变为词向量,机器便可对单词进行计算,通过计算不同词向量之间夹角余弦值cos而得出单词之间的相似性,比如,在图3中,由于cosα<cosβ,可知“猫”与“狗”更相似,猫与“爱情”差异较大。其中,两个向量之间的夹角可以通过如下方式计算:向量夹角的公式是cosθ=向量a·向量b/|向量a|×|向量b|,其中,注意是点乘,比如,在Python中可以使用Python.numpy来计算向量的夹角。
具体地,将文本语料通过分词得到中文分词后进而转化为预训练的词向量,也就是将输入的自然语言经过分词后编码成词向量,为预训练词向量准备。具体实施时,可以使用预训练好的词向量,也可以直接在训练TextCNN的过程中训练出一套词向量,不过使用预训练好的词向量比在训练TextCNN的过程中训练出一套词向量快100倍不止。如果使用预训练好的词向量,又分为Static方法和No-static方法,Static方法是指在训练TextCNN过程中不再调节词向量的参数,No-static方法在训练过程中调节词向量的参数,所以No-static方法的结果比Static方法的结果要好。
进一步地,按照预设数量批次的所述中文分词对嵌入层进行调节,其中,批次,英文为Batch。也就是还可以不在每一个Batch(批)中都调节Embedding层(嵌入层),而是每个100个Batch的中文分词调节一次嵌入层,这样可以减少训练的时间,又可以微调词向量。
更进一步地,使用预设词向量字典将所述中文分词进行词嵌入以将所述中文分词转化为词向量,也就是使用训练好的预设词向量字典将所述中文分词进 行词嵌入以将所述中文分词转化为词向量,相比将所述中文分词直接进行词嵌入,可以提高将所述中文分词转化为词向量的效率。也就是可以使用训练好的预设词向量字典将所述中文词向量对应的语料进行词嵌入以将所述语料转化为词向量。在一个实施例中,词向量可以采用Word2Vec预训练词向量,即每个词汇都有对应的向量表示,此类向量表示能够以数据形式表达词汇信息,词向量维度可以为300。其中,Word2vec,英文为Word to vector,是一款用于训练词向量的软件工具,用来产生词向量的相关模型,词向量的自动训练可以通过Python中的Gensim库实现。
S230、使用卷积神经网络结合注意力函数对所述词向量进行特征提取以得到所述词向量的词向量特征。
其中,卷积神经网络,英文为Convolutional Neural Networks,简称为CNN,是一类包含卷积或者相关计算且具有深度结构的前馈神经网络(Feedforward Neural Networks),是深度学***移不变分类(英文为Shift-Invariant Classification),因此也被称为“平移不变人工神经网络(英文为Shift-Invariant Artificial Neural Networks,简称为SIANN)。
注意力,又称为注意力结构或者注意力机制,英文为Attention Mechanism,卷积神经网络中的注意力主要用于决定卷积神经网络需要关注输入的哪部分并分配有限的信息处理资源给重要的部分以聚焦卷积神经网络处理输入数据的针对性,提高卷积神经网络对输入数据的处理效率。
具体地,终端在TextCNN模型中加入了注意力结构,通过改善TextCNN模型对数据的处理针对性,提升TextCNN模型的训练效率和文本分类效率,从而实现使用卷积神经网络结合注意力函数对所述词向量进行特征提取以得到所述词向量的词向量特征。比如,对于TextCNN模型加入两种注意力机制,一类为Word-wise注意力结构,也就是词注意力结构,Word-wise注意力结构作用于词嵌入层,根据词向量输入,训练得到相应注意力值,针对词嵌入层输出的词向量添加注意力以提高卷积神经网络对词向量处理的针对性,一类为Filter-wise注意力结构,也就是通道注意力结构,Filter-wise注意力结构则作用于卷积层 之后,根据卷积通道输出,训练得到相应注意力值,以使池化后的输出数据在进行全连接时,提高全连接过程中对全连接对象的针对性,通过提高文本分类过程中的针对性从而提高整个文本分类的效率。
S240、通过全连接的方式连接所述词向量特征以得到输出数据。
其中,全连接是指连接所有的特征,将输出数据送给分类器,比如Softmax分类器,在TextCNN模型中,全连接是指输出层的神经元和输入层的每个神经元都连接,也就是卷积神经网络中的全连接层的每一个结点都与上一层的所有结点相连,用来把前边提取到的特征综合起来。全连接层,英文为Fully connected layer,是TextCNN模型中的一个层。
具体地,在TextCNN结构中,经过全连接层之前的逐层处理,比如经过多个卷积层和池化层后,一般会连接着1个或1个以上的全连接层,全连接层中的每个神经元与其前一层的所有神经元进行全连接,全连接层可以整合卷积层或者池化层中具有类别区分性的局部信息,进而全连接层将前面经过多次卷积后或者池化后的高度抽象化的特征进行整合,然后进行归一化以对各种特征的分类情况输出一个概率,发送至全连接层之后的分类器,以使分类器(Classifier)可以根据全连接得到的概率进行分类。并且,实际中为了提高网络的学习能力,可以拼接多个全连接层。
S250、经分类器对所述输出数据进行分类以得到文本分类结果。
具体地,终端上的TextCNN模型中的分类器接收全连接层发送的经全连接之后的输出数据,将所述输出数据经分类器进行分类以最终得到对文本的分类结果,其中,分类器可以采用softmax分类器。
本申请实施例在实现文本分类时,通过获取进行文本分类的语料,并将所述语料通过预设方式进行分词以得到中文分词,将所述中文分词进行词嵌入以将所述中文分词转化为词向量,使用卷积神经网络结合注意力函数对所述词向量进行特征提取以得到所述词向量的词向量特征,通过全连接的方式连接所述词向量特征以得到输出数据;经分类器对所述输出数据进行分类以得到文本分类结果,从而在基于卷积神经网络的文本分类模型中加入了注意力,以聚焦文本处理的针对性,能够有效提升文本分类模型的训练效率和文本分类效率。
请参阅图4和图5,图4为本申请实施例提供的文本分类方法的另一个流程示意图,图5为图4中的文本分类方法对应的模型示意图。如图4和图5所示,该方法包括以下步骤S410-S490:
S410、获取进行文本分类的语料,并将所述语料通过预设方式进行分词以得到中文分词;
S420、将所述中文分词进行词嵌入以将所述中文分词转化为词向量。
具体地,步骤S410和图2中的步骤S210相同,步骤S420和图2中的步骤S220相同,将图2中的步骤S210和S420均通过应用的方式包含于此,在此不再赘述。其中,图5中的输入层就是用于获取进行分类的文本语料,比如对新闻标题分类,图5所示的文本分类模型输入为文本词向量信息,输出为文本分类结果。图5中的词嵌入层就是用于将输入的自然语言语料进行词嵌入以将所述自然语言的语料编码成词向量。
S430、使用第一注意力函数对所述词向量分配注意力权重以得到调整后的词向量。
其中,注意力,又称为注意力机制,或者注意力模型,或者注意力结构,英文为Attention Model。自然语言处理中的注意力模型,借鉴了人类的注意力概念,一般来说,视觉注意力是人类视觉所特有的大脑信号处理机制,人类视觉通过快速扫描全局图像,获得需要重点关注的目标区域,也就是一般所说的注意力焦点,而后对这一区域投入更多注意力资源,以获取更多所需要关注目标的细节信息,而抑制其他无用信息,人类视觉注意力极大地提高了视觉信息处理的效率与准确性,本申请实施例中的注意力从本质上讲和人类的选择性视觉注意力类似,核心目标也是从众多信息中选择出对当前任务目标更关键的信息。
进一步地,注意力模型可以表现为一种函数,比如y=f(x),y=f(x)可以为线性关系,比如,y=wx+b,其中,y表示输出,x表示输入,w和b分别表示x和y线性关系的参数,w和b可以分别在训练过程中得到调整。
具体地,在卷积神经网络基础上添加第一注意力函数,使用第一注意力函数对所述词向量分配注意力权重以得到调整后的词向量,所述第一注意力函数 主要是用于对词向量的权重进行分配,对各词向量分配注意力权重,以重点突出需要关注的词向量。由于第一注意力函数是用于对词向量的权重进行分配,第一注意力函数又可以称为词注意力,英文为Word-wise,又可以称为Word-wise注意力或者Word-wise机制,Word-wise注意力用于提炼词注意力信息,即根据输入词汇分配注意力权重,并进行自动学习的过程,在自动学习中确定词的注意力权重,并输出至下一步运算。
Word-wise注意力所在的层称为Word-wise注意力结构层,或者称为Word-wise注意力层,或者Word-wise注意力结构层,Word-wise注意力层用于根据输入词汇分配注意力权重,并输出分配权重后的词向量至下一步运算。
进一步地,请参阅图5,如图5所示,Word-wise注意力结构层添加于词嵌入层与卷积层之间,Word-wise注意力结构层的输入来自于词嵌入层,Word-wise注意力结构层可以建立两层全连接层并以Softmax函数输出注意力权重,该输出用于调整词嵌入层的输出,并将注意力权重调整后的词嵌入层输出的数据输入至卷积层。比如,针对“我喜欢吃苹果”这样的描述,针对“我”、“喜欢”、“苹果”通过神经网络的自学习分配不同的权重,权重如何调整,也是神经网络在训练的过程中进行自适应和自学习从而实现动态调整。卷积神经网络以及注意力结构的建立均通过Python中的Tensorflow库实现。
在一实施例中,可将Word-wise注意力结构层添加于首层卷积层前方,以将Word-wise结构层的提炼信息输入至卷积层。比如,Word-wise注意力根据词嵌入层的词向量输入,通过建立一层或者两层的全连接隐层,并通过Softmax函数输出。进一步地,词注意力结构层的输出为注意力权重,需要将词嵌入层输出的词向量与注意力结构层的输出进行点乘,以完成所述词向量的注意力权重调整,并将调整后的词向量输出输入至卷积层以完成后续运算。其中,Softmax函数,或称归一化指数函数,是逻辑函数的一种推广,它能将一个含任意实数的K维向量z“压缩”到另一个K维实向量σ(z)中,使得每一个元素的范围都在(0,1)之间,并且所有元素的和为1,Softmax函数实际上是有限项离散概率分布的梯度对数归一化。点乘,也叫向量的内积、数量积,点乘的结果是一个数,比如,向量a·向量b=|a||b|cos<a,b>,cos<a,b>表示向量a和向量b夹角的 余弦值,将向量用坐标表示(三维向量),若向量a=(a1,b1,c1),向量b=(a2,b2,c2),则向量a·向量b=a1a2+b1b2+c1c2。
S440、使用卷积神经网络对所述调整后的词向量进行特征提取以得到词向量特征。
其中,卷积层,英文为Convolutional layer,主要是用一个采样器从输入数据中采集关键数据内容,卷积层最大的特点是局部感知和权重共享,从而实现通过卷积提取文本的不同特征,一般来说,卷积层、池化层及全连接层都属于卷积神经网络,并且是三种不同类型的隐藏层。
具体地,首先建立词嵌入层实现训练文本向词向量的转化,随后建立卷积神经网络以形成卷积层,通过卷积层进行文本特征提取,其中,卷积层可以包括多层卷积核。比如,卷积层含有高度为1、3、5的卷积核各128通道,也就是高度为1行、3行、5行的卷积核各128通道。在本实施例中,卷积层的输出将输入至后续激活层与池化层。
请继续参阅图5,卷积层主要是通过卷积以提取不同的N-gram特征。输入的语句或者文本,通过词嵌入层后,会转变成一个二维矩阵,假设文本的长度为|T|,词向量的大小为|d|,则该二维矩阵的大小为|T|*|d|,卷积的工作就是对这一个|T|*|d|的二维矩阵进行的。卷积核的大小一般设定为n*|d|,n是卷积核的长度,|d|是卷积核的宽度,这个宽度和词向量的维度是相同的,也就是卷积只是沿着文本序列进行的,n可以有多种选择,比如2、3、4、5等。对于一个|T|*|d|的文本,如果选择卷积核kernel的大小为2*|d|,则卷积后得到的结果是|T-2+1|*1的一个向量。在TextCNN网络中,需要同时使用多个不同类型的kernel,同时每个size的kernel又可以有多个。如果我们使用的kernel size大小为2、3、4、5*|d|,每个种类的size又有128个kernel,则卷积网络一共有4*128个卷积核。其中N-Gram是一种基于统计语言模型的算法,它的基本思想是将文本里面的内容按照字节进行大小为N的滑动窗口操作,形成长度是N的字节片段序列,每一个字节片段称为gram,对所有gram的出现频度进行统计,并且按照事先设定好的阈值进行过滤,形成关键gram列表,也就是这个文本的向量特征空间,列表中的每一种gram就是一个特征向量维度。
更进一步地,在训练卷积神经网络的过程中,卷积神经网络的损失函数为交叉熵,训练方法为ADAM,学习率为0.001,其中,ADAM,英文为Adaptive Moment Estimation,是自适应矩估计。同时,在训练神经网络时,需要设置学习率控制参数更新的速度,其中,学习率,英文为Learing rate,又称为学习速率,用于控制模型的学习进度。神经网络的训练通过Python中的Tensorflow库实现。
S450、使用激活函数对所述词向量特征进行激活。
其中,激活函数是用来加入非线性因素的。其中,常用的激活函数包括:Sigmoid函数、Tanh函数及ReLU函数等。目前大部分的卷积神经网络中,基本上都是采用了ReLU函数。使用激活函数来加入非线性因素的层在TextCNN模型中称为激活层。
具体地,由于线性模型的表达力不够,在神经网络中,对于图像,主要采用了卷积的方式来处理,也就是对每个像素点赋予一个权值,这个操作是线性的。但是对于使用的训练样本来说,不一定是线性可分的,为了解决这个问题,可以进行线性变化或者引入非线性因素,解决线性模型所不能解决的问题,因此,TextCNN模型中使用激活函数对所述词向量特征进行激活,也就是使用激活函数对所述调整后的词向量的词向量特征进行调整以在卷积层提取的词向量特征中加入非线性因素,从而提高词向量处理的准确性。
S460、对激活后的词向量特征进行池化。
其中,池化,英文为Pooling,是指使用卷积神经网络对输入的特征进行压缩以提取主要特征并将数据量变小,池化操作通常被用在卷积神经网络中。用于进行池化的层在在TextCNN模型中称为池化层,英文为Pooling layer。池化层用于降低卷积层输出的特征向量,同时改善结果以有效控制过拟合。
具体地,在卷积神经网络中,池化层往往在卷积层后面,通过池化来降低卷积层输出的特征向量,同时改善结果以有效控制过拟合。最常见的池化操作为平均池化Mean Pooling和最大池化Max Pooling。池化层的形式可以为Max Pooling,也就是最大池化层,最大池化即取局部接收域中值最大的点,Max Pooling能减小卷积层参数误差造成估计均值的偏移误差,更多的保留纹理信息。 一个最大池化层从一块特征中选取最大值。和卷积层一样,池化层也是通过窗口(块)大小和步幅尺寸进行参数化,比如,在一个10×10特征矩阵上以2的步幅滑动一个2×2的窗口,然后选取每个窗口的4个值中的最大值,得到一个5×5特征矩阵。池化层通过只保留最突出的信息来减少表征的维度。池化层的输出将接入全连接层实现全局特征提取,并以Softmax函数完成最终输出。
S470、使用第二注意力函数对所述词向量特征分配注意力权重以得到第一词向量特征。
其中,第二注意力函数用于调整各通道的池化层输出,对各通道分配注意力权重,并将权重调整后的池化层输出的数据输入全连接层进行后续计算。由于第二注意力函数用于调整各通道的池化层输出,因此也称为通道注意力,英文为Filter-wise,也可以称为Filter-wise注意力,或者Filter-wise注意力结构层。使用通道注意力调整各通道的池化层输出的层在TextCNN模型中称为通道注意力层,也可以称为Filter-wise注意力层。Filter-wise注意力层用于卷积通道输出,训练得到相应注意力值,Filter-wise注意力层添加与卷积层之后,可以添加于池化层与全连接层之间,该部分的输出来自于池化层输出,并可以通过建立两层全连接层以Softmax函数输出注意力权重。卷积神经网络以及注意力的建立均可以通过Python中的Tensorflow库实现。
具体地,Filter-wise机制与Word-wise机制具有类似的结构,但是作用于模型的不同部分,对不同的对象分配权重,起不同的作用。Filter-wise机制提炼通道注意力信息。由于卷积神经网络含有多个通道,因此Filter-wise机制能够显著提升模型训练效率。具体实施中,可将Filter-wise机制添加于通道最终输出,通过全连接隐层以及Softmax函数计算注意力权重,并将计算结果与通道输出进行点乘计算,以得到权重调整后的通道输出,以实现此处使用第二注意力函数对池化后的词向量特征分配注意力权重以得到第一词向量特征,并进行后续计算。
S480、通过全连接的方式连接所述第一词向量特征以得到输出数据。
具体地,通过全连接的方式连接所述第一词向量特征以得到输出数据,在TextCNN模型中,也就是通过全连接层将前面经过多次卷积后高度抽象化的特征 进行整合,然后可以进行归一化,对各种分类情况都输出一个概率,之后的分类器可以根据全连接得到的概率进行分类,分类器可以为Classifi er分类器。比如,Fully-connectedlayer在Max-pooling layer后再拼接一层,将该层的输出作为输出结果。实际中为了提高网络的学习能力,可以拼接多个全连接层。
S490、经分类器对所述输出数据进行分类以得到文本分类结果。
具体地,步骤S490和图2中的步骤S250相同,将图2中的步骤S250通过应用的方式包含于此,在此不再赘述。经分类器对所述输出数据进行分类以得到文本分类结果,文本分类结果经过输出层输出,输出层用于输出文本分类结果。
In the embodiments of the present application, two kinds of attention are added to the TextCNN network structure: one is Word-wise attention and the other is Filter-wise attention. Word-wise attention acts on the word vectors output by the word embedding layer; taking the word vectors of the embedding layer as input, it is trained to obtain the corresponding attention values of the word vectors. Filter-wise attention acts on the output of the convolutional layer; based on the output data of the convolution channels, it is trained to obtain the corresponding attention values of that output data. Word-wise attention and Filter-wise attention are trained jointly with the other parts of the TextCNN model, without requiring additional training computation. In practice, the cross-entropy of the TextCNN classification results is used as the loss function and ADAM as the optimization method for training. By adding word attention for words and channel attention for channels to the TextCNN model, the embodiments of the present application can effectively improve the training efficiency of the TextCNN model. The TextCNN model of the embodiments of the present application is applied to text classification, such as news headline classification; the input of the TextCNN model is the word vector information of the text, and the output is the text classification result. In practice, adding the Word-wise and Filter-wise mechanisms improves the training efficiency of the TextCNN model, with the Filter-wise mechanism being particularly effective.
In one embodiment, the step of using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain the word vector features of the word vector includes:
using the convolutional neural network to perform feature extraction on the word vector to obtain second word vector features of the word vector;
using a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vector.
Specifically, referring to FIG. 6, FIG. 6 is a schematic diagram of the model corresponding to the text classification method provided by this embodiment. As shown in FIG. 6, in the process of text classification, a corpus for text classification is acquired through the input layer and segmented in a preset manner to obtain a Chinese word segmentation; the Chinese word segmentation is word-embedded through the word embedding layer to be converted into a word vector; feature extraction is performed on the word vector through the convolutional layer using a convolutional neural network to obtain second word vector features of the word vector; attention weights are assigned to the second word vector features through the Filter-wise attention structure layer using the second attention function to obtain the word vector features of the word vector; the word vector features are then connected in a fully connected manner through the fully connected layer to obtain output data; and finally the output data is classified by the classifier through the output layer to obtain a text classification result. The second attention function is used to adjust the pooling-layer output of each channel and is therefore also called channel attention, Filter-wise in English, or Filter-wise attention. In the TextCNN model, the layer that uses channel attention to adjust the pooling-layer output of each channel is called the channel attention layer, and may also be called the Filter-wise attention layer or the Filter-wise attention structure layer.
In one embodiment, before the step of using a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vector, the method further includes:
using an activation function to activate the second word vector features;
pooling the activated second word vector features.
Specifically, referring to FIG. 7, FIG. 7 is a schematic diagram of the model corresponding to the text classification method provided by this embodiment. As shown in FIG. 7, in the process of text classification, a corpus for text classification is acquired through the input layer and segmented in a preset manner to obtain a Chinese word segmentation; the Chinese word segmentation is word-embedded through the word embedding layer to be converted into a word vector; feature extraction is performed on the word vector through the convolutional layer using a convolutional neural network to obtain second word vector features of the word vector; non-linearity is added to the second word vector features output by the convolutional layer through the activation layer using an activation function, thereby improving the accuracy of word vector processing; the non-linearly adjusted second word vector features output by the activation layer are then reduced through the pooling layer; attention weights are then assigned to the second word vector features through the Filter-wise attention structure layer using the second attention function to obtain the word vector features of the word vector; the word vector features are then connected in a fully connected manner through the fully connected layer to obtain output data; and finally the output data is classified by the classifier through the output layer to obtain a text classification result. The second attention function is used to adjust the pooling-layer output of each channel and is therefore also called channel attention, Filter-wise in English, or Filter-wise attention. In the TextCNN model, the layer that uses channel attention to adjust the pooling-layer output of each channel is called the channel attention layer, and may also be called the Filter-wise attention layer or the Filter-wise attention structure layer.
It should be noted that, in the text classification methods described in the above embodiments, the technical features included in different embodiments may be recombined as required to obtain combined implementations, all of which fall within the scope of protection claimed by the present application.
Referring to FIG. 8, FIG. 8 is a schematic block diagram of a text classification apparatus provided by an embodiment of the present application. Corresponding to the above text classification method, an embodiment of the present application further provides a text classification apparatus. As shown in FIG. 8, the text classification apparatus includes units for performing the above text classification method, and the apparatus may be configured in a computer device such as a terminal or a server. Specifically, referring to FIG. 8, the text classification apparatus 800 includes an acquisition unit 801, a conversion unit 802, an extraction unit 803, a connection unit 804 and a classification unit 805.
The acquisition unit 801 is configured to acquire a corpus for text classification and segment the corpus in a preset manner to obtain a Chinese word segmentation;
the conversion unit 802 is configured to perform word embedding on the Chinese word segmentation to convert the Chinese word segmentation into a word vector;
the extraction unit 803 is configured to use a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain word vector features of the word vector;
the connection unit 804 is configured to connect the word vector features in a fully connected manner to obtain output data;
the classification unit 805 is configured to classify the output data via a classifier to obtain a text classification result.
Referring to FIG. 9, FIG. 9 is another schematic block diagram of the text classification apparatus provided by an embodiment of the present application. As shown in FIG. 9, in this embodiment, the extraction unit 803 includes:
a first assignment subunit 8031, configured to use a first attention function to assign attention weights to the word vector to obtain an adjusted word vector;
an extraction subunit 8032, configured to use a convolutional neural network to perform feature extraction on the adjusted word vector to obtain word vector features;
an activation subunit 8033, configured to use an activation function to activate the word vector features;
a pooling subunit 8034, configured to pool the activated word vector features; and
a second assignment subunit 8035, configured to use a second attention function to assign attention weights to the word vector features to obtain first word vector features.
The connection unit 804 is configured to connect the first word vector features in a fully connected manner to obtain output data.
Still referring to FIG. 9, as shown in FIG. 9, in another embodiment, the extraction unit 803 includes:
an extraction subunit 8032, configured to use a convolutional neural network to perform feature extraction on the word vector to obtain second word vector features of the word vector;
an activation subunit 8033, configured to use an activation function to activate the second word vector features;
a pooling subunit 8034, configured to pool the activated second word vector features; and
a second assignment subunit 8035, configured to use a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vector.
It should be noted that those skilled in the art can clearly understand that, for the specific implementation processes of the above text classification apparatus and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
Meanwhile, the division and connection manners of the units in the above text classification apparatus are only for illustration; in other embodiments, the text classification apparatus may be divided into different units as required, or the units in the text classification apparatus may adopt different connection orders and manners, so as to complete all or part of the functions of the above text classification apparatus.
The above text classification apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 10.
Referring to FIG. 10, FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 1000 may be a computer device such as a desktop computer or a server, or may be a component or part of another device.
Referring to FIG. 10, the computer device 1000 includes a processor 1002, a memory and a network interface 1005 connected via a system bus 1001, where the memory may include a non-volatile storage medium 1003 and an internal memory 1004.
The non-volatile storage medium 1003 may store an operating system 10031 and a computer program 10032. When the computer program 10032 is executed, the processor 1002 may be caused to perform the above text classification method.
The processor 1002 is configured to provide computing and control capabilities to support the operation of the entire computer device 1000.
The internal memory 1004 provides an environment for running the computer program 10032 in the non-volatile storage medium 1003; when the computer program 10032 is executed by the processor 1002, the processor 1002 may be caused to perform the above text classification method.
The network interface 1005 is configured to perform network communication with other devices. Those skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 1000 to which the solution of the present application is applied; a specific computer device 1000 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in FIG. 10 and are not repeated here.
The processor 1002 is configured to run the computer program 10032 stored in the memory, so as to implement the text classification method of the embodiments of the present application.
It should be understood that, in the embodiments of the present application, the processor 1002 may be a central processing unit (CPU), and the processor 1002 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program, and the computer program may be stored in a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the steps of the embodiments of the above text classification method.
Therefore, an embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to perform the steps of the text classification method described in the above embodiments.
The computer-readable storage medium may be any of various computer-readable storage media capable of storing a computer program, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disk.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present application.
The above are only specific implementations of the present application, but the scope of protection of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and these modifications or substitutions shall all fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (20)

  1. A text classification method, comprising:
    acquiring a corpus for text classification, and segmenting the corpus in a preset manner to obtain a Chinese word segmentation;
    performing word embedding on the Chinese word segmentation to convert the Chinese word segmentation into a word vector;
    using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain word vector features of the word vector;
    connecting the word vector features in a fully connected manner to obtain output data;
    classifying the output data by a classifier to obtain a text classification result.
  2. The text classification method according to claim 1, wherein the step of using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain the word vector features of the word vector comprises:
    using a first attention function to assign attention weights to the word vector to obtain an adjusted word vector;
    using the convolutional neural network to perform feature extraction on the adjusted word vector to obtain the word vector features.
  3. The text classification method according to claim 2, wherein after the step of using the convolutional neural network to perform feature extraction on the adjusted word vector to obtain the word vector features, the method further comprises:
    using a second attention function to assign attention weights to the word vector features to obtain first word vector features;
    and the step of connecting the word vector features in a fully connected manner to obtain output data comprises:
    connecting the first word vector features in a fully connected manner to obtain the output data.
  4. The text classification method according to claim 3, wherein before the step of using a second attention function to assign attention weights to the word vector features to obtain first word vector features, the method further comprises:
    using an activation function to activate the word vector features.
  5. The text classification method according to claim 4, wherein after the step of using an activation function to activate the word vector features, the method further comprises:
    pooling the activated word vector features.
  6. The text classification method according to claim 1, wherein the step of using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain the word vector features of the word vector comprises:
    using the convolutional neural network to perform feature extraction on the word vector to obtain second word vector features of the word vector;
    using a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vector.
  7. The text classification method according to claim 6, wherein before the step of using a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vector, the method further comprises:
    using an activation function to activate the second word vector features;
    pooling the activated second word vector features.
  8. The text classification method according to claim 1, wherein the step of performing word embedding on the Chinese word segmentation to convert the Chinese word segmentation into a word vector comprises:
    performing word embedding on the Chinese word segmentation by using a preset word vector dictionary to convert the Chinese word segmentation into a word vector.
  9. A text classification apparatus, comprising:
    an acquisition unit, configured to acquire a corpus for text classification and segment the corpus in a preset manner to obtain a Chinese word segmentation;
    a conversion unit, configured to perform word embedding on the Chinese word segmentation to convert the Chinese word segmentation into a word vector;
    an extraction unit, configured to use a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain word vector features of the word vector;
    a connection unit, configured to connect the word vector features in a fully connected manner to obtain output data;
    a classification unit, configured to classify the output data by a classifier to obtain a text classification result.
  10. The text classification apparatus according to claim 9, wherein the extraction unit comprises:
    a first assignment subunit, configured to use a first attention function to assign attention weights to the word vector to obtain an adjusted word vector;
    an extraction subunit, configured to use a convolutional neural network to perform feature extraction on the adjusted word vector to obtain word vector features.
  11. A computer device, wherein the computer device comprises a memory and a processor connected to the memory; the memory is configured to store a computer program; and the processor is configured to run the computer program stored in the memory to perform the following steps:
    acquiring a corpus for text classification, and segmenting the corpus in a preset manner to obtain a Chinese word segmentation;
    performing word embedding on the Chinese word segmentation to convert the Chinese word segmentation into a word vector;
    using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain word vector features of the word vector;
    connecting the word vector features in a fully connected manner to obtain output data;
    classifying the output data by a classifier to obtain a text classification result.
  12. The computer device according to claim 11, wherein the step of using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain the word vector features of the word vector comprises:
    using a first attention function to assign attention weights to the word vector to obtain an adjusted word vector;
    using the convolutional neural network to perform feature extraction on the adjusted word vector to obtain the word vector features.
  13. The computer device according to claim 12, wherein after the step of using the convolutional neural network to perform feature extraction on the adjusted word vector to obtain the word vector features, the steps further comprise:
    using a second attention function to assign attention weights to the word vector features to obtain first word vector features;
    and the step of connecting the word vector features in a fully connected manner to obtain output data comprises:
    connecting the first word vector features in a fully connected manner to obtain the output data.
  14. The computer device according to claim 13, wherein before the step of using a second attention function to assign attention weights to the word vector features to obtain first word vector features, the steps further comprise:
    using an activation function to activate the word vector features.
  15. The computer device according to claim 14, wherein after the step of using an activation function to activate the word vector features, the steps further comprise:
    pooling the activated word vector features.
  16. The computer device according to claim 11, wherein the step of using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain the word vector features of the word vector comprises:
    using the convolutional neural network to perform feature extraction on the word vector to obtain second word vector features of the word vector;
    using a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vector.
  17. The computer device according to claim 16, wherein before the step of using a second attention function to assign attention weights to the second word vector features to obtain the word vector features of the word vector, the steps further comprise:
    using an activation function to activate the second word vector features;
    pooling the activated second word vector features.
  18. The computer device according to claim 11, wherein the step of performing word embedding on the Chinese word segmentation to convert the Chinese word segmentation into a word vector comprises:
    performing word embedding on the Chinese word segmentation by using a preset word vector dictionary to convert the Chinese word segmentation into a word vector.
  19. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to perform the following steps:
    acquiring a corpus for text classification, and segmenting the corpus in a preset manner to obtain a Chinese word segmentation;
    performing word embedding on the Chinese word segmentation to convert the Chinese word segmentation into a word vector;
    using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain word vector features of the word vector;
    connecting the word vector features in a fully connected manner to obtain output data;
    classifying the output data by a classifier to obtain a text classification result.
  20. The computer-readable storage medium according to claim 19, wherein the step of using a convolutional neural network combined with an attention function to perform feature extraction on the word vector to obtain the word vector features of the word vector comprises:
    using a first attention function to assign attention weights to the word vector to obtain an adjusted word vector;
    using the convolutional neural network to perform feature extraction on the adjusted word vector to obtain the word vector features.
PCT/CN2019/092531 2019-01-04 2019-06-24 文本分类方法、装置、计算机设备及存储介质 WO2020140403A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910007705.9 2019-01-04
CN201910007705.9A CN109857860A (zh) 2019-01-04 2019-01-04 文本分类方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2020140403A1 true WO2020140403A1 (zh) 2020-07-09

Family

ID=66893898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/092531 WO2020140403A1 (zh) 2019-01-04 2019-06-24 文本分类方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN109857860A (zh)
WO (1) WO2020140403A1 (zh)


Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857860A (zh) * 2019-01-04 2019-06-07 平安科技(深圳)有限公司 文本分类方法、装置、计算机设备及存储介质
CN110263139A (zh) * 2019-06-10 2019-09-20 湖北亿咖通科技有限公司 车辆、车机设备及其基于神经网络的文本意图识别方法
CN110362734A (zh) * 2019-06-24 2019-10-22 北京百度网讯科技有限公司 文本识别方法、装置、设备及计算机可读存储介质
CN110427610A (zh) * 2019-06-25 2019-11-08 平安科技(深圳)有限公司 文本分析方法、装置、计算机装置及计算机存储介质
CN110442689A (zh) * 2019-06-25 2019-11-12 平安科技(深圳)有限公司 一种问答关系排序方法、装置、计算机设备及存储介质
CN110427456A (zh) * 2019-06-26 2019-11-08 平安科技(深圳)有限公司 一种词语联想的方法及装置
CN110427480B (zh) * 2019-06-28 2022-10-11 平安科技(深圳)有限公司 个性化文本智能推荐方法、装置及计算机可读存储介质
CN110543629A (zh) * 2019-08-01 2019-12-06 淮阴工学院 一种基于w-att-cnn算法的化工装备文本分类方法
CN110442823A (zh) * 2019-08-06 2019-11-12 北京智游网安科技有限公司 网站分类方法、网站类型判断方法、存储介质及智能终端
CN110609897B (zh) * 2019-08-12 2023-08-04 北京化工大学 一种融合全局和局部特征的多类别中文文本分类方法
CN110442683A (zh) * 2019-08-13 2019-11-12 北京明略软件***有限公司 文本信息的处理方法及装置、存储介质、电子装置
CN110705290B (zh) * 2019-09-29 2023-06-23 新华三信息安全技术有限公司 一种网页分类方法及装置
CN110737811B (zh) * 2019-10-25 2024-01-16 腾讯科技(深圳)有限公司 应用分类方法、装置以及相关设备
CN110889717A (zh) * 2019-11-14 2020-03-17 腾讯科技(深圳)有限公司 文本中的广告内容过滤方法、装置、电子设备及存储介质
CN111046175B (zh) * 2019-11-18 2023-05-23 杭州天翼智慧城市科技有限公司 基于自学习的电子案卷分类方法及装置
CN111061873B (zh) * 2019-11-28 2022-03-15 北京工业大学 一种基于Attention机制的多通道的文本分类方法
CN111027529A (zh) * 2019-12-04 2020-04-17 深圳市新国都金服技术有限公司 减少深度学习ocr的参数量和计算量的方法与计算机设备及存储介质
CN110674263B (zh) * 2019-12-04 2022-02-08 广联达科技股份有限公司 一种模型构件文件自动分类的方法和装置
CN111145914B (zh) * 2019-12-30 2023-08-04 四川大学华西医院 一种确定肺癌临床病种库文本实体的方法及装置
CN111145913B (zh) * 2019-12-30 2024-02-20 讯飞医疗科技股份有限公司 基于多重注意力模型的分类方法、装置及设备
CN111291189B (zh) * 2020-03-10 2020-12-04 北京芯盾时代科技有限公司 一种文本处理方法、设备及计算机可读存储介质
CN111476028A (zh) * 2020-04-02 2020-07-31 言图科技有限公司 一种汉语短语识别方法、***、存储介质及电子设备
CN112015895A (zh) * 2020-08-26 2020-12-01 广东电网有限责任公司 一种专利文本分类方法及装置
CN112131386A (zh) * 2020-09-22 2020-12-25 新华三大数据技术有限公司 一种文本分类方法及装置
CN112115266A (zh) * 2020-09-25 2020-12-22 奇安信科技集团股份有限公司 恶意网址的分类方法、装置、计算机设备和可读存储介质
CN112163064B (zh) * 2020-10-14 2024-04-16 上海应用技术大学 基于深度学习的文本分类方法
CN112507114A (zh) * 2020-11-04 2021-03-16 福州大学 一种基于词注意力机制的多输入lstm_cnn文本分类方法及***
CN112562669B (zh) * 2020-12-01 2024-01-12 浙江方正印务有限公司 一种智能数字报自动摘要与语音交互聊新闻方法及***
CN112668320B (zh) * 2020-12-25 2024-02-02 平安科技(深圳)有限公司 基于词嵌入的模型训练方法、装置、电子设备及存储介质
CN112732912B (zh) * 2020-12-30 2024-04-09 平安科技(深圳)有限公司 敏感倾向表述检测方法、装置、设备及存储介质
CN113177118A (zh) * 2021-04-29 2021-07-27 中国邮政储蓄银行股份有限公司 文本分类模型、文本分类的方法以及装置
CN113806480A (zh) * 2021-09-16 2021-12-17 浙江核新同花顺网络信息股份有限公司 一种文本分类方法、装置、计算机设备及存储介质
CN114579743B (zh) * 2022-03-04 2024-06-14 合众新能源汽车股份有限公司 基于注意力的文本分类方法、装置及计算机可读介质
CN118171648A (zh) * 2024-05-11 2024-06-11 中移(苏州)软件技术有限公司 文本提取方法、装置、电子设备及存储介质


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220231A (zh) * 2016-03-22 2017-09-29 索尼公司 用于自然语言处理的电子设备和方法以及训练方法
US10846523B2 (en) * 2016-11-14 2020-11-24 Kodak Alaris Inc. System and method of character recognition using fully convolutional neural networks with attention
JP6738769B2 (ja) * 2017-04-27 2020-08-12 日本電信電話株式会社 文ペア分類装置、文ペア分類学習装置、方法、及びプログラム
CN109101948B (zh) * 2018-08-28 2021-06-04 电子科技大学 一种基于时空及通道的多注意力机制视频描述方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130318124A1 (en) * 2011-02-08 2013-11-28 Fujitsu Limited Computer product, retrieving apparatus, and retrieval method
CN107885853A (zh) * 2017-11-14 2018-04-06 同济大学 一种基于深度学习的组合式文本分类方法
CN108415977A (zh) * 2018-02-09 2018-08-17 华南理工大学 一个基于深度神经网络及强化学习的生成式机器阅读理解方法
CN108614875A (zh) * 2018-04-26 2018-10-02 北京邮电大学 基于全局平均池化卷积神经网络的中文情感倾向性分类方法
CN109857860A (zh) * 2019-01-04 2019-06-07 平安科技(深圳)有限公司 文本分类方法、装置、计算机设备及存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021105887A1 (en) 2019-11-25 2021-06-03 Emp Biotech Gmbh Separation and isolation of nucleic acids using affinity ligands bound to a solid surface
CN112820412A (zh) * 2021-02-03 2021-05-18 东软集团股份有限公司 用户信息的处理方法、装置、存储介质和电子设备
CN112820412B (zh) * 2021-02-03 2024-03-08 东软集团股份有限公司 用户信息的处理方法、装置、存储介质和电子设备

Also Published As

Publication number Publication date
CN109857860A (zh) 2019-06-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19907039

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19907039

Country of ref document: EP

Kind code of ref document: A1