CN112527959B - News classification method based on pooling convolution embedding and attention distribution neural network - Google Patents
- Publication number: CN112527959B
- Application number: CN202011443363.4A (CN202011443363A)
- Authority
- CN
- China
- Prior art keywords
- news
- text
- vector
- attention
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/3344: Query execution using natural language analysis
- G06F16/3346: Query execution using probabilistic model
- G06F16/35: Clustering; Classification
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F40/216: Parsing using statistical methods
- G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30: Semantic analysis
- G06N3/045: Combinations of networks
- G06N3/047: Probabilistic or stochastic networks
- G06N3/048: Activation functions
- G06N3/08: Learning methods
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a news classification method based on a pooling-free convolution embedding and attention-distribution neural network, which uses features and weights as the key factors in the classification process. The mechanism uses a convolution over the embedding layer to extract local features, removes the pooling layer to reduce information loss, and then adds an attention mechanism that reassigns weights to obtain the global features of the text. The model captures not only the deep features of the text but also the importance of each part of the news. Convolutional neural networks (CNNs) play an important role in text classification tasks thanks to their strength in extracting local and position-invariant features. The attention mechanism extracts text context information and focuses on the important parts, strengthening the weight of key information; combining the two yields stronger feature extraction capability. Combining the pooling-free CNN with a global attention mechanism to handle the news classification problem can significantly improve the accuracy of text classification.
Description
Technical Field
The invention belongs to the field of Chinese news text classification, and particularly relates to a news classification method based on a pooling-free convolution embedding and attention-distribution neural network.
Background
Text classification is a classical NLP task: it assigns a specified text its corresponding label. Current text classification methods fall mainly into traditional machine-learning approaches and deep-learning approaches.
Traditional machine-learning text classification methods include K-nearest neighbor (KNN), maximum entropy, support vector machines (SVM), and the like. The core idea of the KNN algorithm is that if most of a sample's k nearest neighbors in feature space belong to a certain class, the sample also belongs to that class and shares features with the samples of that class. Categories are decided by counting nearest neighbors, so the method is sensitive to the per-class sample size of the training dataset. The principle of maximum entropy is that, when learning a probabilistic model, the model with the maximum entropy is the best model; that is, maximum entropy selects the model with the largest entropy among the set of models satisfying the constraints. SVM is a generalized linear classifier that performs binary classification of data by supervised learning. Deep-learning algorithms are now widely used for text classification. A recurrent neural network (RNN) is a time-series-based neural network model that can capture long-term dependencies between sequence elements. However, as the sequence length increases, a standard RNN struggles to retain long-term dependencies and thus to model the entire sequence; some information may be lost during modeling, and there are problems of vanishing and exploding gradients. Convolutional neural networks (CNNs) are also applied to text classification tasks and have great advantages in capturing local and position-invariant features. Long short-term memory networks (LSTM) can model the relationships between sentences; the LSTM adds three gate structures to the RNN, alleviating the vanishing- and exploding-gradient problems.
In contrast to LSTM, the gated recurrent unit (GRU) has only two gate structures, an update gate and a reset gate; the GRU therefore has fewer parameters and converges better during training. The hierarchical attention model incorporates the attention mechanism into a hierarchical GRU model so that the model can better capture the important information of a document. In recent years, attention mechanisms have been widely used in text classification because they can distinguish the importance of each word to the classification result.
Since a computer cannot directly process a text sequence, it is important to express the text in a form the computer can understand (called text vectorization).
The invention aims at solving the problems that the semantic information of the input text is insufficient and that the pooling layer causes information loss and reduces classification accuracy.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing a news classification method based on a pooling-free convolution embedding and attention-distribution neural network. The technical scheme of the invention is as follows:
A method of news classification based on a pooling-free convolution embedding and attention-distribution neural network, comprising the following steps:
step 1: collecting a news text dataset, carrying out standardized format processing and word segmentation on the news text, obtaining the feature vectors of the news by word embedding, randomly splitting the labeled news data by news category, and dividing the corpus into a training set, a test set, and a verification set, wherein the training set is used for training the news classification model, the verification set is used for verifying whether the model is reasonable, and the test set is used for testing the classification effect of the model;
step 2: inputting the feature vectors obtained by word embedding of the training set of the corpus in step 1 into a CNN convolutional neural network in which the pooling layer has been removed;
step 3: inputting the feature vectors obtained after word embedding and pooling-free convolution in step 2 into an attention mechanism, and redistributing the weights of the feature vectors of the text so as to train the news classification model;
step 4: inputting the text vectors of the test set of the corpus in step 1 into the CNN, classifying the news according to the model trained in step 3, and calculating the classification accuracy.
2. The method for classifying news based on a pooling-free convolution embedding and attention-distribution neural network according to claim 1, wherein in said step 1 the news dataset is collected and, for Chinese news, the format of the dataset is normalized to the form "tag + '\t' + news"; the word-segmented news text words are used as the input of the word embedding layer to obtain a group of word feature vectors x_0, x_1, x_2, ..., x_t, which are a language the computer can recognize. For the text category labels, the alphabet of the input language is specified and each character is encoded using 1-of-m encoding; the character sequence is then converted to a fixed length l_0: characters exceeding the length l_0 are ignored, and sequences shorter than l_0 are padded with zeros.
3. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 2, wherein said step 2 inputs the word vectors x_0, x_1, x_2, ..., x_n of the training set of the corpus in step 1 into the CNN and removes the pooling layer of the character convolution network, specifically: the distributed-representation word vectors are input into a one-dimensional convolution network comprising an input layer, a convolution layer, and an output layer, the pooling layer that the convolutional neural network uses to maximally preserve text features being removed, and the one-dimensional convolution is computed as the convolution sum of the discrete input function and the discrete kernel function:

c(y) = Σ_{x=1}^{k} τ(x)·δ(y·d − x + b) (1)

where τ(x) is the discrete kernel function of size k, δ(x) is the discrete input function, d is the step size, and b is the offset term; x represents a word vector index and n represents the number of news word vectors.
4. The method of news classification based on a pooling-free convolution embedding and attention-distribution neural network according to claim 3, wherein said b = k − d + 1 is an offset constant, and the module is parameterized by a set of kernel functions τ_ij(x) (i = 1, 2, ..., m; j = 1, 2, ..., n); each input δ_i(x) or output c_j(y) is called a "feature", m and n represent the numbers of input and output features, and the output c_j(y) is the convolution sum of δ_i(x) and τ_ij(x).
5. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 4, wherein said step 3 inputs the feature vectors obtained after word embedding and pooling-free convolution in step 2 into an attention mechanism and redistributes the weights of the feature vectors of the text so as to train the news classification model, specifically comprising the following steps:
for the feature vectors obtained in step 2, an attention model is input for each word x 0 ,x 1 ,...,x n Are each represented in vector form and are input to a convolution unit to obtain an output h 0 ,h 1 ,…,h n This output serves as input source=h for the attention mechanism 0 ,h 1 ,…,h n A final feature vector of the text is calculated. In the attention mechanism, the hidden layer t moment state h t Is randomly initialized and updated as a parameter during training while giving the source-side context vector s t Source side context vector s t Calculated as a weighted sum of the individual inputs, calculated as follows:
wherein L represents news text length, a t (s) represents a variable length alignment vector,representing the hidden layer state of the encoder. />
Context vector s t All concealment states of the encoder should be considered, in the attention mechanism part, by concealing the state h at the decoder t moment t Hiding state with each source of encoderComparison to generate variable length pair Ji Xiangliang a t (s):
f a Is a function based on the content of the content,representing the decoder t moment hidden state h t Source hidden state with encoder->Function of->A content function representing the decoder t moment concealment state and all source concealment states of the encoder starting from the initial position s 1.
f a Has 3 different formulas:
wherein W is a Is the weight of the attention modelAnd (5) a heavy matrix.
At each time step, the model will infer a variable length pair Ji Quanchong vector based on the current target state and all source states, then based on a t (s) calculating the global context vector as a weighted average over all source states.
Hidden layer t moment state h t And context vector s t The information of the two vectors is combined to generate the following decoder's attention-hiding state:
wherein the method comprises the steps ofRepresenting a new attention hiding state vector, +.>The fully connected matrix representing the attention model weights, u representing the number of attention mechanism hidden units.
6. The method for classifying news based on a pooling-free convolution embedding and attention-distribution neural network according to claim 5, wherein after introducing the attention mechanism, the final representation of the text is calculated as follows:

u_t = tanh(W_s·h_t + b_s) (6)

w_t = exp(u_t^T·u_s) / Σ_t exp(u_t^T·u_s) (7)

v = Σ_t w_t·h_t (8)

In the calculation process, W_s represents the weight coefficient matrix of the attention model, h_t is the feature representation of the convolution at time t, u_t is a hidden-layer representation of the neural network, u_s is a randomly initialized context vector (also known as the semantic representation of the input), w_t is the importance weight normalized by the Softmax function, and v is the final feature vector of the text.
7. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 6, wherein said step 4 inputs the text vectors of the test set of the corpus in step 1 into the CNN, classifies the news according to the model trained in step 3, and calculates the classification accuracy, specifically comprising:

The model uses the Leaky_ReLU activation function: a leakage value is introduced in the negative half-axis of the ReLU, hence the name leaky ReLU function; unlike the ReLU, it assigns a non-zero slope to all negative values, as follows:

f(x) = x if x ≥ 0; f(x) = a_g·x if x < 0 (9)

where a_g is fixed and g indexes the corresponding slope a_g. Finally, multi-classification is carried out through a Softmax classifier to obtain the result:

result = softmax(v) (10)

result is a vector whose dimension is the number of categories; each component lies in the range [0, 1] and represents the probability that the text falls into the corresponding category. The predicted category of the input sentence is:

prediction = argmax(result) (11)
the invention has the advantages and beneficial effects as follows:
the invention utilizes features and weights as key factors in the classification process. The mechanism is to first convert the news text into word vectors using an embedding layer, which are input into a convolution operation to extract local features. The pooling layer in a conventional convolutional network is deleted to reduce information loss, as the pooling layer acts to actually down-sample the inputs, as is commonly done by maximizing the output of each filter, thus ignoring some news information, as shown in claim 2. According to claim 4, the local feature vectors obtained after the pooling convolution are input into a global attention mechanism to redistribute weights, thereby obtaining global features of the text. Due to the risk of inactivation of neurons in the negative interval, the Leaky_ReLU is selected as an activation function, and finally, the accuracy of news classification is calculated through Softmax. In the conventional practice, due to the uniformity of the convolutional network, the influence of an internal pooling layer on information loss is often ignored when network optimization is performed. Aiming at the problem, the model provided by the patent captures local characteristics of the text, reduces information loss in the unified structure of the traditional neural network, and captures importance of each part of the text. Thus, processing the text classification problem in combination with a pooling-free convolutional network and attention weight distribution can significantly improve the accuracy of the news classification.
Drawings
FIG. 1 is a block diagram of a method for news classification based on a pooled convolutional embedded and attention distributed neural network in accordance with a preferred embodiment of the present invention;
FIG. 2 is a pooling-free convolution embedding and attention-distributing neural network model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
In the present invention, as shown in fig. 1, a one-dimensional convolution operation is first applied; in the convolution network the pooling layer is removed to reduce information loss, so that the semantic features and position-invariant features of the input sequence are extracted. The semantic features are then used as inputs to the attention mechanism, which reassigns weights to obtain global features. The global feature vector is input to the fully connected layers and classified by the activation functions Leaky_ReLU and Softmax.
Step 1: collect the news dataset and, for Chinese news, normalize the dataset format, written as: "tag + '\t' + news". The dataset is randomly split into a training set, a test set, and a verification set. The training set is used for training the news classification model, the verification set is used for verifying whether the model is reasonable, and the test set is used for testing the classification effect of the model.
The segmented news text words are used as the input of the word embedding layer to obtain a group of word feature vectors x_0, x_1, x_2, ..., x_t, which are a language the computer can recognize. For the text category labels, the alphabet of the input language is specified and each character is encoded using 1-of-m encoding; the character sequence is then converted to a fixed length l_0: characters exceeding the length l_0 are ignored, and sequences shorter than l_0 are padded with zeros.
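The Step-1 preprocessing described above (the "tag + '\t' + news" format, truncation to the fixed length l_0, and zero padding) can be sketched as follows; the helper names and the toy vocabulary are illustrative, not part of the patent:

```python
def load_sample(line):
    """Parse one line in the "tag + '\t' + news" format described above."""
    tag, news = line.split("\t", 1)
    return tag, news.split()          # word-segmented news text


def to_fixed_length(token_ids, l0, pad_id=0):
    """Truncate sequences longer than l0; pad shorter ones with zeros."""
    return token_ids[:l0] + [pad_id] * max(0, l0 - len(token_ids))


# Toy example: a tiny hand-made vocabulary stands in for real embeddings.
vocab = {"team": 1, "wins": 2, "the": 3, "final": 4, "match": 5}
tag, words = load_sample("sports\tteam wins the final match")
padded = to_fixed_length([vocab[w] for w in words], 8)
print(tag, padded)   # sports [1, 2, 3, 4, 5, 0, 0, 0]
```

A real pipeline would replace the toy vocabulary with the word embedding lookup; the padding and truncation logic is the same.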
Step 2: after step 1 a temporal convolution module, i.e. a one-dimensional convolution operation, is added. Convolutional neural network models have been widely used for image recognition and are also applied to text classification. A CNN is a deep neural network consisting mainly of an input layer, hidden layers, and an output layer. The input layer is responsible for receiving the input variables; the hidden layers comprise convolution layers and pooling layers that learn features of the input information; the output layer is composed of fully connected layers.
Convolution operations at different scales can extract more complex features of the text. The implementation of the CNN is represented by the following formulas:

h_i = f(σ·x_{i:i+k−1} + b) (1)

ĥ = max(h) (2)

h = [h_1, h_2, ..., h_{n−k+1}] (3)

where x represents an embedded word, σ denotes a filter whose function is to generate new features by the convolution operation, f is a nonlinear function, h_i represents a feature obtained by the convolution operation, ĥ is the largest among the features obtained by the convolution operation, and b represents a bias term.

The present invention uses one-dimensional convolution and thus only convolves in the row direction; the downward arrow in the figure indicates that the convolution kernel moves from top to bottom, and the convolution step size is set to 3. h_1, h_2, h_3 represent the extracted features; after h_1, h_2, h_3, the feature vector H is a feature representation of the entire sentence. That is, the convolution kernel k is convolved with the window vector at each location to generate a feature map H ∈ R^{l−k+1} of the input text of length l. Each element h_j of the feature map H is calculated by the following equation:

h_j = f(σ_j ⊙ k + b) (4)

where ⊙ denotes element-wise matrix multiplication, b is the bias term, and f is the activation function.
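A minimal sketch of the per-window feature-map computation h_j = f(σ_j ⊙ k + b) of equation (4), with the element-wise product followed by a sum implemented as a dot product; scalar features and the identity/tanh activations are illustrative choices:

```python
import math


def conv1d_feature_map(x, kernel, b=0.0, f=math.tanh):
    """h_j = f(window . kernel + b) for every length-k window of x."""
    k = len(kernel)
    return [f(sum(xi * ki for xi, ki in zip(x[j:j + k], kernel)) + b)
            for j in range(len(x) - k + 1)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]                    # toy 1-D input sequence
h = conv1d_feature_map(x, [0.5, -0.5], f=lambda t: t)
print(h)   # [-0.5, -0.5, -0.5, -0.5]  (length l - k + 1 = 4)
```

Note the output length l − k + 1 matches the feature-map size stated in the text.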
When the model input is a discrete function δ(x) ∈ [1, l] and a discrete kernel function τ(x) ∈ [1, k], where δ(x), τ(x) ∈ R, if the step size is d, the convolution c(y) between δ(x) and τ(x), with y ∈ [1, ⌊(l − k)/d⌋ + 1], is calculated as:

c(y) = Σ_{x=1}^{k} τ(x)·δ(y·d − x + b) (5)

x represents a word vector index and n represents the number of news word vectors. b = k − d + 1 is the offset constant. Similar to conventional convolutional neural networks used in computer vision, the module is parameterized by a set of kernel functions τ_ij(x) (which we call "weights", i = 1, 2, ..., m; j = 1, 2, ..., n). Each input δ_i(x) or output c_j(y) is called a "feature", and m and n represent the numbers of input and output features. The output c_j(y) is the convolution sum of δ_i(x) and τ_ij(x).
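The strided one-dimensional convolution c(y) described above, with offset constant b = k − d + 1, can be sketched in pure Python; the 1-indexed arithmetic follows the formula directly, and the function name is illustrative:

```python
def temporal_conv(delta, tau, d=1):
    """c(y) = sum_{x=1..k} tau(x) * delta(y*d - x + b), with b = k - d + 1.

    delta: discrete input function (length l), tau: kernel (length k),
    d: step size. Indices are 1-based in the formula, so 1 is subtracted
    when reading the Python lists.
    """
    l, k = len(delta), len(tau)
    b = k - d + 1                                  # offset constant
    out = []
    for y in range(1, (l - k) // d + 2):           # y = 1 .. floor((l-k)/d)+1
        s = 0.0
        for x in range(1, k + 1):
            s += tau[x - 1] * delta[y * d - x + b - 1]
        out.append(s)
    return out


# k = 3, d = 2: each output sums one length-3 window, windows 2 apart.
print(temporal_conv([1, 2, 3, 4, 5, 6], [1, 1, 1], d=2))   # [6.0, 12.0]
```

With d = 1 the offset b = k makes the formula a true (kernel-reversed) convolution over each window.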
The pooling-free convolution removes the max-pooling layer of the CNN, because the pooling operation may lose some semantic information. The resulting higher-order feature representation is then fed into the attention mechanism.
Step 3: for the feature vectors obtained in step 2, an attention model is input: each word x_0, x_1, ..., x_n is represented in vector form and input to a convolution unit to obtain the outputs h_0, h_1, ..., h_n; this output serves as the input of the attention mechanism, source = h_0, h_1, ..., h_n, from which the final feature vector of the text is calculated. In the attention mechanism, the hidden-layer state h_t at time t is randomly initialized and updated as a parameter during training, while the source-side context vector s_t is given; the source-side context vector s_t is calculated as a weighted sum of the individual inputs, as follows:

s_t = Σ_{s=1}^{L} a_t(s)·h̄_s (6)

where L represents the news text length, a_t(s) represents the variable-length alignment vector, and h̄_s represents the hidden-layer state of the encoder.

The context vector s_t should consider all hidden states of the encoder; in the attention-mechanism part, the decoder hidden state h_t at time t is compared with each source hidden state h̄_s of the encoder to generate the variable-length alignment vector a_t(s):

a_t(s) = exp(f_a(h_t, h̄_s)) / Σ_{s'=1}^{L} exp(f_a(h_t, h̄_{s'})) (7)

f_a is a content-based function; f_a(h_t, h̄_s) is a function of the decoder hidden state h_t at time t and the encoder source hidden state h̄_s, and the denominator sums the content function over all source hidden states of the encoder starting from the initial position s' = 1.

f_a has 3 different forms:

f_a(h_t, h̄_s) = h_t^T·h̄_s, or h_t^T·W_a·h̄_s, or v_a^T·tanh(W_a·[h_t; h̄_s]) (8)

where W_a is the weight matrix of the attention model.

At each time step, the model infers a variable-length alignment weight vector from the current target state and all source states, and then, based on a_t(s), calculates the global context vector as a weighted average over all source states.

The hidden-layer state h_t at time t and the context vector s_t are combined to generate the following attention hidden state of the decoder:

h̃_t = tanh(W_c·[s_t; h_t]) (9)

where h̃_t represents the new attention hidden state vector, W_c represents the fully connected matrix of the attention model weights, and u represents the number of attention-mechanism hidden units.
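The alignment-weight and context-vector computation described above can be sketched with the dot-product form of f_a; this is a toy, list-based version with illustrative names, not the trained model:

```python
import math


def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]


def global_attention(h_t, h_src):
    """Dot-product global attention: a_t(s) = softmax(h_t . h_bar_s),
    then s_t = sum_s a_t(s) * h_bar_s (a weighted average of sources)."""
    scores = [sum(a * b for a, b in zip(h_t, h_s)) for h_s in h_src]
    a_t = softmax(scores)
    dim = len(h_t)
    s_t = [sum(a_t[s] * h_src[s][i] for s in range(len(h_src)))
           for i in range(dim)]
    return a_t, s_t


h_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy encoder hidden states
a_t, s_t = global_attention([1.0, 0.0], h_src)
print([round(w, 3) for w in a_t])   # [0.422, 0.155, 0.422]
```

The weights sum to 1, and sources more similar to the query state h_t receive larger weights, which is exactly the weight redistribution the method relies on.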
After introducing the attention mechanism, the final representation of the text is calculated as follows:

u_t = tanh(W_s·h_t + b_s) (10)

w_t = exp(u_t^T·u_s) / Σ_t exp(u_t^T·u_s) (11)

v = Σ_t w_t·h_t (12)

In the calculation process, W_s represents the weight coefficient matrix of the attention model, h_t is the feature representation of the convolution at time t, u_t is a hidden-layer representation of the neural network, u_s is a randomly initialized context vector, which may also be referred to as the semantic representation of the input, w_t is the importance weight normalized by the Softmax function, and v is the final feature vector of the text.
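A scalar-feature sketch of the final text representation v = Σ_t w_t·h_t described above; W_s, b_s and u_s are reduced to scalars for illustration, whereas the patent's versions are a matrix, a bias vector, and a context vector:

```python
import math


def text_vector(h, W_s, b_s, u_s):
    """v = sum_t w_t * h_t, with u_t = tanh(W_s*h_t + b_s) and
    w_t = softmax_t(u_t * u_s). Scalar sketch: each h_t is one number."""
    u = [math.tanh(W_s * h_t + b_s) for h_t in h]
    scores = [u_t * u_s for u_t in u]
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    w = [x / sum(e) for x in e]       # importance weights, sum to 1
    return sum(w_t * h_t for w_t, h_t in zip(w, h))


# The large positive feature (3.0) receives the largest weight and
# dominates the weighted sum.
v = text_vector([0.2, -1.0, 3.0], 1.0, 0.0, 1.0)
print(round(v, 2))   # 1.8
```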
Step 4: after step 3, the model uses the Leaky_ReLU activation function. The rectified linear unit (ReLU) is the most commonly used activation function in neural networks and can be computed efficiently. When the input is positive, the derivative is non-zero, allowing gradient-based learning. However, when the input value of the ReLU is negative, the output is 0 and the first derivative is also 0; this prevents the neuron from updating its parameters, so the neuron no longer learns. This phenomenon is known as "dead neurons".
The ReLU has produced many variants. In the present invention, to overcome the drawbacks of the ReLU, a leakage value is introduced in the negative half-axis of the ReLU, hence the name leaky ReLU function. Unlike the ReLU, the Leaky_ReLU assigns a non-zero slope to all negative values, as follows:

f(x) = x if x ≥ 0; f(x) = a_g·x if x < 0 (13)

where a_g is fixed and g indexes the corresponding slope a_g. The Leaky_ReLU function is a variant of the classical, widely used ReLU activation function. Since its derivative is always non-zero, the number of silent neurons is reduced, ensuring that gradient-based learning continues after entering the negative interval.
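The Leaky_ReLU described above is a one-line function; the slope a_g = 0.01 below is an illustrative default, not a value specified in the patent:

```python
def leaky_relu(x, a_g=0.01):
    """Leaky ReLU: x for x >= 0, a_g * x otherwise (a_g is a small
    fixed slope that keeps the gradient non-zero for negative inputs)."""
    return x if x >= 0 else a_g * x


print(leaky_relu(2.0), leaky_relu(-2.0))   # 2.0 -0.02
```

Unlike the plain ReLU, the negative branch returns a small non-zero value, so its derivative a_g keeps gradient-based updates alive.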
Finally, multi-classification is performed by the Softmax classifier to obtain the result:

result = softmax(v) (14)

result is a vector whose dimension is the number of categories; each component lies in the range [0, 1] and represents the probability that the text falls into the corresponding category. The predicted category of the input sentence is:

prediction = argmax(result) (15)
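Equations (14) and (15), Softmax over the final feature vector followed by argmax, can be sketched as follows; the category names are hypothetical:

```python
import math


def softmax(v):
    m = max(v)                        # subtract max for numerical stability
    e = [math.exp(x - m) for x in v]
    z = sum(e)
    return [x / z for x in e]


def predict(v, labels):
    """result = softmax(v); prediction = argmax(result)."""
    result = softmax(v)
    return labels[result.index(max(result))], result


labels = ["sports", "finance", "tech"]          # hypothetical categories
label, result = predict([2.0, 0.5, 1.0], labels)
print(label)                    # sports
print(round(sum(result), 6))    # 1.0, i.e. a valid probability distribution
```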
the system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.
Claims (4)
1. The news classification method based on the pooling convolution embedded and attention-distributed neural network is characterized by comprising the following steps of:
step 1: collecting a news text data set, performing standardized format processing and word segmentation on the news text, obtaining feature vectors of the news by word embedding, randomly segmenting the news data according to the news categories and news labels, and dividing the corpus into a training set, a test set and a verification set, wherein the training set is used for training the news classification model, the verification set is used for verifying whether the model is reasonable, and the test set is used for testing the classification effect of the model;
step 2: inputting the feature vector obtained by word embedding of the training set in the corpus in the step 1 into a CNN convolutional neural network, and canceling a pooling layer in the CNN;
step 3: inputting the feature vectors subjected to word embedding and pooling-free convolution in the step 2 into an attention mechanism, and carrying out weight redistribution on the feature vectors in the text so as to train a news classification model;
step 4: inputting the text vector of the test set in the corpus in the step 1 into CNN, classifying news categories according to the trained model in the step 3, and calculating the accuracy of the news categories;
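The corpus split described in step 1 can be sketched as follows; the 8:1:1 ratio and the toy corpus are assumptions for illustration, since the claim only requires a random division into training, verification and test sets:

```python
import random

def split_corpus(samples, train=0.8, val=0.1, seed=42):
    """Randomly split labelled news samples into train/verification/test sets.

    The 8:1:1 ratio is an assumption for illustration; the patent only
    specifies that the corpus is divided into the three sets.
    """
    samples = list(samples)
    random.Random(seed).shuffle(samples)   # reproducible random segmentation
    n = len(samples)
    n_train = int(n * train)
    n_val = int(n * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

# Hypothetical (text, label) pairs standing in for segmented news.
corpus = [(f"news text {i}", i % 4) for i in range(100)]
train_set, val_set, test_set = split_corpus(corpus)
```

Every sample lands in exactly one of the three sets, so no news item is used for both training and evaluation.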
the step 2 inputs the word vectors x_0, x_1, x_2, ..., x_n of the training set in the corpus of the step 1 into the CNN and cancels the pooling layer of the character convolution network, specifically: the distributed word-vector representation is input into a one-dimensional convolution network comprising an input layer, a convolution layer and an output layer; the pooling layer of the convolutional neural network is cancelled so as to maximally preserve the text features, and the one-dimensional convolution is calculated to obtain the convolution sum of a discrete function and a discrete kernel function:
wherein τ(x) is the discrete kernel function, δ(x) is the input discrete function, d is the stride, b is the bias term, x represents a word vector, and n represents the number of news word vectors;
b = k - d + 1 is an offset constant; the layer is parameterized by a set of kernel functions τ_ij(x), i = 1, 2, ..., v, j = 1, 2, ..., w; each input δ_i(x) and each output c_j(y) is called a "feature", and m and n represent the sizes of the input and output features; the output c_j(y) is the convolution sum of δ_i(x) and τ_ij(x);
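A minimal sketch of the pooling-free one-dimensional convolution described in step 2, for a single input/output feature pair; the toy input, kernel, stride and bias are assumptions for illustration, since the claim does not fix the layer sizes:

```python
import numpy as np

def conv1d_no_pool(delta, tau, b=0.0, d=1):
    """One-dimensional convolution sum of a discrete input delta(x) with a
    discrete kernel tau(x), stride d and bias b.  No pooling layer follows,
    so every output position is kept, preserving the full text feature map
    as described in step 2."""
    k = len(tau)
    n_out = (len(delta) - k) // d + 1
    out = np.empty(n_out)
    for y in range(n_out):
        # Convolution sum at output position y over a window of size k.
        out[y] = b + np.dot(delta[y * d:y * d + k], tau)
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy word-embedding channel
kernel = np.array([0.5, -0.5, 1.0])
features = conv1d_no_pool(x, kernel, b=0.1, d=1)
```

With input length 5, kernel size 3 and stride 1, all three output positions are retained instead of being collapsed by a max- or average-pooling step.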
step 4, inputting the text vector of the test set in the corpus in step 1 into CNN, classifying news categories according to the trained model in step 3, and calculating the accuracy of the news categories, wherein the method specifically comprises the following steps:
the model uses the Leaky_ReLU activation function, which introduces a small leakage value in the negative half-axis of ReLU and is therefore called Leaky_ReLU; unlike ReLU, it assigns a fixed non-zero slope to all negative values, as follows;
a_g is fixed, where g indexes the different channels; finally, multi-class classification is performed by the Softmax classifier to obtain the result;
result=softmax(v) (10)
result is a vector whose dimension is the number of categories; each component lies in the range [0, 1] and represents the probability that the text falls into the corresponding category; the predicted category of the input sentence is:
prediction=argmax(result) (11)。
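The Leaky_ReLU activation referenced in claim 1 can be sketched as follows; the slope value 0.01 is a common default assumed for illustration, since the claim only states that the slope is fixed and non-zero:

```python
import numpy as np

def leaky_relu(x, g=0.01):
    """Leaky_ReLU: identity for non-negative inputs, a small fixed non-zero
    slope g for negative inputs.  The value g=0.01 is an assumed default;
    the claim only requires that the slope is fixed."""
    return np.where(x >= 0, x, g * x)

v = np.array([-2.0, -0.5, 0.0, 1.5])
out = leaky_relu(v)
```

Negative inputs are scaled by g rather than clipped to zero, which keeps a small gradient flowing through inactive units during training.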
2. A news classification method based on the pooling convolution embedding and attention distribution neural network according to claim 1, wherein said step 1 comprises: collecting the news data set and, for Chinese news, normalizing the format of the data set into the form 'tag' + '\t' + 'news'; the segmented news text words are used as the input of the word-embedding layer to obtain a group of word feature vectors x_0, x_1, x_2, ..., x_t, a representation the computer can process; for the text category labels, the alphabet of the input language is specified, and each character is encoded with a code from 1 to 1024; the character sequence vector is then converted to a fixed length l_0: characters beyond length l_0 are ignored, and sequences shorter than l_0 are padded with 0 at the end.
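The fixed-length truncation and zero-padding described in claim 2 can be sketched as follows; the character codes and the value of l_0 are invented for illustration:

```python
def encode_fixed_length(char_ids, l0):
    """Truncate or zero-pad a sequence of character codes to fixed length l0.
    Characters beyond l0 are ignored; shorter sequences are padded with 0
    at the end, as described in claim 2."""
    return (char_ids[:l0] + [0] * (l0 - len(char_ids)))[:l0]

# Hypothetical character codes (each in the 1-1024 range of the claim).
padded = encode_fixed_length([5, 17, 900], 5)        # shorter: zero-padded
truncated = encode_fixed_length(list(range(1, 8)), 5)  # longer: truncated
```

Fixing the sequence length this way gives the convolution layer a uniform input shape regardless of the original news length.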
3. The news classification method based on the pooling convolution embedding and attention distribution neural network according to claim 1, wherein the step 3 inputs the feature vectors obtained after word embedding and pooling-free convolution in the step 2 into an attention mechanism, and performs weight redistribution on the feature vectors in the text so as to train the news classification model, specifically comprising:
for the feature vectors obtained in step 2, an attention model is input: each feature vector x_0, x_1, ..., x_n is represented in vector form and input to a convolution unit to obtain the outputs h_0, h_1, ..., h_n; this output serves as the input of the attention mechanism, source = h_0, h_1, ..., h_n, from which the final feature vector of the text is calculated. In the attention mechanism, the hidden-layer state h_t at time t is randomly initialized and updated as a parameter during training, while the source-side context vector s_t is given; the source-side context vector s_t is calculated as a weighted sum of the individual inputs, as follows:
wherein L represents the news text length, a_t(s) represents the variable-length alignment vector, and h̄_s represents the hidden state of the encoder;
the context vector s_t should take all hidden states of the encoder into account; in the attention mechanism part, the decoder hidden state h_t at time t is compared with each source hidden state h̄_s of the encoder to generate the variable-length alignment vector a_t(s):
f_a is a content-based function; f_a(h_t, h̄_s) denotes a function of the decoder hidden state h_t at time t and the encoder source hidden state h̄_s, i.e. a content function of the decoder hidden state at time t and all source hidden states of the encoder starting from the initial position s = 1;
f_a has 3 different formulations:
wherein W_a is the weight matrix of the attention model;
at each time step, the model infers a variable-length alignment weight vector from the current target state and all source states; the global context vector is then computed according to a_t(s) as a weighted average over all source states;
the hidden-layer state h_t at time t and the context vector s_t are combined to generate the attentional hidden state of the decoder as follows:
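The alignment-and-context computation of claim 3 can be sketched as follows, using the dot-product form as one of the three f_a variants; the dimensions, the random states, and the combining matrix W_c are assumptions for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_context(h_t, h_bar, W_c):
    """Global attention step sketched from claim 3: score each encoder
    source hidden state h_bar[s] against the decoder state h_t (the
    dot-product form of f_a, one of its three variants), normalise the
    scores into the alignment vector a_t(s), and form the context vector
    s_t as the weighted sum of the sources.  W_c (an assumed combining
    matrix of shape dim x 2*dim) merges [s_t; h_t] into the attentional
    hidden state."""
    scores = h_bar @ h_t               # f_a(h_t, h_bar_s) = h_t . h_bar_s
    a_t = softmax(scores)              # variable-length alignment vector
    s_t = a_t @ h_bar                  # context vector: weighted sum of sources
    h_tilde = np.tanh(W_c @ np.concatenate([s_t, h_t]))
    return a_t, s_t, h_tilde

rng = np.random.default_rng(0)
L, dim = 5, 4                          # news text length, hidden size (assumed)
h_bar = rng.normal(size=(L, dim))      # encoder source hidden states
h_t = rng.normal(size=dim)             # decoder hidden state at time t
W_c = rng.normal(size=(dim, 2 * dim))
a_t, s_t, h_tilde = attention_context(h_t, h_bar, W_c)
```

The alignment weights sum to 1, so s_t is a convex combination of the source states, which is the weighted-sum form stated in the claim.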
4. A news classification method based on the pooling convolution embedding and attention distribution neural network according to claim 3, characterized in that, after the attention mechanism is introduced, the final representation of the text is calculated as follows:
u_t = tanh(W_s h_t + b_s) (6)
w_t = exp(u_t^T u_s) / Σ_t exp(u_t^T u_s) (7)
v = Σ_t w_t h_t (8)
in the calculation process, W_s represents the weight coefficient matrix of the attention model, h_t is the convolutional feature representation at time t, u_t is the hidden-layer representation of the neural network, u_s is a randomly initialized context vector, also known as the semantic representation of the input, w_t is the importance weight normalized by the Softmax function, and v is the final feature vector of the text.
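A sketch of the final text representation of claim 4 (equations (6)-(8)); the dimensions and random values are assumptions for illustration, and u_s is the randomly initialized semantic context vector described above:

```python
import numpy as np

def text_representation(H, W_s, b_s, u_s):
    """Final text vector from claim 4: project each convolutional feature
    h_t through tanh to get u_t (eq. 6), score it against the randomly
    initialised context vector u_s and normalise the scores with softmax
    into importance weights w_t (eq. 7), and return v as the weighted sum
    of the h_t (eq. 8)."""
    U = np.tanh(H @ W_s.T + b_s)       # eq. (6): u_t = tanh(W_s h_t + b_s)
    scores = U @ u_s                   # relevance of each position to u_s
    w = np.exp(scores - scores.max())
    w /= w.sum()                       # eq. (7): softmax importance weights w_t
    return w @ H                       # eq. (8): v = sum_t w_t h_t

rng = np.random.default_rng(1)
T, dim = 6, 4                          # sequence length and feature size (assumed)
H = rng.normal(size=(T, dim))          # convolutional features h_t
W_s = rng.normal(size=(dim, dim))
b_s = rng.normal(size=dim)
u_s = rng.normal(size=dim)             # randomly initialised semantic context vector
v = text_representation(H, W_s, b_s, u_s)
```

Because the weights w_t are a softmax, v is a convex combination of the feature vectors h_t, i.e. a weighted average emphasising the positions most similar to u_s.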
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011443363.4A CN112527959B (en) | 2020-12-11 | 2020-12-11 | News classification method based on pooling convolution embedding and attention distribution neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112527959A CN112527959A (en) | 2021-03-19 |
CN112527959B true CN112527959B (en) | 2023-05-30 |
Family
ID=75000138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011443363.4A Active CN112527959B (en) | 2020-12-11 | 2020-12-11 | News classification method based on pooling convolution embedding and attention distribution neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112527959B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113177110B (en) * | 2021-05-28 | 2022-09-16 | 中国人民解放军国防科技大学 | False news detection method and device, computer equipment and storage medium |
CN114334159B (en) * | 2022-03-16 | 2022-06-17 | 四川大学华西医院 | Postoperative risk prediction natural language data enhancement model and method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110772268A (en) * | 2019-11-01 | 2020-02-11 | 哈尔滨理工大学 | Multimode electroencephalogram signal and 1DCNN migration driving fatigue state identification method |
CN111292305A (en) * | 2020-01-22 | 2020-06-16 | 重庆大学 | Improved YOLO-V3 metal processing surface defect detection method |
CN111783688A (en) * | 2020-07-02 | 2020-10-16 | 吉林大学 | Remote sensing image scene classification method based on convolutional neural network |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180329884A1 (en) * | 2017-05-12 | 2018-11-15 | Rsvp Technologies Inc. | Neural contextual conversation learning |
CN109492108B (en) * | 2018-11-22 | 2020-12-15 | 上海唯识律简信息科技有限公司 | Deep learning-based multi-level fusion document classification method and system |
CN109597891B (en) * | 2018-11-26 | 2023-04-07 | 重庆邮电大学 | Text emotion analysis method based on bidirectional long-and-short-term memory neural network |
US11494615B2 (en) * | 2019-03-28 | 2022-11-08 | Baidu Usa Llc | Systems and methods for deep skip-gram network based text classification |
CN110188194B (en) * | 2019-04-26 | 2020-12-01 | 哈尔滨工业大学(深圳) | False news detection method and system based on multitask learning model |
CN110609897B (en) * | 2019-08-12 | 2023-08-04 | 北京化工大学 | Multi-category Chinese text classification method integrating global and local features |
Non-Patent Citations (4)
Title |
---|
Jiawen Yang et al.; IEEE Access; 2018; 65130-65138 * |
Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network; Marios Anthimopoulos et al.; IEEE Transactions on Medical Imaging (No. 5); 1207-1216 * |
Medical text information extraction based on deep learning; Tu Wenbo; China Master's Theses Full-text Database, Medicine & Health Sciences (No. 1); E054-85 * |
An improved attention acoustic model with minimal gated unit structure; Long Xingyan et al.; Journal of Signal Processing (No. 06); 739-748 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107526785B (en) | Text classification method and device | |
CN112329680B (en) | Semi-supervised remote sensing image target detection and segmentation method based on class activation graph | |
CN110069709B (en) | Intention recognition method, device, computer readable medium and electronic equipment | |
WO2023134084A1 (en) | Multi-label identification method and apparatus, electronic device, and storage medium | |
US11900250B2 (en) | Deep learning model for learning program embeddings | |
CN112417153B (en) | Text classification method, apparatus, terminal device and readable storage medium | |
CN112527959B (en) | News classification method based on pooling convolution embedding and attention distribution neural network | |
CN111475622A (en) | Text classification method, device, terminal and storage medium | |
CN112749274B (en) | Chinese text classification method based on attention mechanism and interference word deletion | |
CN113837370A (en) | Method and apparatus for training a model based on contrast learning | |
CN115408525B (en) | Letters and interviews text classification method, device, equipment and medium based on multi-level label | |
CN113553510B (en) | Text information recommendation method and device and readable medium | |
CN114139676A (en) | Training method of domain adaptive neural network | |
CN115035418A (en) | Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network | |
CN115456043A (en) | Classification model processing method, intent recognition method, device and computer equipment | |
CN112749737A (en) | Image classification method and device, electronic equipment and storage medium | |
CN109597982B (en) | Abstract text recognition method and device | |
CN116824583A (en) | Weak supervision video scene graph generation method and device and electronic equipment | |
CN113704466B (en) | Text multi-label classification method and device based on iterative network and electronic equipment | |
CN112989052B (en) | Chinese news long text classification method based on combination-convolution neural network | |
CN115186670A (en) | Method and system for identifying domain named entities based on active learning | |
CN115700555A (en) | Model training method, prediction method, device and electronic equipment | |
CN113627192A (en) | Relation extraction method and device based on two-layer convolutional neural network | |
Deebadi | Understanding Impact of Twitter Feed on Bitcoin Price and Trading Patterns | |
CN112183103A (en) | Convolutional neural network entity relationship extraction method fusing different pre-training word vectors |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||