CN112527959B - News classification method based on pooling convolution embedding and attention distribution neural network - Google Patents

News classification method based on pooling convolution embedding and attention distribution neural network

Info

Publication number
CN112527959B
CN112527959B (application CN202011443363.4A)
Authority
CN
China
Prior art keywords
news
text
vector
attention
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011443363.4A
Other languages
Chinese (zh)
Other versions
CN112527959A (en)
Inventor
唐贤伦
郝博慧
彭德光
钟冰
闫振甫
王会明
张璞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202011443363.4A priority Critical patent/CN112527959B/en
Publication of CN112527959A publication Critical patent/CN112527959A/en
Application granted granted Critical
Publication of CN112527959B publication Critical patent/CN112527959B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a news classification method based on a pooling-free convolution embedding and attention-distribution neural network, which uses features and weights as the key factors in the classification process. The mechanism is to apply a convolution over the embedding layer to extract local features, to delete the pooling layer so as to reduce information loss, and then to add an attention mechanism that reassigns weights to obtain the global features of the text. The model captures not only deep features of the text but also the importance of each part of the news. Convolutional neural networks (CNNs) play an important role in text classification tasks because of their advantages in extracting local features and position-invariant features. The attention mechanism strengthens the weight of key information because it extracts text context information and focuses on the important parts; combining the two yields a stronger feature-extraction capability. Combining the pooling-free CNN with the global attention mechanism to handle the news classification problem can significantly improve the accuracy of text classification.

Description

News classification method based on pooling convolution embedding and attention distribution neural network
Technical Field
The invention belongs to the field of Chinese news text classification, and particularly relates to a news classification method based on a pooling-free convolution embedding and attention-distribution neural network.
Background
Text classification is a classical task in NLP: it assigns a specified text its corresponding label. Currently, text classification methods are mainly divided into traditional machine-learning text classification and deep-learning text classification.
Traditional machine-learning text classification methods include K-nearest neighbor (KNN), maximum entropy, support vector machines (SVM), and the like. The core idea of the KNN algorithm is that if most of the k nearest neighbors of a sample in feature space belong to a certain class, the sample also belongs to that class and shares features with the samples of that class. Because the category is decided by a vote among the nearest neighbors, the method is suitable when each class has a sufficient number of samples in the training dataset. The principle of maximum entropy is that, when learning a probabilistic model, the model with the maximum entropy is the best model; that is, maximum entropy can be understood as selecting the model with the largest entropy among the set of models satisfying the constraints. SVM is a generalized linear classifier that performs binary classification of data by supervised learning. Deep-learning algorithms are now widely applied to text classification. A recurrent neural network (RNN) is a time-series-based neural network model that can capture long-term dependencies between sequence elements. However, as the length of the sequence increases, it becomes difficult for a standard RNN to capture long-term dependencies and therefore to model the entire sequence; during modeling, some information may be lost, and there are problems of vanishing and exploding gradients. Convolutional neural networks (CNNs) are also applied to text classification tasks and have great advantages in capturing local features and position-invariant features. Long short-term memory networks (LSTM) can model the relationships between sentences; the LSTM adds three gate structures to the RNN, alleviating the vanishing- and exploding-gradient problems. In contrast to the LSTM, the gated recurrent unit (GRU) has only two gate structures, an update gate and a reset gate, so the GRU has fewer parameters and converges better during training. Moreover, the hierarchical attention model incorporates the attention mechanism into a hierarchical GRU model so that the model can better capture the important information of a document. In recent years, attention mechanisms have been widely used in the field of text classification because they can distinguish the importance of each word to the classification result.
Since a computer cannot directly process a text sequence, it is important to express the text in a form that the computer can understand; this is called text vectorization.
The invention aims to solve the problems that the semantic information of the input text is insufficient and that the pooling layer causes information loss and reduces classification accuracy.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing a news classification method based on a pooling-free convolution embedding and attention-distribution neural network. The technical scheme of the invention is as follows:
A news classification method based on a pooling-free convolution embedding and attention-distribution neural network, comprising the following steps:
step 1: collecting a news text dataset, carrying out standardized format processing and word segmentation on the news texts, obtaining the feature vectors of the news by word embedding, randomly splitting the labelled news data according to news category, and dividing the corpus into a training set, a test set and a verification set, wherein the training set is used for training the news classification model, the verification set is used for verifying whether the model is reasonable, and the test set is used for testing the classification effect of the model;
step 2: inputting the feature vectors obtained by word embedding of the training set of the corpus in step 1 into a CNN convolutional neural network, and cancelling the pooling layer in the CNN;
step 3: inputting the feature vectors obtained after word embedding and pooling-free convolution in step 2 into an attention mechanism, and redistributing the weights of the feature vectors of the text so as to train the news classification model;
step 4: inputting the text vectors of the test set of the corpus in step 1 into the CNN, classifying the news categories according to the model trained in step 3, and calculating the accuracy of the news classification.
2. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 1, wherein in said step 1: the news dataset is collected and, for Chinese news, the format of the dataset is normalized into the form "tag + '\t' + news"; the word-segmented news text words are used as the input of the word embedding layer to obtain the feature vectors x_0, x_1, x_2, ..., x_t of a group of words. The feature vectors are a language that the computer can recognize. For the text category labels, an alphabet of size m is specified for the input language, and each character is encoded using one-of-m encoding; then, the character sequence vector is converted to a fixed length l_0: all characters exceeding the length l_0 are ignored, and sequences shorter than l_0 are padded with 0 at the end.
3. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 2, wherein said step 2 inputs the word vectors x_0, x_1, x_2, ..., x_n of the training set of the corpus in step 1 into the CNN and cancels the pooling layer of the character convolution network, specifically: the word vectors with a distributed representation are input into a one-dimensional convolution network, which comprises an input layer, a convolution layer and an output layer; the pooling layer of the convolutional neural network is cancelled in order to preserve the text features as much as possible, and the one-dimensional convolution is calculated as the convolution sum of a discrete function and a discrete kernel function:

c(y) = Σ_{x=1}^{k} τ(x) · δ(y·d − x + b)   (1)

where τ(x) is the discrete kernel function, the input discrete function is δ(x), d is the step size, b is a bias term, x represents a word vector and n represents the number of news word vectors.

4. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 3, wherein b = k − d + 1 is an offset constant; the module is parameterized by a set of kernel functions τ_ij(x), i = 1, 2, …, v, j = 1, 2, …, w; each input δ_i(x) or output c_j(y) is called a "feature", and v and w represent the numbers of input and output features; the output c_j(y) is the sum of the convolutions of δ_i(x) with τ_ij(x).
5. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 4, wherein said step 3 inputs the feature vectors obtained after word embedding and pooling-free convolution in step 2 into the attention mechanism and redistributes the weights of the feature vectors of the text so as to train the news classification model, specifically comprising:

the feature vectors obtained in step 2 are input into the attention model; each word x_0, x_1, ..., x_n is represented in vector form and input into the convolution unit to obtain the outputs h_0, h_1, …, h_n, and this output serves as the input of the attention mechanism, source = {h_0, h_1, …, h_n}, from which the final feature vector of the text is calculated. In the attention mechanism, the hidden-layer state h_t at time t is randomly initialized and updated as a parameter during training, while the source-side context vector s_t is given; the source-side context vector s_t is calculated as a weighted sum of the individual inputs, as follows:

s_t = Σ_{s=1}^{L} a_t(s) · h̄_s   (2)

where L represents the news text length, a_t(s) represents a variable-length alignment vector, and h̄_s represents the hidden-layer state of the encoder.

The context vector s_t should consider all hidden states of the encoder; in the attention-mechanism part, the hidden state h_t of the decoder at time t is compared with each source hidden state h̄_s of the encoder to generate the variable-length alignment vector a_t(s):

a_t(s) = align(h_t, h̄_s) = exp(f_a(h_t, h̄_s)) / Σ_{s'=1}^{L} exp(f_a(h_t, h̄_{s'}))   (3)

where f_a is a content-based function of the decoder hidden state h_t at time t and the source hidden states h̄_s of the encoder, and the denominator sums the content function over all source hidden states of the encoder starting from the initial position s' = 1.

f_a has 3 different formulations:

f_a(h_t, h̄_s) = h_t^T h̄_s (dot); h_t^T W_a h̄_s (general); v_a^T tanh(W_a [h_t; h̄_s]) (concat)   (4)

where W_a is the weight matrix of the attention model and v_a is a weight vector of the attention model.

At each time step, the model infers a variable-length alignment weight vector from the current target state and all source states, and then, based on a_t(s), calculates the global context vector as a weighted average over all source states.

The hidden-layer state h_t at time t and the context vector s_t are combined to generate the following attention hidden state of the decoder:

h̃_t = tanh(W_c [s_t; h_t])   (5)

where h̃_t represents the new attention hidden-state vector, W_c represents the fully connected weight matrix of the attention model, and u represents the number of attention-mechanism hidden units.
6. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 5, wherein, after introducing the attention mechanism, the final representation of the text is calculated as follows:

u_t = tanh(W_s h_t + b_s)   (6)

w_t = exp(u_t^T u_s) / Σ_t exp(u_t^T u_s)   (7)

v = Σ_t w_t h_t   (8)

In the calculation process, W_s represents the weight coefficient matrix of the attention model, h_t is the feature representation of the convolution at time t, u_t is a hidden-layer representation of the neural network, u_s is a randomly initialized context vector, also known as the semantic representation of the input, w_t is the importance weight normalized by the Softmax function, and v is the final feature vector of the text.
7. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 6, wherein said step 4 inputs the text vectors of the test set of the corpus in step 1 into the CNN, classifies the news categories according to the model trained in step 3, and calculates the accuracy of the news classification, specifically comprising:

the model uses the Leaky_ReLU activation function, which introduces a leakage value in the negative half-axis of the ReLU and is therefore called the Leaky ReLU function; unlike the ReLU, it assigns a non-zero slope to all negative values, as follows:

y_g = x_g, if x_g ≥ 0;   y_g = x_g / a_g, if x_g < 0   (9)

where a_g is fixed and g indexes the corresponding different channels a_g. Finally, multi-classification is carried out through a Softmax classifier to obtain the result:

result = softmax(v)   (10)

result is a vector whose dimension is the number of categories; the value in each dimension lies in the range [0, 1] and represents the probability that the text falls into the corresponding category. The predicted category of the input sentence is:

prediction = argmax(result)   (11)
The invention has the following advantages and beneficial effects:
The invention uses features and weights as the key factors in the classification process. The mechanism first converts the news text into word vectors using an embedding layer; these word vectors are input into a convolution operation to extract local features. The pooling layer of a conventional convolutional network is deleted to reduce information loss, because the pooling layer in effect down-samples the input, typically by keeping only the maximum output of each filter, and therefore ignores part of the news information, as described in claim 2. According to claim 4, the local feature vectors obtained after the pooling-free convolution are input into a global attention mechanism to redistribute the weights, thereby obtaining the global features of the text. Because of the risk that neurons become inactive in the negative interval, Leaky_ReLU is selected as the activation function, and finally the accuracy of news classification is calculated through Softmax. In conventional practice, because of the uniform structure of convolutional networks, the information loss caused by the internal pooling layer is often ignored when the network is optimized. Aiming at this problem, the model proposed in this patent captures the local features of the text, reduces the information loss present in the unified structure of a traditional neural network, and captures the importance of each part of the text. Therefore, handling the text classification problem by combining a pooling-free convolutional network with attention weight distribution can significantly improve the accuracy of news classification.
Drawings
FIG. 1 is a block diagram of a method for news classification based on a pooled convolutional embedded and attention distributed neural network in accordance with a preferred embodiment of the present invention;
FIG. 2 is a pooling-free convolution embedding and attention-distributing neural network model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and specifically described below with reference to the drawings in the embodiments of the present invention. The described embodiments are only a few embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
In the present invention, as shown in FIG. 1, a one-dimensional convolution operation is used first, and the pooling layer is cancelled in the convolutional network to reduce information loss, so that the semantic features and position-invariant features of the input sequence are extracted. The semantic features are then used as the input of the attention mechanism to obtain global features with reassigned weights. The global feature vector is input into the fully connected layer and classified by the activation functions Leaky_ReLU and Softmax.
Step 1: collect the news dataset and, for Chinese news, normalize the dataset format into the form "tag + '\t' + news". Randomly split the dataset into a training set, a test set and a verification set. The training set is used to train the news classification model, the verification set is used to verify whether the model is reasonable, and the test set is used to test the classification effect of the model.
The word-segmented news text words are used as the input of the word embedding layer to obtain the feature vectors x_0, x_1, x_2, ..., x_t of a group of words. The feature vectors are a language that the computer can recognize. For the text category labels, an alphabet of size m is specified for the input language, and each character is encoded using one-of-m encoding; then, the character sequence vector is converted to a fixed length l_0: all characters exceeding the length l_0 are ignored, and sequences shorter than l_0 are padded with 0 at the end.
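For illustration only (not part of the original disclosure), this preprocessing could be sketched in Python as follows; the jieba tokenizer, the function names and the fixed-length value are assumptions:

```python
# Minimal preprocessing sketch for step 1 (illustrative; names and the jieba
# tokenizer are assumptions, not taken from the patent text).
import jieba

MAX_LEN = 600          # the fixed length l_0 (value chosen for illustration)
PAD_ID = 0             # index used to pad sequences shorter than l_0

def load_dataset(path):
    """Read lines in the 'tag\tnews' format into (label, text) pairs."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tag, news = line.rstrip("\n").split("\t", 1)
            samples.append((tag, news))
    return samples

def encode(text, vocab):
    """Segment the news text into words and map them to integer ids,
    truncating or zero-padding to the fixed length l_0."""
    tokens = list(jieba.cut(text))
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:MAX_LEN]
    ids += [PAD_ID] * (MAX_LEN - len(ids))   # pad shorter sequences with 0
    return ids
```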
Step 2: after step 1, a temporal convolution module is added, which is a one-dimensional convolution operation. Convolutional neural network models have been widely used for image recognition and are also used for text classification. A CNN is a deep neural network that mainly consists of an input layer, hidden layers and an output layer. The input layer is responsible for receiving the input variables. The hidden layers comprise convolution layers and pooling layers for learning features of the input information. The output layer is composed of fully connected layers.
Convolution operations at different scales can extract more complex features of the text. The implementation of the CNN is represented by the following formulas:

h_i = f(σ · x_{i:i+k−1} + b)   (1)

ĥ = max(h)   (2)

h = [h_1, h_2, …, h_{n−k+1}]   (3)

where x represents an embedded word and x_{i:i+k−1} is a window of embedded words, σ denotes a filter whose function is to generate new features by the convolution operation, f is a nonlinear function, h_i represents a feature obtained by the convolution operation, ĥ is the largest feature in the set of features h obtained by the convolution operation, and b represents a bias term.

The present invention uses one-dimensional convolution and thus convolves only in the row direction. The downward arrow in the figure indicates that the convolution kernel moves from top to bottom, and the convolution step size is set to 3. h_1, h_2, h_3 represent the features obtained by extraction. After h_1, h_2, h_3 are obtained, the feature vector H is a feature representation of the entire sentence. That is, the convolution kernel k is convolved with the window vector at each location to generate a feature map H ∈ R^(l−m+1), where l is the input text length and m is the window size. Each element h_j of the feature map H is calculated as the following equation:

h_j = f(σ_j ⊙ k + b)   (4)

where ⊙ denotes the element-wise multiplication of matrix elements, b is the bias term, and f is the activation function.
When the model input is a discrete function δ(x) ∈ [1, l] and a discrete kernel function τ(x) ∈ [1, k], where δ(x), τ(x) ∈ R, and the step size is d, the convolution c(y) ∈ [1, ⌊(l − k)/d⌋ + 1] between δ(x) and τ(x) is calculated as follows:

c(y) = Σ_{x=1}^{k} τ(x) · δ(y·d − x + b)   (5)

Here x represents a word vector, n represents the number of news word vectors, and b = k − d + 1 is the offset constant. Similar to the conventional convolutional neural networks used in computer vision, the module is parameterized by a set of kernel functions τ_ij(x), which we refer to as "weights", with i = 1, 2, …, v and j = 1, 2, …, w. Each input δ_i(x) or output c_j(y) is referred to as a "feature", and v and w represent the numbers of input and output features. The output c_j(y) is the sum of the convolutions of δ_i(x) with τ_ij(x).
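As an illustrative aid (not part of the original disclosure), Eq. (5) can be implemented directly; the following NumPy sketch keeps the 1-based indices of the formula and is only a reference implementation:

```python
import numpy as np

def discrete_conv(delta, tau, d):
    """Direct implementation of Eq. (5): c(y) = sum_x tau(x) * delta(y*d - x + b),
    with offset b = k - d + 1, input delta of length l and kernel tau of length k."""
    l, k = len(delta), len(tau)
    b = k - d + 1                                  # offset constant
    n_out = (l - k) // d + 1                       # number of valid output positions
    c = np.zeros(n_out)
    for y in range(1, n_out + 1):                  # 1-based indices as in the formula
        c[y - 1] = sum(tau[x - 1] * delta[y * d - x + b - 1] for x in range(1, k + 1))
    return c
```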
The pooling-free convolution cancels the max-pooling layer in the CNN, because the pooling operation may lose some semantic information. The resulting continuous higher-order feature representation is then fed into the attention mechanism.
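An embedding-plus-convolution block with the pooling layer omitted could be sketched in PyTorch as follows; this is an illustrative sketch, and the layer sizes, kernel width and padding are assumptions rather than values from the patent:

```python
import torch
import torch.nn as nn

class PoolingFreeConv(nn.Module):
    """Embedding followed by a 1-D convolution; the usual max-pooling layer is
    deliberately omitted so that the per-position features are kept intact."""
    def __init__(self, vocab_size, embed_dim=128, num_filters=256, kernel_size=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, token_ids):            # token_ids: (batch, seq_len)
        x = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                # Conv1d expects (batch, channels, seq_len)
        h = torch.relu(self.conv(x))         # (batch, num_filters, seq_len) - no pooling
        return h.transpose(1, 2)             # (batch, seq_len, num_filters)
```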
Step 3: the feature vectors obtained in step 2 are input into the attention model. Each word x_0, x_1, ..., x_n is represented in vector form and input into the convolution unit to obtain the outputs h_0, h_1, …, h_n, and this output serves as the input of the attention mechanism, source = {h_0, h_1, …, h_n}, from which the final feature vector of the text is calculated. In the attention mechanism, the hidden-layer state h_t at time t is randomly initialized and updated as a parameter during training, while the source-side context vector s_t is given; the source-side context vector s_t is calculated as a weighted sum of the individual inputs, as follows:

s_t = Σ_{s=1}^{L} a_t(s) · h̄_s   (6)

where L represents the news text length, a_t(s) represents a variable-length alignment vector, and h̄_s represents the hidden-layer state of the encoder.

The context vector s_t should consider all hidden states of the encoder; in the attention-mechanism part, the hidden state h_t of the decoder at time t is compared with each source hidden state h̄_s of the encoder to generate the variable-length alignment vector a_t(s):

a_t(s) = align(h_t, h̄_s) = exp(f_a(h_t, h̄_s)) / Σ_{s'=1}^{L} exp(f_a(h_t, h̄_{s'}))   (7)

where f_a is a content-based function of the decoder hidden state h_t at time t and the source hidden states h̄_s of the encoder, and the denominator sums the content function over all source hidden states of the encoder starting from the initial position s' = 1.

f_a has 3 different formulations:

f_a(h_t, h̄_s) = h_t^T h̄_s (dot); h_t^T W_a h̄_s (general); v_a^T tanh(W_a [h_t; h̄_s]) (concat)   (8)

where W_a is the weight matrix of the attention model and v_a is a weight vector of the attention model.

At each time step, the model infers a variable-length alignment weight vector from the current target state and all source states, and then, based on a_t(s), calculates the global context vector as a weighted average over all source states.

The hidden-layer state h_t at time t and the context vector s_t are combined to generate the following attention hidden state of the decoder:

h̃_t = tanh(W_c [s_t; h_t])   (9)

where h̃_t represents the new attention hidden-state vector, W_c represents the fully connected weight matrix of the attention model, and u represents the number of attention-mechanism hidden units.
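The global attention described above can be sketched as follows; this sketch assumes the 'general' score form h_t^T W_a h̄_s and standard tensor shapes, and is an illustration rather than the exact parameterization of the invention:

```python
import torch
import torch.nn as nn

class GlobalAttention(nn.Module):
    """Global attention: score each source state against the target state,
    normalize with softmax to get a_t(s), and return the context vector s_t."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.W_a = nn.Linear(hidden_dim, hidden_dim, bias=False)   # weight matrix W_a

    def forward(self, h_t, h_src):
        # h_t: (batch, hidden_dim)        target/decoder state
        # h_src: (batch, L, hidden_dim)   source/encoder states
        scores = torch.bmm(self.W_a(h_src), h_t.unsqueeze(2)).squeeze(2)  # (batch, L)
        a_t = torch.softmax(scores, dim=1)                                # alignment vector a_t(s)
        s_t = torch.bmm(a_t.unsqueeze(1), h_src).squeeze(1)               # context vector s_t
        return s_t, a_t
```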
After introducing the attention mechanism, the final representation of the text is calculated as follows:

u_t = tanh(W_s h_t + b_s)   (10)

w_t = exp(u_t^T u_s) / Σ_t exp(u_t^T u_s)   (11)

v = Σ_t w_t h_t   (12)

In the calculation process, W_s represents the weight coefficient matrix of the attention model, h_t is the feature representation of the convolution at time t, u_t is a hidden-layer representation of the neural network, and u_s is a randomly initialized context vector, which may also be referred to as the semantic representation of the input. w_t is the importance weight normalized by the Softmax function, and v is the final feature vector of the text.
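Equations (10)–(12) amount to a learned word-importance weighting over the convolution features; a minimal sketch, with dimension names assumed, is:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Implements u_t = tanh(W_s h_t + b_s), w_t = softmax(u_t . u_s), v = sum_t w_t h_t."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.W_s = nn.Linear(hidden_dim, hidden_dim)         # W_s and b_s
        self.u_s = nn.Parameter(torch.randn(hidden_dim))     # randomly initialized context vector u_s

    def forward(self, h):                        # h: (batch, T, hidden_dim)
        u = torch.tanh(self.W_s(h))              # (batch, T, hidden_dim)
        w = torch.softmax(u @ self.u_s, dim=1)   # importance weights w_t, (batch, T)
        v = (w.unsqueeze(2) * h).sum(dim=1)      # final text feature vector v, (batch, hidden_dim)
        return v, w
```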
Step 4: after step 3, the model uses the Leaky_ReLU activation function. The rectified linear unit (ReLU) is the most commonly used activation function in neural networks and can be computed efficiently. When the input is positive, the derivative is non-zero, which allows gradient-based learning. However, when the input value of the ReLU is negative, the output is 0 and the first derivative is also 0. This prevents the neuron from updating its parameters, so the neuron stops learning; this phenomenon is known as "dead neurons".
The ReLU has produced many variants. In the present invention, to overcome the drawbacks of the ReLU, a leakage value is introduced in the negative half-axis of the ReLU; the resulting function is therefore called the Leaky ReLU. Unlike the ReLU, the Leaky ReLU assigns a non-zero slope to all negative values, as follows:
y_g = x_g, if x_g ≥ 0;   y_g = x_g / a_g, if x_g < 0   (13)

where a_g is fixed and g indexes the different channels, each with its corresponding fixed a_g. The Leaky_ReLU function is a variant of the classical, widely used ReLU activation function. Since its derivative is always non-zero, the number of silent neurons is reduced, ensuring that gradient-based learning continues after entering the negative interval.
Finally, multi-classification is performed by the Softmax classifier to obtain the result:
result=softmax(v) (14)
result is a vector whose dimension is the number of categories. The value in each dimension lies in the range [0, 1] and represents the probability that the text falls into the corresponding category. The predicted category of the input sentence is:
prediction=argmax(result) (15)
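The classification head described in step 4 (fully connected layer, Leaky_ReLU activation, Softmax and argmax) might be sketched as follows; the layer size and the negative-slope value are illustrative assumptions:

```python
import torch
import torch.nn as nn

class NewsClassifierHead(nn.Module):
    """Fully connected layer with Leaky_ReLU, then Softmax over the category scores."""
    def __init__(self, hidden_dim, num_classes, negative_slope=0.01):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_classes)
        self.act = nn.LeakyReLU(negative_slope)   # non-zero slope on the negative half-axis

    def forward(self, v):                         # v: (batch, hidden_dim) text feature vector
        result = torch.softmax(self.act(self.fc(v)), dim=1)   # per-class probabilities in [0, 1]
        prediction = result.argmax(dim=1)                      # predicted category index
        return result, prediction
```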
the system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above examples should be understood as illustrative only and not limiting the scope of the invention. Various changes and modifications to the present invention may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications are intended to fall within the scope of the invention as defined in the appended claims.

Claims (4)

1. A news classification method based on a pooling-free convolution embedding and attention-distribution neural network, characterized by comprising the following steps:
step 1: collecting a news text dataset, carrying out standardized format processing and word segmentation on the news texts, obtaining the feature vectors of the news by word embedding, randomly splitting the labelled news data according to news category, and dividing the corpus into a training set, a test set and a verification set, wherein the training set is used for training the news classification model, the verification set is used for verifying whether the model is reasonable, and the test set is used for testing the classification effect of the model;
step 2: inputting the feature vectors obtained by word embedding of the training set of the corpus in step 1 into a CNN convolutional neural network, and cancelling the pooling layer in the CNN;
step 3: inputting the feature vectors obtained after word embedding and pooling-free convolution in step 2 into an attention mechanism, and redistributing the weights of the feature vectors of the text so as to train the news classification model;
step 4: inputting the text vectors of the test set of the corpus in step 1 into the CNN, classifying the news categories according to the model trained in step 3, and calculating the accuracy of the news classification;
said step 2 inputs the word vectors x_0, x_1, x_2, ..., x_n of the training set of the corpus in step 1 into the CNN and cancels the pooling layer of the character convolution network, specifically: the word vectors with a distributed representation are input into a one-dimensional convolution network, which comprises an input layer, a convolution layer and an output layer; the pooling layer of the convolutional neural network is cancelled in order to preserve the text features as much as possible, and the one-dimensional convolution is calculated as the convolution sum of a discrete function and a discrete kernel function:

c(y) = Σ_{x=1}^{k} τ(x) · δ(y·d − x + b)   (1)

where τ(x) is the discrete kernel function, the input discrete function is δ(x), d is the step size, b is a bias term, x represents a word vector and n represents the number of news word vectors;

said b = k − d + 1 is an offset constant; the module is parameterized by a set of kernel functions τ_ij(x), i = 1, 2, …, v, j = 1, 2, …, w; each input δ_i(x) or output c_j(y) is called a "feature", v and w represent the numbers of input and output features, and the output c_j(y) is the sum of the convolutions of δ_i(x) with τ_ij(x);
said step 4 inputs the text vectors of the test set of the corpus in step 1 into the CNN, classifies the news categories according to the model trained in step 3, and calculates the accuracy of the news classification, specifically comprising:

the model uses the Leaky_ReLU activation function, which introduces a leakage value in the negative half-axis of the ReLU and is therefore called the Leaky_ReLU function; unlike the ReLU, it assigns a non-zero slope to all negative values, as follows:

y_g = x_g, if x_g ≥ 0;   y_g = x_g / a_g, if x_g < 0   (9)

where a_g is fixed and g indexes the corresponding different channels; finally, multi-classification is carried out through a Softmax classifier to obtain the result:

result = softmax(v)   (10)

result is a vector whose dimension is the number of categories; the value in each dimension lies in the range [0, 1] and represents the probability that the text falls into the corresponding category; the predicted category of the input sentence is:

prediction = argmax(result)   (11).
2. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 1, wherein in said step 1: the news dataset is collected and, for Chinese news, the format of the dataset is normalized into the form "tag + '\t' + news"; the word-segmented news text words are used as the input of the word embedding layer to obtain the feature vectors x_0, x_1, x_2, ..., x_t of a group of words; the feature vectors are a language that the computer can recognize; for the text category labels, the alphabet of the input language is specified, and each character is encoded using one-of-1024 encoding; then, the character sequence vector is converted to a fixed length l_0: all characters exceeding the length l_0 are ignored, and sequences shorter than l_0 are padded with 0 at the end.
3. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 1, wherein said step 3 inputs the feature vectors obtained after word embedding and pooling-free convolution in step 2 into the attention mechanism and redistributes the weights of the feature vectors of the text so as to train the news classification model, specifically comprising:

the feature vectors obtained in step 2 are input into the attention model; each feature vector x_0, x_1, ..., x_n is represented in vector form and input into the convolution unit to obtain the outputs h_0, h_1, …, h_n, and this output serves as the input of the attention mechanism, source = {h_0, h_1, …, h_n}, from which the final feature vector of the text is calculated; in the attention mechanism, the hidden-layer state h_t at time t is randomly initialized and updated as a parameter during training, while the source-side context vector s_t is given; the source-side context vector s_t is calculated as a weighted sum of the individual inputs, as follows:

s_t = Σ_{s=1}^{L} a_t(s) · h̄_s   (2)

where L represents the news text length, a_t(s) represents a variable-length alignment vector, and h̄_s represents the hidden-layer state of the encoder;

the context vector s_t should consider all hidden states of the encoder; in the attention-mechanism part, the hidden state h_t of the decoder at time t is compared with each source hidden state h̄_s of the encoder to generate the variable-length alignment vector a_t(s):

a_t(s) = align(h_t, h̄_s) = exp(f_a(h_t, h̄_s)) / Σ_{s'=1}^{L} exp(f_a(h_t, h̄_{s'}))   (3)

where f_a is a content-based function of the decoder hidden state h_t at time t and the source hidden states h̄_s of the encoder, and the denominator sums the content function over all source hidden states of the encoder starting from the initial position s' = 1;

f_a has 3 different formulations:

f_a(h_t, h̄_s) = h_t^T h̄_s (dot); h_t^T W_a h̄_s (general); v_a^T tanh(W_a [h_t; h̄_s]) (concat)   (4)

where W_a is the weight matrix of the attention model and v_a is a weight vector of the attention model;

at each time step, the model infers a variable-length alignment weight vector from the current target state and all source states, and then, based on a_t(s), calculates the global context vector as a weighted average over all source states;

the hidden-layer state h_t at time t and the context vector s_t are combined to generate the following attention hidden state of the decoder:

h̃_t = tanh(W_c [s_t; h_t])   (5)

where h̃_t represents the new attention hidden-state vector, W_c represents the fully connected weight matrix of the attention model, and u represents the number of attention-mechanism hidden units.
4. The news classification method based on a pooling-free convolution embedding and attention-distribution neural network according to claim 3, wherein, after introducing the attention mechanism, the final representation of the text is calculated as follows:

u_t = tanh(W_s h_t + b_s)   (6)

w_t = exp(u_t^T u_s) / Σ_t exp(u_t^T u_s)   (7)

v = Σ_t w_t h_t   (8)

In the calculation process, W_s represents the weight coefficient matrix of the attention model, h_t is the feature representation of the convolution at time t, u_t is a hidden-layer representation of the neural network, u_s is a randomly initialized context vector, also known as the semantic representation of the input, w_t is the importance weight normalized by the Softmax function, and v is the final feature vector of the text.
CN202011443363.4A 2020-12-11 2020-12-11 News classification method based on pooling convolution embedding and attention distribution neural network Active CN112527959B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011443363.4A CN112527959B (en) 2020-12-11 2020-12-11 News classification method based on pooling convolution embedding and attention distribution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011443363.4A CN112527959B (en) 2020-12-11 2020-12-11 News classification method based on pooling convolution embedding and attention distribution neural network

Publications (2)

Publication Number Publication Date
CN112527959A CN112527959A (en) 2021-03-19
CN112527959B true CN112527959B (en) 2023-05-30

Family

ID=75000138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011443363.4A Active CN112527959B (en) 2020-12-11 2020-12-11 News classification method based on pooling convolution embedding and attention distribution neural network

Country Status (1)

Country Link
CN (1) CN112527959B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177110B (en) * 2021-05-28 2022-09-16 中国人民解放军国防科技大学 False news detection method and device, computer equipment and storage medium
CN114334159B (en) * 2022-03-16 2022-06-17 四川大学华西医院 Postoperative risk prediction natural language data enhancement model and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110772268A (en) * 2019-11-01 2020-02-11 哈尔滨理工大学 Multimode electroencephalogram signal and 1DCNN migration driving fatigue state identification method
CN111292305A (en) * 2020-01-22 2020-06-16 重庆大学 Improved YOLO-V3 metal processing surface defect detection method
CN111783688A (en) * 2020-07-02 2020-10-16 吉林大学 Remote sensing image scene classification method based on convolutional neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180329884A1 (en) * 2017-05-12 2018-11-15 Rsvp Technologies Inc. Neural contextual conversation learning
CN109492108B (en) * 2018-11-22 2020-12-15 上海唯识律简信息科技有限公司 Deep learning-based multi-level fusion document classification method and system
CN109597891B (en) * 2018-11-26 2023-04-07 重庆邮电大学 Text emotion analysis method based on bidirectional long-and-short-term memory neural network
US11494615B2 (en) * 2019-03-28 2022-11-08 Baidu Usa Llc Systems and methods for deep skip-gram network based text classification
CN110188194B (en) * 2019-04-26 2020-12-01 哈尔滨工业大学(深圳) False news detection method and system based on multitask learning model
CN110609897B (en) * 2019-08-12 2023-08-04 北京化工大学 Multi-category Chinese text classification method integrating global and local features

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110772268A (en) * 2019-11-01 2020-02-11 哈尔滨理工大学 Multimode electroencephalogram signal and 1DCNN migration driving fatigue state identification method
CN111292305A (en) * 2020-01-22 2020-06-16 重庆大学 Improved YOLO-V3 metal processing surface defect detection method
CN111783688A (en) * 2020-07-02 2020-10-16 吉林大学 Remote sensing image scene classification method based on convolutional neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Jiawen Yang et al.; IEEE Access; 2018; 65130-65138 *
Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network; Marios Anthimopoulos et al.; IEEE Transactions on Medical Imaging (No. 5); 1207-1216 *
Medical Text Information Extraction Based on Deep Learning; Tu Wenbo; China Master's Theses Full-text Database, Medicine & Health Sciences (No. 1); E054-85 *
Improved Attention Acoustic Model Using Minimal Gated Unit Structure; Long Xingyan et al.; Journal of Signal Processing (No. 06); 739-748 *

Also Published As

Publication number Publication date
CN112527959A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN107526785B (en) Text classification method and device
CN112329680B (en) Semi-supervised remote sensing image target detection and segmentation method based on class activation graph
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
WO2023134084A1 (en) Multi-label identification method and apparatus, electronic device, and storage medium
US11900250B2 (en) Deep learning model for learning program embeddings
CN112417153B (en) Text classification method, apparatus, terminal device and readable storage medium
CN112527959B (en) News classification method based on pooling convolution embedding and attention distribution neural network
CN111475622A (en) Text classification method, device, terminal and storage medium
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113837370A (en) Method and apparatus for training a model based on contrast learning
CN115408525B (en) Letters and interviews text classification method, device, equipment and medium based on multi-level label
CN113553510B (en) Text information recommendation method and device and readable medium
CN114139676A (en) Training method of domain adaptive neural network
CN115035418A (en) Remote sensing image semantic segmentation method and system based on improved deep LabV3+ network
CN115456043A (en) Classification model processing method, intent recognition method, device and computer equipment
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN109597982B (en) Abstract text recognition method and device
CN116824583A (en) Weak supervision video scene graph generation method and device and electronic equipment
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN112989052B (en) Chinese news long text classification method based on combination-convolution neural network
CN115186670A (en) Method and system for identifying domain named entities based on active learning
CN115700555A (en) Model training method, prediction method, device and electronic equipment
CN113627192A (en) Relation extraction method and device based on two-layer convolutional neural network
Deebadi Understanding Impact of Twitter Feed on Bitcoin Price and Trading Patterns
CN112183103A (en) Convolutional neural network entity relationship extraction method fusing different pre-training word vectors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant