CN111966828A - Newspaper and magazine news classification method based on text context structure and attribute information superposition network - Google Patents
- Publication number
- CN111966828A (application number CN202010729459.0A)
- Authority
- CN
- China
- Prior art keywords
- text
- news
- vector
- weight
- attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9532—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Abstract
The invention discloses a newspaper and periodical news classification method based on a text context structure and attribute information superposition network, belonging to the field of information processing. The invention uses a text vector representation method to convert variable-length text into fixed-length vectors, avoiding loss and redundancy of text information. From the perspective of the training data, weighted random sampling is adopted, optimizing the composition of the training samples by adjusting, through weights, the probability that each sample is selected. From the perspective of feature extraction, the invention optimizes the feature extraction process by considering not only the context structure information of the text but also its attribute information. The invention thus both improves the way text features are extracted and additionally incorporates attribute features into the feature construction process, enriching the sources of features.
Description
Technical Field
The invention belongs to the field of information processing and relates to a newspaper and periodical news classification method and system based on a text context structure information and attribute information superposition network.
Background
Definition of key terms:
A neural network: a mathematical or computational model that mimics the structure and function of a biological neural network and is used to estimate or approximate functions. A neural network performs its computation through a large number of interconnected artificial neurons. In most cases, an artificial neural network can change its internal structure on the basis of external information; it is an adaptive system.
Text characterization: a machine learning technique in the field of natural language processing that maps the high-level cognitive abstraction of a text into a vector over the real field, to facilitate subsequent computer processing.
Weighted random sampling: a sampling technique that determines each sample's selection probability from its weight; it can effectively alleviate imbalanced class distributions at the sampling level.
Newspapers and periodicals are a transmission medium that conveys written information on paper. Their main functions include explanation, publicity, and image maintenance; for example, the People's Daily maintains the image of the country, the PLA Daily maintains the image of the army, and an enterprise newspaper maintains the image of the enterprise.
Generally, one issue of a newspaper contains several news items. Whether a given item becomes that day's front-page news is related to the amount of information it carries. With current natural language processing techniques, it is still difficult to directly quantify the amount of information in a piece of news text. Therefore, treating "whether a given news item is front-page news" as a binary black-box classification problem (hereinafter abbreviated as the "newspaper news classification problem") and solving it with a neural network, which is itself a black box, is a direct and efficient choice.
With the success of AlexNet, the study of neural networks entered a new phase. Current text classification techniques mainly use neural networks to fully mine the structural information of text and classify based on that feature information. In the field of text classification, algorithms such as TextCNN, TextRNN, FastText, and TextRCNN have been proposed in succession, performing feature extraction on text from the perspectives of convolutional and recurrent neural networks respectively, and these algorithms perform excellently on multiple test datasets.
The prior art has the following disadvantages:
Although algorithms such as TextCNN, TextRNN, FastText, and TextRCNN perform excellently on many public text classification test datasets, they cannot effectively solve the newspaper news classification problem, owing to the particularities of newspaper news. Specifically: Firstly, the length of newspaper news is variable, and length is not directly related to importance; the above techniques mostly fix the input length of the text and require truncation or padding of the input, which can cause loss or redundancy in the extracted features. Secondly, due to the particularity of newspaper news, the two classes of the newspaper news classification problem are obviously imbalanced: the number of front-page news items is far smaller than the number of non-front-page items. A classifier trained directly on such biased data is itself biased, i.e. it classifies news as non-front-page with a higher probability. Finally, although the main determinant of whether an item can become front-page news is the amount of information carried in the context of its text, some items that are too long or too short can hardly make the front page because of front-page layout and typesetting constraints. Most prior techniques consider only the context structure information of the text while ignoring attribute information such as title length and body length, so such features are lost.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a newspaper and periodical news classification method based on a text context structure and attribute information superposition network. On the input side, a text vector representation method converts variable-length text into fixed-length vectors, avoiding loss and redundancy of text information. From the perspective of the training data, weighted random sampling is adopted, optimizing the composition of the training samples by adjusting, through weights, the probability that each sample is selected. From the perspective of feature extraction, the invention optimizes the feature extraction process by considering not only the context structure information of the text but also its attribute information.
The invention discloses a newspaper and periodical news classification method based on a text context structure and attribute information superposition network, which specifically comprises the following steps:
step 1: acquiring data;
acquire the text information and attribute information of the news of a given newspaper from a database, wherein the text information represents the text content of the news and the attribute information comprises 8 items, specifically: the total number of pages of the newspaper on that day; the word count of the news title; the word count of the news body; the maximum word count among the newspaper's news titles on that day; the minimum word count among the newspaper's news titles on that day; the maximum word count among the newspaper's news bodies on that day; the minimum word count among the newspaper's news bodies on that day; and the page (edition) number of the news;
step 2: generating text feature vectors;
perform vector representation on the news text information, converting each news text into a low-dimensional, information-rich text feature vector, and store the text feature vectors in the database;
step 3: generating attribute feature vectors;
concatenate the attribute information of each news item into a single attribute feature vector, and store the result in the database;
step 4: dividing the data set;
randomly divide the news data in the database into a training set, a validation set, and a test set in the ratio 6:2:2;
step 5: sampling;
assign weights to the front-page news and non-front-page news in the training set, and use weighted random sampling to obtain a training sample set in which the amounts of front-page and non-front-page news are relatively balanced;
step 6: training the network model;
train the composite neural network using the text feature vectors and attribute feature vectors of the news in the training sample set, together with the corresponding class labels;
step 7: predicting;
input the text feature vector and attribute feature vector of each news item in the test set into the trained composite neural network; the network's output is the prediction of whether the news item is front-page news.
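Steps 1 and 3 above amount to collecting the eight attribute values into one fixed-order vector. A minimal sketch, with hypothetical field names standing in for the database schema:

```python
def build_attribute_vector(news, day_stats):
    """Assemble the 8-item attribute feature vector for one news item
    by concatenating the attribute values in a fixed order (step 3).
    All field names are illustrative, not from the patent."""
    return [
        day_stats["total_pages"],    # total pages of the paper that day
        len(news["title_words"]),    # word count of the news title
        len(news["body_words"]),     # word count of the news body
        day_stats["max_title_len"],  # longest title that day (words)
        day_stats["min_title_len"],  # shortest title that day (words)
        day_stats["max_body_len"],   # longest body that day (words)
        day_stats["min_body_len"],   # shortest body that day (words)
        news["page_no"],             # page (edition) number of the item
    ]
```

The per-day statistics would be computed once over all items of that day's issue and shared by every item's vector.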
Compared with the prior art, the invention has the beneficial effects that:
1. The invention not only improves the way text features are extracted but also additionally incorporates attribute features into the feature construction process. In step 2, a text vector representation method converts variable-length text into fixed-length vectors, avoiding loss and redundancy of text information and optimizing the extraction of text features; in step 3, the attribute information of the news is additionally added, enriching the sources of features. The features of the present invention are constructed more efficiently and diversely than in the prior art.
2. The invention uses weighted random sampling and trains the model with the sample set obtained from the sampling. Unlike other applications of the related art, the newspaper news classification problem faces the objective fact that the proportion of front-page to non-front-page news is severely imbalanced. In step 5, the training sample set is obtained by weighted random sampling, achieving strict control over the composition of the training data while preserving the authenticity of the data.
3. The invention applies the idea of solving a black-box problem with a black-box method, solving the problem end to end and point to point. For the newspaper news classification problem, in the absence of existing algorithms or indices that can directly measure the importance of a news item, a network model named Composite is proposed in step 6 to solve the classification problem end to end. Compared with the prior art, the invention simulates the thinking of the human brain with a neural network.
Drawings
Fig. 1 is a flowchart of a news classification method according to the present invention.
Fig. 2 is a schematic structural diagram of a text vector representation method.
Fig. 3 is a schematic diagram illustrating the effect of the weighted random sampling algorithm.
Fig. 4 is a schematic diagram of a composite neural network structure.
FIG. 5 shows the classification results of the present invention on the People's Daily news classification problem.
Detailed description of the preferred embodiments
For the purpose of making the present invention clearer, the present invention will be described in further detail below with reference to the accompanying drawings.
Fig. 1 visually represents the steps of the news classification method proposed by the present invention. Specifically, the method comprises the steps of data acquisition, text feature vector generation, attribute feature vector generation, data set division, weighted random sampling, composite model training and final classification prediction.
FIG. 2 visually shows the method for converting text into vectors according to the present invention; the principle is as follows:
Word vectors and text vectors are trained simultaneously. Let the coding vector corresponding to text \(d_i\) be \(p_i\), and the coding vector corresponding to word \(t\) in the text be \(w_t\). The vector for the \(j\)-th occurrence of word \(t\) in text \(d_i\) is constructed in the form:
\(x_{t,j} = p_i \oplus w_{j-c} \oplus \dots \oplus w_{j-1} \oplus w_{j+1} \oplus \dots \oplus w_{j+c}\)
where \(c\) is the unilateral context window size considered by the algorithm and \(\oplus\) denotes vector concatenation. If word \(t\) occurs \(s\) times in text \(d_i\), the vector representing word \(t\) is taken as the average:
\(\bar{x}_t = \frac{1}{s}\sum_{j=1}^{s} x_{t,j}\)
Let \(n\) denote the total number of words in text \(d_i\). Substituting the vector \(\bar{x}_t\) of each word \(t\) in text \(d_i\) into the neural network model of the text vector representation method gives the output:
\(y_t = W\bar{x}_t + b\)
where \(W\) is the hidden-layer weight matrix of the neural network model and \(b\) is a bias, from which the following loss function is constructed:
\(L = \sum_{t=1}^{n} \mathrm{distance}(y_t,\, w_t)\)
where \(\mathrm{distance}\) is a distance function between vectors, for example the second-order (squared Euclidean) distance. Optimizing this loss function yields the matrix \(W_{best}\) and bias \(b_{best}\). Taking the vector \(p_i\) corresponding to text \(d_i\) as input, its low-dimensional characterization vector is then obtained in the form:
\(v_i = W_{best}\, p_i + b_{best}\)
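The occurrence-vector construction can be sketched as follows. This is a minimal illustration under two assumptions of ours that the text does not state explicitly: the concatenation is zero-padded at the text boundaries, and a word's vector is the mean of its occurrence vectors.

```python
import numpy as np

def occurrence_vector(p_i, word_vecs, words, j, c, dim):
    """x_{t,j}: concatenation of the text vector p_i with the 2c context
    word vectors around position j (zero-padded at text boundaries).
    `dim` is the dimension of a single word vector."""
    zero = np.zeros(dim)
    ctx = [word_vecs.get(words[k], zero) if 0 <= k < len(words) else zero
           for k in range(j - c, j + c + 1) if k != j]
    return np.concatenate([p_i, *ctx])

def word_vector(p_i, word_vecs, words, t, c, dim):
    """Average of x_{t,j} over all s occurrences j of word t in the text."""
    occ = [occurrence_vector(p_i, word_vecs, words, j, c, dim)
           for j, w in enumerate(words) if w == t]
    return sum(occ) / len(occ)
```

With a text vector of dimension \(d_p\), word vectors of dimension \(d_w\), and a one-sided window \(c\), each occurrence vector has fixed length \(d_p + 2c\,d_w\), which is what makes the subsequent hidden-layer mapping well defined.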
FIG. 3 illustrates the sampling effect of the weighted random sampling method, which proceeds as follows:
For class \(C_j\), suppose \(|C_j|\) samples belong to class \(C_j\); then the weight \(weight_i\) of any one sample \(d_i\) among these samples can be expressed in the form:
\(weight_i = \frac{\sum_{C_k \in C} |C_k|}{|C_j|}\)
where \(C\) denotes the defined set of classification categories. Using the weighted random sampling method, with the weight set \(Weight = \{weight_1, weight_2, \dots, weight_n\}\), \(m\) samples are selected from the sample set \(D = \{d_1, d_2, \dots, d_n\}\) as follows:
(1) For each element \(weight_i \in Weight\), draw a uniformly distributed random number \(u_i\) between 0 and 1 and calculate \(k_i\) using the formula:
\(k_i = u_i^{1/weight_i}\)
(2) Let the set \(K = \{k_i\}\), where \(i = 1, 2, \dots, n\). Sort the set \(K\) by \(k_i\) and select the \(m\) largest elements to form the sample set, where \(Sample\) can be expressed as:
\(Sample = \{d_l\}\), where \(l\) satisfies \(k_l \ge k_{m\text{-}th}\)
and \(k_{m\text{-}th}\) denotes the value of the \(m\)-th largest element in the set \(K\).
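The selection rule above (key each sample by \(u_i^{1/weight_i}\) and keep the \(m\) largest keys) matches the well-known key-based scheme of Efraimidis and Spirakis for weighted sampling without replacement. A minimal sketch, with function and variable names of our choosing:

```python
import random

def weighted_random_sample(samples, weights, m, rng=random):
    """Select m samples without replacement: key each sample d_i by
    k_i = u_i ** (1 / weight_i) with u_i ~ Uniform(0, 1), then keep
    the m samples whose keys are largest."""
    keyed = [(rng.random() ** (1.0 / w), d) for d, w in zip(samples, weights)]
    keyed.sort(key=lambda kd: kd[0], reverse=True)  # largest keys first
    return [d for _, d in keyed[:m]]
```

Giving every sample of class \(C_j\) the weight \(\sum_k |C_k| / |C_j|\), as above, makes the rarer class (front-page news) far more likely to survive into the training sample set.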
FIG. 4 shows the structure of the composite neural network, whose calculation process is analyzed as follows:
The inputs to the composite neural network are the text feature vector \(T_i\) and the attribute feature vector \(A_i\) of a sample news text \(S_i\). The model uses 2 different parts: a dimension-reduction operation applied to the text characterization vector \(T_i\), and a normalization operation applied to the attribute feature vector \(A_i\).
Further, the composite neural network comprises: a part realizing dimension reduction of the text vector, a part realizing normalization of the attribute vector, and a classification fully connected network. The outputs of the dimension-reduction part and the normalization part are jointly input into the classification fully connected network to realize the final classification.
The dimension-reduction part of the composite neural network model is a 3-layer fully connected neural network. In the first fully connected layer, the input is the text characterization vector \(T_i\), the weight matrix is \(W_1\), the bias is \(b_1\), and the activation function is \(\mathrm{ReLU}(X)\); the output is expressed as:
\(H^{(1)} = \mathrm{ReLU}(W_1 T_i + b_1)\)
The second and third fully connected layers are similar to the first. The input of the second layer is the output \(H^{(1)}\) of the first layer, with weight matrix \(W_2\), bias \(b_2\), and activation function \(\mathrm{ReLU}(X)\); the input of the third layer is the output \(H^{(2)}\) of the second layer, with weight matrix \(W_3\), bias \(b_3\), and activation function \(\mathrm{ReLU}(X)\). \(H^{(2)}\) and \(H^{(3)}\) take the form:
\(H^{(2)} = \mathrm{ReLU}(W_2 H^{(1)} + b_2)\), \(H^{(3)} = \mathrm{ReLU}(W_3 H^{(2)} + b_3)\)
For the normalization part of the composite neural network model, let the set of attribute vectors corresponding to the sample set \(Sample\) be \(\{A_i\}\), and let each attribute vector \(A_i\) have dimension \(q\). Then the value of the \(j\)-th item of the normalized attribute vector \(\hat{A}_i\) corresponding to sample \(S_i\) can be expressed as:
\(\hat{A}_{i,j} = \frac{A_{i,j} - \min_k A_{k,j}}{\max_k A_{k,j} - \min_k A_{k,j}},\quad j = 1, \dots, q\)
After \(H^{(3)}_i\) and \(\hat{A}_i\) are obtained, they are spliced together based on the idea of superposition to realize feature fusion; the superposed result, denoted \(M_i\), is expressed as:
\(M_i = H^{(3)}_i \oplus \hat{A}_i\)
For the classification fully connected network, the input is the hybrid vector \(M_i\), the weight matrix is \(W_4\), the bias coefficient is \(b_4\), and the activation function is \(\mathrm{Softmax}(X)\); the output \(O_i\) is expressed as:
\(O_i = \mathrm{Softmax}(W_4 M_i + b_4)\)
The output vector \(O_i\) is a one-dimensional 2-element vector, i.e. \(O_i \in \mathbb{R}^{1 \times 2}\). The value of its first column represents the probability that news \(S_i\) is front-page news, and the value of its second column represents the probability that news \(S_i\) is non-front-page news.
To demonstrate the effectiveness of this patent in solving the newspaper news classification problem, People's Daily news is used here as an example. When processing the data, the original data were divided into 4 periods according to the terms of office, forming 4 corresponding sub-datasets, so as to distinguish the different styles of different leadership periods.
Verification experiments were carried out on the 4 news sub-datasets, evaluating the XGBoost, Random Forest, and SVM classification methods based on Doc2Vec vectors, the deep-learning-based fastText, TextCNN, TextRNN, and TextRCNN text classification methods, and the performance of the proposed newspaper news classification method based on the text context structure and attribute information superposition network, on the People's Daily front-page news classification problem.
FIG. 5 illustrates the accuracy, precision, recall, and F1 values obtained when classifying the 4 stages of the People's Daily news data with each of the text classification methods mentioned above.
From sub-graph a of FIG. 5 it can be seen that the news classification method proposed in this patent achieves a considerable improvement in accuracy over the other classification methods. Specifically, on the first sub-dataset (Stage 1), compared with the Doc2Vec-based XGBoost, Random Forest, and SVM classification methods and the fastText, TextCNN, TextRNN, and TextRCNN text classification methods, the proposed method improves by 10.31%, 3.01%, 14.32%, 32.55%, 21.32%, 23.93%, and 22.10%, respectively. On the second sub-dataset (Stage 2), compared with the Doc2Vec-based XGBoost and SVM methods and the fastText, TextCNN, TextRNN, and TextRCNN methods, it improves by 4.88%, 11.00%, 16.38%, 5.57%, 5.71%, and 0.18%, respectively. On the third sub-dataset (Stage 3), compared with the Doc2Vec-based XGBoost, Random Forest, and SVM methods and the fastText, TextCNN, TextRNN, and TextRCNN methods, it improves by 9.72%, 0.15%, 13.48%, 17.33%, 17.01%, 17.67%, and 18.73%, respectively. On the fourth sub-dataset (Stage 4), compared with the same seven methods, it improves by 5.71%, 1.09%, 14.10%, 3.47%, 3.28%, 5.62%, and 1.30%, respectively.
Because the proportion of front-page to non-front-page news in the People's Daily is imbalanced, considering only the accuracy of the classification results is far from sufficient. Sub-graphs b, c, and d of FIG. 5 characterize, via the precision, recall, and F1 value of the predicted results, the actual performance of each text classification method on the People's Daily front-page news classification problem.
From sub-graphs b and c of FIG. 5 it can be seen that, on each sub-dataset, the news classification method proposed in this patent attains relatively high precision and recall, or is within the same order of magnitude as the best result. Specifically, in terms of precision, it outperforms the Doc2Vec-based XGBoost and SVM methods and the fastText, TextCNN, TextRNN, and TextRCNN text classification methods at all stages, and outperforms the Doc2Vec-based Random Forest method at some specific stages (e.g. Stage 3). In terms of recall, the Doc2Vec-based Random Forest method is superior at all stages, and the TextRCNN text classification method is superior at some specific stages (e.g. Stage 2).
From sub-graph d of FIG. 5 it can be seen that, compared with the other algorithms, the news classification method proposed in this patent achieves a considerable improvement in the F1 value of the classification results. Specifically, on the first sub-dataset (Stage 1), compared with the Doc2Vec-based XGBoost, Random Forest, and SVM classification methods and the fastText, TextCNN, TextRNN, and TextRCNN text classification methods, the proposed method improves by 3.21%, 42.53%, 4.06%, 25.25%, 23.45%, 22.25%, and 20.43%, respectively. On the second sub-dataset (Stage 2), compared with the Doc2Vec-based XGBoost, Random Forest, and SVM methods and the fastText, TextCNN, TextRNN, and TextRCNN methods, it improves by 2.70%, 42.99%, 3.93%, 8.88%, 5.53%, 3.73%, and 3.33%, respectively. On the third sub-dataset (Stage 3), compared with the same seven methods, it improves by 7.76%, 46.93%, 8.43%, 12.80%, 12.27%, 12.49%, and 12.54%, respectively. On the fourth sub-dataset (Stage 4), compared with the Doc2Vec-based XGBoost, Random Forest, and SVM methods and the fastText, TextCNN, and TextRNN methods, it improves by 9.38%, 46.85%, 14.04%, 1.28%, 0.69%, and 3.76%, respectively.
Claims (3)
1. A newspaper and periodical news classification method based on a text context structure and attribute information superposition network, comprising the following steps:
step 1: acquiring data;
acquire the text information and attribute information of the news of a given newspaper from a database, wherein the text information represents the text content of the news and the attribute information comprises 8 items, specifically: the total number of pages of the newspaper on that day; the word count of the news title; the word count of the news body; the maximum word count among the newspaper's news titles on that day; the minimum word count among the newspaper's news titles on that day; the maximum word count among the newspaper's news bodies on that day; the minimum word count among the newspaper's news bodies on that day; and the page (edition) number of the news;
step 2: generating text feature vectors;
perform vector representation on the news text information, converting each news text into a low-dimensional, information-rich text feature vector, and store the text feature vectors in the database;
step 3: generating attribute feature vectors;
concatenate the attribute information of each news item into a single attribute feature vector, and store the result in the database;
step 4: dividing the data set;
randomly divide the news data in the database into a training set, a validation set, and a test set;
step 5: sampling;
assign weights to the front-page news and non-front-page news in the training set, and use weighted random sampling to obtain the training sample set;
for class \(C_j\), suppose \(|C_j|\) samples belong to class \(C_j\); then the weight \(weight_i\) of any one sample \(d_i\) among these samples can be expressed in the form:
\(weight_i = \frac{\sum_{C_k \in C} |C_k|}{|C_j|}\)
where \(C\) denotes the defined set of classification categories; using the weighted random sampling method, with the weight set \(Weight = \{weight_1, weight_2, \dots, weight_n\}\), \(m\) samples are selected from the sample set \(D = \{d_1, d_2, \dots, d_n\}\) as follows:
(1) for each element \(weight_i \in Weight\), draw a uniformly distributed random number \(u_i\) between 0 and 1 and calculate \(k_i\) using the formula:
\(k_i = u_i^{1/weight_i}\)
(2) let the set \(K = \{k_i\}\), \(i = 1, 2, \dots, n\); sort the set \(K\) by \(k_i\) and select the \(m\) largest elements to form the sample set, where \(Sample\) can be expressed as:
\(Sample = \{d_l\}\), where \(l\) satisfies \(k_l \ge k_{m\text{-}th}\)
in which \(k_{m\text{-}th}\) denotes the value of the \(m\)-th largest element in the set \(K\);
step 6: training the network model;
train the composite neural network using the text feature vectors and attribute feature vectors of the news in the training sample set, together with the corresponding class labels;
step 7: predicting;
input the text feature vector and attribute feature vector of each news item in the test set into the trained composite neural network; the network's output is the prediction of whether the news item is front-page news.
2. The newspaper and periodical news classification method based on the text context structure and attribute information superposition network as claimed in claim 1, wherein the specific method for generating the text feature vectors in step 2 is as follows:
word vectors and text vectors are trained simultaneously; let the coding vector corresponding to text \(d_i\) be \(p_i\), and the coding vector corresponding to word \(t\) in the text be \(w_t\); the vector for the \(j\)-th occurrence of word \(t\) in text \(d_i\) is constructed in the form:
\(x_{t,j} = p_i \oplus w_{j-c} \oplus \dots \oplus w_{j-1} \oplus w_{j+1} \oplus \dots \oplus w_{j+c}\)
where \(c\) is the unilateral context window size considered by the algorithm and \(\oplus\) denotes vector concatenation; if word \(t\) occurs \(s\) times in text \(d_i\), the vector representing word \(t\) is taken as the average:
\(\bar{x}_t = \frac{1}{s}\sum_{j=1}^{s} x_{t,j}\)
letting \(n\) denote the total number of words in text \(d_i\) and substituting the vector \(\bar{x}_t\) of each word \(t\) in text \(d_i\) into the neural network model of the text vector representation method, the following output is obtained:
\(y_t = W\bar{x}_t + b\)
where \(W\) is the hidden-layer weight matrix of the neural network model and \(b\) is a bias, from which the following loss function is constructed:
\(L = \sum_{t=1}^{n} \mathrm{distance}(y_t,\, w_t)\)
where \(\mathrm{distance}\) is a distance function between vectors, for example the second-order (squared Euclidean) distance; optimizing this loss function yields the matrix \(W_{best}\) and bias \(b_{best}\); taking the vector \(p_i\) corresponding to text \(d_i\) as input, its low-dimensional characterization vector is obtained in the form:
\(v_i = W_{best}\, p_i + b_{best}\)
3. The newspaper and periodical news classification method based on the text context structure and attribute information superposition network as claimed in claim 1, wherein the specific method of step 6 is as follows:
the composite neural network includes: implementation ofVector dimension reduction part and implementationPartial, classified, fully connected networks with vector normalization; implementation ofPartial sum implementation of vector dimension reductionThe outputs of the vector normalization parts are jointly input into a classification full-connection network to realize final classification;
for implementation in a composite neural network modelFor the vector dimension reduction part, 3 layers of fully connected neural networks are provided; in a first layer of fully-connected neural networks, the input is a text characterization vectorThe weight matrix is W1Offset is b1The activation function is ReLu (X), and the output is expressed as follows:
the second and third layers of fully-connected network are similar to the first layer of neural network, and the input of the second layer of fully-connected network is the output H of the first layer of fully-connected network(1)The weight matrix is W2Offset is b2The activation function is ReLu (X); in the third layer of fully-connected neural network, the input is the output H of the second layer of fully-connected network(2)The weight matrix is W3Offset is b3The activation function is ReLu (X); h(2)And H(3)In the form:
For the vector normalization part of the composite neural network model, let the set of attribute vectors correspond to the sample set Sample, each attribute vector having a fixed dimension; then the value of the j-th item of the normalized attribute vector corresponding to sample S_i can be expressed in the following form:
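The exact per-item normalization formula is shown only as an image in the original; as one common reading, a min-max scaling of each attribute column to [0, 1] can be sketched like this (the sample data are invented):

```python
import numpy as np

def normalize_attributes(A):
    """Column-wise min-max scaling of a (samples x attributes) matrix.

    Assumption: the patent's per-item formula is given only as an
    image, and scaling each attribute to [0, 1] is one standard choice.
    """
    A = np.asarray(A, dtype=float)
    lo = A.min(axis=0)
    span = A.max(axis=0) - lo
    span[span == 0.0] = 1.0      # guard attributes that are constant
    return (A - lo) / span

A = np.array([[3.0, 200.0],
              [1.0, 400.0],
              [2.0, 300.0]])
N = normalize_attributes(A)      # every entry now lies in [0, 1]
```

The point of the normalization is that attributes on very different scales (here 1-3 vs 200-400) contribute comparably once spliced with the text vector.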
After the reduced text vector and the normalized attribute vector are obtained, the two are spliced together based on the idea of superposition to realize feature fusion, and the superposed result is expressed in the following form:
For the classification fully-connected network, the input is the mixed (superposed) vector, the weight matrix is W_4, the bias coefficient is b_4, and the activation function is Softmax(X); the output is then expressed in the following form:
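The superposition (splicing) of the reduced text vector with the normalized attribute vector, followed by the Softmax classification layer, can be sketched as follows; all dimensions, the weights W4, and the bias b4 are illustrative stand-ins:

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(2)

h3 = rng.normal(size=16)         # reduced text vector from the 3-layer part
attrs = rng.uniform(size=4)      # normalized attribute vector
mixed = np.concatenate([h3, attrs])  # superposition: feature fusion by splicing

n_classes = 5                    # number of news categories (illustrative)
W4 = rng.normal(scale=0.1, size=(mixed.size, n_classes))
b4 = np.zeros(n_classes)

probs = softmax(mixed @ W4 + b4)  # class probability distribution
pred = int(np.argmax(probs))      # predicted news category
```

Concatenation keeps the text-derived and attribute-derived features as separate coordinates, letting the final layer weight each source independently.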
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010729459.0A CN111966828B (en) | 2020-07-27 | 2020-07-27 | Newspaper and magazine news classification method based on text context structure and attribute information superposition network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111966828A true CN111966828A (en) | 2020-11-20 |
CN111966828B CN111966828B (en) | 2022-05-03 |
Family
ID=73364052
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010729459.0A Active CN111966828B (en) | 2020-07-27 | 2020-07-27 | Newspaper and magazine news classification method based on text context structure and attribute information superposition network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111966828B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918501A (en) * | 2019-01-18 | 2019-06-21 | 平安科技(深圳)有限公司 | Method, apparatus, equipment and the storage medium of news article classification |
WO2020092834A1 (en) * | 2018-11-02 | 2020-05-07 | Valve Corporation | Classification and moderation of text |
CN111125354A (en) * | 2018-10-31 | 2020-05-08 | 北京国双科技有限公司 | Text classification method and device |
Non-Patent Citations (1)
Title |
---|
ZHANG Qian et al.: "Research on Microblog Short-Text Classification Based on Word2vec", Technology Research (《技术研究》) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11538210B1 (en) * | 2021-11-22 | 2022-12-27 | Adobe Inc. | Text importance spatial layout |
CN117931881A (en) * | 2024-03-15 | 2024-04-26 | 四川鑫正工程项目管理咨询有限公司 | Engineering cost query management method |
CN117931881B (en) * | 2024-03-15 | 2024-05-24 | 四川鑫正工程项目管理咨询有限公司 | Engineering cost query management method |
Also Published As
Publication number | Publication date |
---|---|
CN111966828B (en) | 2022-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363753B (en) | Comment text emotion classification model training and emotion classification method, device and equipment | |
CN105631468B (en) | A kind of picture based on RNN describes automatic generation method | |
CN111444340A (en) | Text classification and recommendation method, device, equipment and storage medium | |
CN107729309A (en) | A kind of method and device of the Chinese semantic analysis based on deep learning | |
CN113569001A (en) | Text processing method and device, computer equipment and computer readable storage medium | |
CN113987187B (en) | Public opinion text classification method, system, terminal and medium based on multi-label embedding | |
CN112528163B (en) | Social platform user occupation prediction method based on graph convolution network | |
CN110807324A (en) | Video entity identification method based on IDCNN-crf and knowledge graph | |
CN111966828B (en) | Newspaper and magazine news classification method based on text context structure and attribute information superposition network | |
CN111858940A (en) | Multi-head attention-based legal case similarity calculation method and system | |
CN111931061A (en) | Label mapping method and device, computer equipment and storage medium | |
CN113688635B (en) | Class case recommendation method based on semantic similarity | |
CN112800225B (en) | Microblog comment emotion classification method and system | |
CN113220768A (en) | Resume information structuring method and system based on deep learning | |
CN112256866A (en) | Text fine-grained emotion analysis method based on deep learning | |
CN113946677A (en) | Event identification and classification method based on bidirectional cyclic neural network and attention mechanism | |
CN114841151B (en) | Medical text entity relation joint extraction method based on decomposition-recombination strategy | |
CN114780723B (en) | Portrayal generation method, system and medium based on guide network text classification | |
CN115048511A (en) | Bert-based passport layout analysis method | |
CN115392254A (en) | Interpretable cognitive prediction and discrimination method and system based on target task | |
CN113934835B (en) | Retrieval type reply dialogue method and system combining keywords and semantic understanding representation | |
CN112950414B (en) | Legal text representation method based on decoupling legal elements | |
WO2022087688A1 (en) | System and method for text mining | |
CN113627550A (en) | Image-text emotion analysis method based on multi-mode fusion | |
CN115023710B (en) | Transferable neural architecture for structured data extraction from web documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||