CN112966103B - Mixed attention mechanism text title matching method based on multi-task learning - Google Patents
Mixed attention mechanism text title matching method based on multi-task learning
- Publication number
- CN112966103B (application number CN202110190612.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- matrix
- data
- title
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a mixed attention strategy text title matching method based on multi-task learning. Multi-task learning is realized by performing two classification tasks on the input text at the same time: classification task 1, which predicts the original category of the text, and classification task 2, which predicts whether the text is a "title party" (clickbait) article. The two tasks are trained jointly, so that one task helps the other learn better parameters. Specifically, the method uses the back propagation of classification task 1 to adjust the model parameters so that classification task 2 achieves better performance. The attention mechanism used by the method computes the degree of association between each element and all other elements in a single step, with a small amount of computation and high efficiency.
Description
Technical Field
The invention relates to the field of text processing, and in particular to a mixed attention mechanism text title matching method based on multi-task learning.
Background
In the internet era, driven by the practical benefits of traffic acquisition and traffic-based incentives, "title party" (clickbait) web texts remain prevalent and degrade the browsing experience of network users. Platforms on which the "title party" is active face user loss, which affects their sustainable development.
"Title party" is a general name for website editors, reporters, managers and netizens who, for purposes such as increasing click-through rate or visibility, write eye-catching titles on forums or other internet media to attract the attention of audiences, so that readers who click through find content that differs greatly from the title. Put simply, the main behavior of the "title party" is to severely exaggerate the title of a post, while the content of the post is usually completely unrelated or only weakly related to the title.
After reading a junk text of no value, a network user can report it in the hope that the platform supervisor will take the article down. Web text publishing platforms usually perform some text detection at publication time, but the detection mechanism is loose. To improve user stickiness and the publishing experience, and to avoid the user dissatisfaction and user loss caused by frequent publishing failures, platforms currently only strictly check whether a text contains sensitive words: the text is matched against every word in a sensitive-word dictionary, and it cannot be published if any of those words is found in it.
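To make the limitation of this check concrete, the following minimal Python sketch implements a plain substring match against a sensitive-word dictionary. The dictionary contents, the function names and the sample posts are hypothetical and serve only as illustration; they are not taken from any real platform.

```python
# Hypothetical sensitive-word dictionary; a real platform's list is not known here.
SENSITIVE_WORDS = ["forbidden", "banned"]


def find_sensitive_word(text, dictionary=SENSITIVE_WORDS):
    """Return the first dictionary word that occurs in text, or None."""
    lowered = text.lower()
    for word in dictionary:
        if word and word.lower() in lowered:
            return word
    return None


def can_publish(text):
    """A post is accepted only if no dictionary word is found in it."""
    return find_sensitive_word(text) is None


# An exaggerated title with compliant wording passes the check unchanged.
print(can_publish("You will never believe what this doctor found"))  # True
print(can_publish("This post contains a banned term"))               # False
```

Because the check only looks at individual words, it says nothing about whether the title and the body of a post match.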
Background review is another means by which a platform improves its text quality. The platform randomly samples a certain number of successfully published texts for manual review in the background, and texts with a high number of reports may also be included in the scope of the review. However, again out of consideration for user stickiness and user experience, the review staff will not readily delete a user's text unless the article contains a particularly obvious violation. At present, "title party" detection relies mainly on manual detection and lacks the support of systematic algorithms and models.
Manual detection. "Title party" texts are deleted after a network user or platform auditor finds them manually. Because people differ in subjective judgment, reading habits and interests, the criteria for manual judgment are not uniform. Users also differ in how they use the software and do not necessarily file a report. Given the massive volume of web texts, the platform's reporting mechanism sets a threshold on the number of reports, and an article is automatically transferred to the background for review only when its number of reports exceeds the threshold.
Simple and loose machine detection. Texts are matched against words pre-stored in a dictionary, and if a word matches, the user's publication is rejected. Such a loose detection mechanism cannot effectively detect "title party" texts: most "title party" texts use compliant wording, and the number of words in a sensitive-word dictionary is limited, so the method is not suited to this application scenario in practice.
In view of the above, an effective "title party" text detection method is needed to filter "title party" articles on the network, improve the quality of internet texts, improve the experience of network users, and enable text publishing platforms to develop continuously and healthily.
Disclosure of Invention
To address the shortcomings of the prior art, a mixed attention mechanism text title matching method based on multi-task learning is provided, comprising the following steps:
step 1: crawling different categories of "title party" text data and normal text data to form a data set;
step 2: cleaning the data set to remove interfering characters such as web page tags and network emoticons;
step 3: labeling the title and the body of each text in the data set with categories, respectively, to generate classified data, wherein the category labels cover classification task 1 and classification task 2: for classification task 1 the title is labeled with the original category assigned when the data was crawled, and for classification task 2 the body is labeled as being or not being a "title party" text;
step 4: performing word segmentation on the title and the body of the classified data obtained in step 3, respectively, to obtain text word sequences;
step 5: processing each text word sequence to a preset fixed length, padding with 0 where the sequence is shorter than that length and truncating where it is longer;
step 6: randomly shuffling the category-labeled text data so that "title party" texts and normal texts are thoroughly mixed;
step 7: dividing the shuffled data set into batches of size batch;
step 8: inputting the batch data into the constructed text detection model for training, specifically comprising the following steps:
step 8.1: inputting the title and the body of the batch data into the same BERT model to obtain the word embedding matrices of the body and the title, $T=\{t_1,t_2,t_3,\dots,t_n\}$ and $C=\{c_1,c_2,c_3,\dots,c_m\}$, where $T\in\mathbb{R}^{n\times 300}$, n is the word sequence length of the body, $C\in\mathbb{R}^{m\times 300}$, m is the word sequence length of the title, and 300 is the word vector dimension encoded by the standard BERT model, while also obtaining the first output vector of the BERT model for the title and for the body;
step 8.2: randomly initializing a shared parameter matrix $W\in\mathbb{R}^{300\times n}$ and performing a matrix transformation to obtain a feature matrix $M\in\mathbb{R}^{m\times 300}$ that mixes body and title information, the mathematical expression of the matrix transformation being as follows:
$$M_{m\times 300}=C_{m\times 300}\times W_{300\times n}\times T_{n\times 300}$$
step 8.3: performing a matrix transformation on the two first output vectors of the BERT model to obtain a feature matrix $F\in\mathbb{R}^{300\times 300}$, the mathematical expression of the matrix transformation being as follows:
step 8.4: taking M as Q and V and taking F as K, computing a mixed attention matrix $A\in\mathbb{R}^{m\times 300}$, the calculation being as follows:
wherein $d_k$ is the second dimension of K;
step 8.5: passing the mixed attention matrix A through a fully connected layer to obtain a dimension-reduction matrix $D\in\mathbb{R}^{m\times n}$;
step 8.6: randomly initializing a first weight matrix $W_1$, passing the dimension-reduction matrix D through a fully connected layer to obtain a one-dimensional matrix, and applying softmax to it as the output of classification task 1 of step 3, the output having dimension $\mathbb{R}^{1\times j}$, where j is the number of original categories of the data;
step 8.7: randomly initializing a second weight matrix $W_2$, passing the dimension-reduction matrix D through a fully connected layer to obtain a one-dimensional matrix, and applying softmax to it as the output of classification task 2 of step 3, the output having dimension $\mathbb{R}^{1\times 2}$, the two dimensions representing the probability of being and of not being "title party", respectively;
step 8.8: taking the maximum value in the result of step 8.6 and of step 8.7 as $p_i$ of the corresponding task, computing the cross entropy of each task, and summing and averaging, the mathematical expression being as follows:
wherein n is the batch size, $y_i$ is the true label of the i-th piece of data, and $p_i$ is the maximum probability computed by the model for the label of the i-th piece of data;
step 8.9: back-propagating the result of step 8.8 as the error to train the model parameters;
step 8.10: setting an end condition; if the end condition is not met, repeating steps 8.1 to 8.9 until it is met, and then stopping the training of the model.
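The data preparation in steps 2 and 5 to 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the regular expressions (in particular the bracket-style emoticon pattern such as `[smile]`), the function names and the fixed random seed are all hypothetical choices.

```python
import random
import re

# Assumed patterns: HTML-like tags, and bracket-style emoticons such as [smile].
TAG_RE = re.compile(r"<[^>]+>")
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")


def clean(text):
    """Step 2: strip web page tags and emoticon markers."""
    text = TAG_RE.sub("", text)
    text = EMOTICON_RE.sub("", text)
    return text.strip()


def pad_or_truncate(token_ids, length, pad_id=0):
    """Step 5: pad with 0 when too short, truncate when too long."""
    return token_ids[:length] + [pad_id] * max(0, length - len(token_ids))


def make_batches(samples, batch_size, seed=0):
    """Steps 6 and 7: shuffle the labeled samples, then cut them into batches."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]


print(clean("<p>Hello[smile] world</p>"))  # Hello world
print(pad_or_truncate([5, 6], 4))           # [5, 6, 0, 0]
print(pad_or_truncate([1, 2, 3, 4, 5], 3))  # [1, 2, 3]
```

Padding to a fixed length matters later on, because the shared parameter matrix of step 8.2 has a dimension equal to the body sequence length n.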
According to a preferred embodiment, the method further comprises testing the trained text detection model, specifically:
step 9: executing steps 1 to 8.7 with the trained model and taking the index of the largest value in the output of classification task 2 in step 8.7 as the final result, without executing steps 8.8 to 8.10.
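The decision rule of step 9 amounts to an argmax over the two-way softmax output of classification task 2. The sketch below assumes that index 0 stands for "is title party" and index 1 for "is not", following the order in which step 8.7 lists the two dimensions; the function names and the example scores are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_is_title_party(task2_scores, positive_index=0):
    """Return (is_title_party, probabilities) for one sample.

    positive_index=0 is an assumption about the label order, not a given.
    """
    probs = softmax(task2_scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best == positive_index, probs


flag, probs = predict_is_title_party([2.0, 0.5])
print(flag)  # True
```

Only the task 2 head is read at test time; the task 1 head exists to shape the shared parameters during training.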
The invention has the beneficial effects that:
1. The invention provides a mixed attention strategy based on multi-task learning that extracts key information from the body text and matches it with the title, thereby detecting "title party" articles and markedly improving the precision and accuracy of "title party" detection.
2. An RNN model forgets early input as the input sequence grows, requires a large amount of computation, and, being a sequential model, cannot be computed in parallel. By contrast, the attention mechanism used by the method computes the degree of association between each element and all other elements in a single step, with a small amount of computation and high efficiency.
3. Text similarity computed from text embedding vectors is disturbed by a large amount of noise. The text detection method avoids the influence of this noise, adapts to the variability of the titling strategies of "title party" articles, overcomes the limitations of traditional similarity computation, achieves efficient computation and accurate classification, and greatly reduces manual work.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The following detailed description is made with reference to the accompanying drawings.
For a long body text and a short title, a purely character-level analysis shows that most words of the body are irrelevant to the title, so text similarity computed from text embedding vectors is disturbed by a large amount of noise. This scheme provides a mixed attention strategy based on multi-task learning that extracts key information from the body and matches it with the title, thereby detecting "title party" articles. In an RNN-based model, the memory cells forget earlier input as the input sequence grows, and the amount of computation is large. Moreover, an RNN is a sequential model and cannot be computed in parallel. The attention mechanism used by this scheme computes the degree of association between each element and all other elements in a single step, with a small amount of computation and high efficiency.
The basic idea of the scheme is to measure the degree of matching between body text and title in order to identify "title party" articles.
The BERT used in the present invention is an auto-encoder, each output of which encodes the context of the text, i.e., each element of T and C already contains context information. The first output of BERT encodes full-text information, i.e., the first output vectors for the title and for the body encode the full-text information of the title and of the body, respectively. A similar effect can be achieved with an RNN-based bidirectional sequential model, but an RNN-based model requires the word embedding matrix to be obtained in a separate earlier step, whereas BERT encodes automatically from a directly input text sequence.
The mixed attention strategy mixes the word embedding matrix of the body with the word embedding matrix of the title through an attention mechanism: two different feature matrices are computed, one serving as Q and V and the other as K, and a mixed attention matrix is computed from them.
The attention mechanism can flexibly capture global and local relations by comparing each element with all other elements in a single step; that is, it can pick out the key points in a long text sequence. Through training, the mixed attention strategy can therefore focus attention on the key body content that matches the title.
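As an illustration of how attention compares every element with every other element in one step, the sketch below implements the widely used scaled dot-product attention, softmax(QKᵀ/√d_k)·V, in plain Python. It is the standard formulation and is shown only for orientation; the mixed attention of this scheme assigns Q, K and V differently and may differ in detail. All function names and the toy matrices are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scaled_dot_product_attention(q, k, v):
    """Standard attention: every query row is scored against every key row at once."""
    d_k = len(k[0])  # second dimension of K
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy example: two elements with two-dimensional features.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
output, weights = scaled_dot_product_attention(q, k, v)
```

Each row of `weights` sums to 1 and records how strongly one element attends to all the others; no recurrence over the sequence is needed, so the rows can be computed in parallel.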
Multi-task learning is embodied in that the model simultaneously learns to classify the original text category (classification task 1) and whether the text is a "title party" article (classification task 2) from the input text. The two tasks are trained jointly, so that one task helps the other learn better parameters. The scheme uses the back propagation of classification task 1 to adjust the model parameters so that classification task 2 achieves better performance. The embodiments are described in detail below with reference to the accompanying drawings.
Step 1: crawling different categories of "title party" text data and normal text data to form a data set.
Step 2: cleaning the data set to remove interfering characters such as web page tags and network emoticons.
Step 3: labeling the title and the body of each text in the data set with categories, respectively, to generate classified data, wherein the category labels cover classification task 1 and classification task 2: for classification task 1 the title is labeled with the original category assigned when the data was crawled, and for classification task 2 the body is labeled as being or not being a "title party" text.
Step 4: performing word segmentation on the title and the body of the classified data obtained in step 3, respectively, to obtain text word sequences.
Step 5: processing each text word sequence to a preset fixed length, padding with 0 where the sequence is shorter than that length and truncating where it is longer.
Step 6: randomly shuffling the category-labeled text data so that "title party" texts and normal texts are thoroughly mixed.
Step 7: dividing the shuffled data set into batches of size batch.
Step 8: inputting the batch data into the constructed text detection model for training, specifically comprising the following steps:
Step 8.1: inputting the title and the body of the batch data into the same BERT model to obtain the word embedding matrices of the body and the title, $T=\{t_1,t_2,t_3,\dots,t_n\}$ and $C=\{c_1,c_2,c_3,\dots,c_m\}$, where $T\in\mathbb{R}^{n\times 300}$, n is the word sequence length of the body, $C\in\mathbb{R}^{m\times 300}$, m is the word sequence length of the title, and 300 is the word vector dimension encoded by the standard BERT model, while also obtaining the first output vector of the BERT model for the title and for the body. By default, the first output vector for the title encodes the global information of the title, and the first output vector for the body encodes the global information of the body.
Step 8.2: randomly initializing a shared parameter matrix $W\in\mathbb{R}^{300\times n}$ and performing a matrix transformation to obtain a feature matrix $M\in\mathbb{R}^{m\times 300}$ that mixes body and title information. The mathematical expression of the matrix transformation is as follows:
$$M_{m\times 300}=C_{m\times 300}\times W_{300\times n}\times T_{n\times 300}$$
Step 8.3: performing a matrix transformation on the two first output vectors of the BERT model to obtain a feature matrix $F\in\mathbb{R}^{300\times 300}$. The mathematical expression of the matrix transformation is as follows:
Step 8.4: taking M as Q and V and taking F as K, computing a mixed attention matrix $A\in\mathbb{R}^{m\times 300}$. The calculation is as follows:
wherein $d_k$ is the second dimension of K. The attention mechanism consists of three parts: query (Q), key (K) and value (V).
Step 8.5: passing the mixed attention matrix A through a fully connected layer to obtain a dimension-reduction matrix $D\in\mathbb{R}^{m\times n}$.
Step 8.6: randomly initializing a first weight matrix $W_1$, passing the dimension-reduction matrix D through a fully connected layer to obtain a one-dimensional matrix, and applying softmax to it as the output of classification task 1 of step 3. The output has dimension $\mathbb{R}^{1\times j}$, where j is the number of original categories of the data.
Step 8.7: randomly initializing a second weight matrix $W_2$, passing the dimension-reduction matrix D through a fully connected layer to obtain a one-dimensional matrix, and applying softmax to it as the output of classification task 2 of step 3. The output has dimension $\mathbb{R}^{1\times 2}$, the two dimensions representing the probability of being and of not being "title party", respectively.
Step 8.8: taking the maximum value in the result of step 8.6 and of step 8.7 as $p_i$ of the corresponding task, computing the cross entropy of each task, and summing and averaging. The mathematical expression is as follows:
wherein n is the batch size, $y_i$ is the true label of the i-th piece of data, and $p_i$ is the maximum probability computed by the model for the label of the i-th piece of data.
Step 8.9: back-propagating the result of step 8.8 as the error to train the model parameters.
Step 8.10: setting an end condition, for example that the result has not improved within 1000 rounds; if the end condition is not met, repeating steps 8.1 to 8.9 until it is met, and then stopping the training of the model.
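Steps 8.8 to 8.10 can be sketched as a joint loss plus an early-stopping check. The sketch rests on two assumptions that go beyond the text: the cross entropy is computed in its usual form, from the probability the model assigns to the true label, and the two task losses are simply added with equal weight. The names `joint_loss` and `EarlyStopping` are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability of the true label for one sample (usual definition)."""
    return -math.log(max(probs[label], 1e-12))


def joint_loss(task1_probs, task1_labels, task2_probs, task2_labels):
    """Average each task's cross entropy over the batch, then add the two (assumed equal weights)."""
    n = len(task1_labels)
    loss1 = sum(cross_entropy(p, y) for p, y in zip(task1_probs, task1_labels)) / n
    loss2 = sum(cross_entropy(p, y) for p, y in zip(task2_probs, task2_labels)) / n
    return loss1 + loss2


class EarlyStopping:
    """Stop when the loss has not improved for `patience` consecutive rounds."""

    def __init__(self, patience=1000):
        self.patience = patience
        self.best = float("inf")
        self.rounds_without_improvement = 0

    def should_stop(self, loss):
        if loss < self.best:
            self.best = loss
            self.rounds_without_improvement = 0
        else:
            self.rounds_without_improvement += 1
        return self.rounds_without_improvement >= self.patience


# One-sample batch: task 1 has two categories, task 2 is the binary title party decision.
loss = joint_loss([[0.5, 0.5]], [0], [[0.25, 0.75]], [1])
```

Because both heads share the parameters below them, back-propagating the summed loss lets the gradient of task 1 adjust the features that task 2 relies on.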
The method of the invention further comprises testing the trained text detection model, specifically:
Step 9: executing steps 1 to 8.7 with the trained model and taking the index of the largest value in the output of classification task 2 in step 8.7 as the final result, without executing steps 8.8 to 8.10.
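Looking back at step 8.2, the shapes in the matrix transformation M = C × W × T can be traced with a small Python sketch. The dimensions below are small stand-ins (d = 4 instead of 300), the matrices are filled with random values, and all helper names are hypothetical; the sketch only checks that the product has the stated shape.

```python
import random


def matmul(a, b):
    """Plain matrix product; fails loudly if the inner dimensions disagree."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def shape(a):
    return (len(a), len(a[0]))


def mix_title_and_body(c, w, t):
    """Step 8.2: (m x d) times (d x n) times (n x d) gives (m x d)."""
    return matmul(matmul(c, w), t)


rng = random.Random(0)
m, n, d = 3, 5, 4                # title length, body length, embedding size (300 in the text)
C = random_matrix(m, d, rng)     # title embeddings
W = random_matrix(d, n, rng)     # shared parameter matrix, randomly initialized
T = random_matrix(n, d, rng)     # body embeddings
M = mix_title_and_body(C, W, T)
print(shape(M))  # (3, 4)
```

Since W has n columns, the body length n must be the same for every sample, which is consistent with the fixed-length padding and truncation of step 5.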
It should be noted that the above-mentioned embodiments are exemplary, and that those skilled in the art, having benefit of the present disclosure, may devise various arrangements that are within the scope of the present disclosure and that fall within the scope of the invention. It should be understood by those skilled in the art that the present specification and figures are illustrative only and are not limiting upon the claims. The scope of the invention is defined by the claims and their equivalents.
Claims (2)
1. A mixed attention mechanism text title matching method based on multi-task learning, characterized by comprising:
step 1: crawling different categories of "title party" text data and normal text data to form a data set;
step 2: cleaning the data set to remove interfering characters such as web page tags and network emoticons;
step 3: labeling the title and the body of each text in the data set with categories, respectively, to generate classified data, wherein the category labels cover classification task 1 and classification task 2: for classification task 1 the title is labeled with the original category assigned when the data was crawled, and for classification task 2 the body is labeled as being or not being a "title party" text;
step 4: performing word segmentation on the title and the body of the classified data obtained in step 3, respectively, to obtain text word sequences;
step 5: processing each text word sequence to a preset fixed length, padding with 0 where the sequence is shorter than that length and truncating where it is longer;
step 6: randomly shuffling the category-labeled text data so that "title party" texts and normal texts are thoroughly mixed;
step 7: dividing the shuffled data set into batches of size batch;
step 8: inputting the batch data into the constructed text detection model for training, specifically comprising the following steps:
step 8.1: inputting the title and the body of the batch data into the same BERT model to obtain the word embedding matrices of the body and the title, $T=\{t_1,t_2,t_3,\dots,t_n\}$ and $C=\{c_1,c_2,c_3,\dots,c_m\}$, where $T\in\mathbb{R}^{n\times 300}$, n is the word sequence length of the body, $C\in\mathbb{R}^{m\times 300}$, m is the word sequence length of the title, and 300 is the word vector dimension encoded by the standard BERT model, while also obtaining the first output vector of the BERT model for the title and for the body;
step 8.2: randomly initializing a shared parameter matrix $W\in\mathbb{R}^{300\times n}$ and performing a matrix transformation to obtain a feature matrix $M\in\mathbb{R}^{m\times 300}$ that mixes body and title information, the mathematical expression of the matrix transformation being as follows:
$$M_{m\times 300}=C_{m\times 300}\times W_{300\times n}\times T_{n\times 300}$$
step 8.3: performing a matrix transformation on the two first output vectors of the BERT model to obtain a feature matrix $F\in\mathbb{R}^{300\times 300}$, the mathematical expression of the matrix transformation being as follows:
step 8.4: taking M as Q and V and taking F as K, computing a mixed attention matrix $A\in\mathbb{R}^{m\times 300}$, the calculation being as follows:
wherein $d_k$ is the second dimension of K;
step 8.5: passing the mixed attention matrix A through a fully connected layer to obtain a dimension-reduction matrix $D\in\mathbb{R}^{m\times n}$;
step 8.6: randomly initializing a first weight matrix $W_1$, passing the dimension-reduction matrix D through a fully connected layer to obtain a one-dimensional matrix, and applying softmax to it as the output of classification task 1 of step 3, the output having dimension $\mathbb{R}^{1\times j}$, where j is the number of original categories of the data;
step 8.7: randomly initializing a second weight matrix $W_2$, passing the dimension-reduction matrix D through a fully connected layer to obtain a one-dimensional matrix, and applying softmax to it as the output of classification task 2 of step 3, the output having dimension $\mathbb{R}^{1\times 2}$, the two dimensions representing the probability of being and of not being "title party", respectively;
step 8.8: taking the maximum value in the result of step 8.6 and of step 8.7 as $p_i$ of the corresponding task, computing the cross entropy of each task, and summing and averaging, the mathematical expression being as follows:
wherein n is the batch size, $y_i$ is the true label of the i-th piece of data, and $p_i$ is the maximum probability computed by the model for the label of the i-th piece of data;
step 8.9: back-propagating the result of step 8.8 as the error to train the model parameters;
step 8.10: setting an end condition; if the end condition is not met, repeating steps 8.1 to 8.9 until it is met, and then stopping the training of the model.
2. The text title matching method according to claim 1, characterized in that the method further comprises testing the trained text detection model, specifically:
step 9: executing steps 1 to 8.7 with the trained model and taking the index of the largest value in the output of classification task 2 in step 8.7 as the final result, without executing steps 8.8 to 8.10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110190612.1A CN112966103B (en) | 2021-02-05 | 2021-02-05 | Mixed attention mechanism text title matching method based on multi-task learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110190612.1A CN112966103B (en) | 2021-02-05 | 2021-02-05 | Mixed attention mechanism text title matching method based on multi-task learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112966103A (en) | 2021-06-15 |
CN112966103B (en) | 2022-04-19 |
Family
ID=76285176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110190612.1A Active CN112966103B (en) | 2021-02-05 | 2021-02-05 | Mixed attention mechanism text title matching method based on multi-task learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112966103B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113688621B (en) * | 2021-09-01 | 2023-04-07 | 四川大学 | Text matching method and device for texts with different lengths under different granularities |
CN115357720B (en) * | 2022-10-20 | 2023-05-26 | 暨南大学 | BERT-based multitasking news classification method and device |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491436A (en) * | 2017-08-21 | 2017-12-19 | 北京百度网讯科技有限公司 | A kind of recognition methods of title party and device, server, storage medium |
CN108304379A (en) * | 2018-01-15 | 2018-07-20 | 腾讯科技(深圳)有限公司 | A kind of article recognition methods, device and storage medium |
CN108429920A (en) * | 2018-02-06 | 2018-08-21 | 北京奇虎科技有限公司 | A kind of method and apparatus of processing title party video |
CN108491389A (en) * | 2018-03-23 | 2018-09-04 | 杭州朗和科技有限公司 | Click bait title language material identification model training method and device |
CN109614614A (en) * | 2018-12-03 | 2019-04-12 | 焦点科技股份有限公司 | A kind of BILSTM-CRF name of product recognition methods based on from attention |
CN109657055A (en) * | 2018-11-09 | 2019-04-19 | 中山大学 | Title party article detection method and federal learning strategy based on level hybrid network |
CN109753567A (en) * | 2019-01-31 | 2019-05-14 | 安徽大学 | A kind of file classification method of combination title and text attention mechanism |
CN110210022A (en) * | 2019-05-22 | 2019-09-06 | 北京百度网讯科技有限公司 | Header identification method and device |
CN110598046A (en) * | 2019-09-17 | 2019-12-20 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based identification method and related device for title party |
CN111199155A (en) * | 2018-10-30 | 2020-05-26 | 飞狐信息技术(天津)有限公司 | Text classification method and device |
CN111813932A (en) * | 2020-06-17 | 2020-10-23 | 北京小米松果电子有限公司 | Text data processing method, text data classification device and readable storage medium |
CN112287105A (en) * | 2020-09-30 | 2021-01-29 | 昆明理工大学 | Method for analyzing correlation of law-related news fusing bidirectional mutual attention of title and text |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9881059B2 (en) * | 2014-08-08 | 2018-01-30 | Yahoo Holdings, Inc. | Systems and methods for suggesting headlines |
Non-Patent Citations (2)
Title |
---|
MODELING MULTI-TARGETS SENTIMENT CLASSIFICATION VIA GRAPH CONVOLUTIONAL NETWORKS AND AUXILIARY RELATION; FENG AO ET AL.; 《COMPUTERS MATERIALS & CONTINUA》; 20201231; pp. 909-923 * |
Identifying the "title party" in online news (识别网络新闻标题党); 张晓春; 《文学教育(上)》; 20180205 (No. 02); pp. 164-165 * |
Also Published As
Publication number | Publication date |
---|---|
CN112966103A (en) | 2021-06-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111008278B (en) | Content recommendation method and device | |
CN106599022B (en) | User portrait forming method based on user access data | |
CN109933664B (en) | Fine-grained emotion analysis improvement method based on emotion word embedding | |
CN107515873B (en) | Junk information identification method and equipment | |
US8630972B2 (en) | Providing context for web articles | |
CN101520784B (en) | Information issuing system and information issuing method | |
Lenz et al. | Measuring the diffusion of innovations with paragraph vector topic models | |
CN109918621B (en) | News text infringement detection method and device based on digital fingerprints and semantic features | |
CN113590970B (en) | Personalized digital book recommendation system and method based on reader preference, computer and storage medium | |
CN107885793A (en) | A kind of hot microblog topic analyzing and predicting method and system | |
CN106354818B (en) | Social media-based dynamic user attribute extraction method | |
CN103258000A (en) | Method and device for clustering high-frequency keywords in webpages | |
CN112966103B (en) | Mixed attention mechanism text title matching method based on multi-task learning | |
WO2008125531A1 (en) | Method and system for detection of authors | |
CN108363748B (en) | Topic portrait system and topic portrait method based on knowledge | |
CN108021715B (en) | Heterogeneous label fusion system based on semantic structure feature analysis | |
Dong et al. | Cross-media similarity evaluation for web image retrieval in the wild | |
Liu et al. | Correlation identification in multimodal weibo via back propagation neural network with genetic algorithm | |
CN108446333B (en) | Big data text mining processing system and method thereof | |
CN110569351A (en) | Network media news classification method based on restrictive user preference | |
CN108596205B (en) | Microblog forwarding behavior prediction method based on region correlation factor and sparse representation | |
CN111274791B (en) | Modeling method of user loss early warning model in online home decoration scene | |
TW201243627A (en) | Multi-label text categorization based on fuzzy similarity and k nearest neighbors | |
Zhao et al. | User-sentiment topic model: refining user's topics with sentiment information | |
Zeng et al. | Context-aware social media recommendation based on potential group |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||