CN112966103B - Mixed attention mechanism text title matching method based on multi-task learning


Info

Publication number: CN112966103B
Application number: CN202110190612.1A
Authority: CN (China)
Prior art keywords: text, matrix, data, title, model
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN112966103A
Inventors: 王维宽, 冯翱, 宋馨宇, 张学磊, 张举, 蔡佳志
Current and original assignee: Chengdu University of Information Technology

Application filed by Chengdu University of Information Technology
Priority to CN202110190612.1A
Publication of CN112966103A
Application granted
Publication of CN112966103B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention relates to a mixed-attention text title matching method based on multi-task learning. Multi-task learning is realized by simultaneously performing, on each input text, classification task 1 (predicting the text's original category) and classification task 2 (deciding whether the text is a "headline party", i.e. clickbait, article). The two tasks are trained jointly through a multi-task learning model, one task assisting the other in learning better parameters: back-propagation from classification task 1 adjusts the model parameters so that classification task 2 achieves better performance. The attention mechanism used by the method computes the degree of association between each element and every other element in a single step, with little computation and high efficiency.

Description

Mixed attention mechanism text title matching method based on multi-task learning
Technical Field
The invention relates to the field of text processing, in particular to a text title matching method based on a mixed attention mechanism of multi-task learning.
Background
In the Internet era, "headline party" (clickbait) texts remain prevalent because of the real benefits brought by traffic accumulation and traffic incentives, and they degrade the browsing experience of network users. Platforms where clickbait is rampant lose users, which harms their sustainable development.
"Headline party" is a general term for website editors, reporters, moderators, and netizens who, for purposes such as raising click-through rates or gaining fame, craft eye-catching titles on forums or Internet media to attract readers, only for readers to discover after clicking that the content diverges widely from the title. In short, the typical "headline party" behavior is a strongly exaggerated post title whose body is completely unrelated, or only weakly related, to the title.
Network users can report such worthless junk text after reading it, in the hope that platform moderators will take the article down. Web text publishing platforms usually perform some text screening at publication time, but the detection mechanism is loose. To preserve user stickiness and a smooth publishing experience, and to avoid the annoyance and churn caused by frequent publication failures, platforms currently check strictly only for sensitive words: the text is matched against every word in a sensitive-word dictionary, and publication fails only when such words are found.
Back-end auditing is another means by which platforms improve text quality. A platform randomly samples a portion of successfully published texts for manual review, and texts with a high number of reports may also be included in the review scope. But again out of concern for user stickiness and user experience, back-end staff will not lightly delete a user's text unless the article contains particularly obvious violations. At present, "headline party" detection relies mainly on manual inspection and lacks systematic algorithm and model support.
Manual detection: a network user or platform auditor finds the article manually and deletes it. Because subjective awareness, reading habits, and interests differ, manual judgment standards are not uniform; users' habits in using the software also differ, so a report is not guaranteed. Given the massive volume of network text, platform reporting mechanisms set a report-count threshold, and an article is automatically forwarded to back-end review only when its report count exceeds that threshold.
Simple, loose machine detection: texts are matched against words pre-stored in a dictionary, and publication is rejected on a match. Such a loose detection mechanism cannot effectively detect "headline party" text: most clickbait texts are compliant at the word level, and a sensitive-word dictionary contains only a limited number of words, so in practice this method does not fit the application scenario.
In view of the above, an effective "headline party" text detection method is needed to filter clickbait articles on the network, improve the quality of Internet text, improve the experience of network users, and allow text push platforms of all kinds to develop in a sustained and healthy way.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a mixed attention mechanism text title matching method based on multi-task learning, comprising the following steps:
Step 1: crawling different categories of "headline party" text data and normal text data to form a data set;
Step 2: cleaning the data set, removing interference characters such as web-page tags and network emoticons;
Step 3: labeling the title and the body of each text in the data set to generate classification data, where the labels comprise classification task 1 and classification task 2: classification task 1 labels the title with the data's original category at crawling time, and classification task 2 labels the body as "headline party" text or not;
Step 4: performing word segmentation on the titles and bodies of the classification data obtained in step 3 to obtain text word sequences;
Step 5: processing each text word sequence to a preset fixed length, padding shorter sequences with 0 and truncating longer ones;
Step 6: randomly shuffling the labeled text data so that "headline party" texts and normal texts are fully mixed;
Step 7: dividing the mixed data set into batches of a given batch size;
Step 8: inputting the batch data into the constructed text detection model for training, specifically comprising the following steps:
Step 8.1: inputting the title and the body of the batch data into the same BERT model, obtaining word embedding matrices of the body and the title, T = {t_1, t_2, t_3, …, t_n} and C = {c_1, c_2, c_3, …, c_m}, where T ∈ R^{n×300}, n is the word-sequence length of the body, C ∈ R^{m×300}, m is the word-sequence length of the title, and 300 is the word-vector dimension encoded by the standard BERT model; and simultaneously obtaining the first outputs of the BERT model, C_0 ∈ R^{1×300} for the title and T_0 ∈ R^{1×300} for the body;
Step 8.2: randomly initializing a shared parameter matrix W ∈ R^{300×n} and performing a matrix transformation to obtain a feature matrix M ∈ R^{m×300} mixing body and title information; the matrix transformation is:
M_{m×300} = C_{m×300} × W_{300×n} × T_{n×300}
Step 8.3: performing a matrix transformation on C_0 and T_0 (an outer product of the two global vectors) to obtain a feature matrix F ∈ R^{300×300}; the matrix transformation is:
F_{300×300} = C_0^T × T_0
Step 8.4: taking M as Q and V and F as K, computing a mixed attention matrix A ∈ R^{m×300} as:
A = softmax(Q × K^T / √d_k) × V
where d_k is the second dimension of K;
Step 8.5: fully connecting the mixed attention matrix A to obtain a dimension-reduction matrix D ∈ R^{m×n};
Step 8.6: randomly initializing a first weight matrix W_1 and fully connecting the dimension-reduction matrix D to obtain a one-dimensional matrix, then computing softmax as the output of classification task 1 of step 3; the dimension is R^{1×j}, where j is the number of original categories of the data;
Step 8.7: randomly initializing a second weight matrix W_2 and fully connecting the dimension-reduction matrix D to obtain a one-dimensional matrix, then computing softmax as the output of classification task 2 of step 3; the dimension is R^{1×2}, the two dimensions representing the probability of being or not being a "headline party" article;
Step 8.8: taking the maximum value in the results of step 8.6 and step 8.7 as p_i for the corresponding task, computing the cross entropy of each task, then summing and averaging; the mathematical expression is:
loss = -(1/n) Σ_{i=1}^{n} y_i · log(p_i)
where n is the batch size, y_i is the true label of the i-th piece of data, and p_i is the maximum probability computed by the model for the label of the i-th piece of data;
Step 8.9: back-propagating the result of step 8.8 as the error, for training the model parameters;
Step 8.10: setting an end condition; if the end condition is not met, repeating steps 8.1 to 8.9 until it is met, then stopping training the model.
According to a preferred embodiment, the method further includes testing the trained text detection model, specifically including:
Step 9: for the trained model, executing step 1 to step 8.7, taking the index of the maximum value in the output of classification task 2 in step 8.7 as the final result, and no longer executing steps 8.8 to 8.10.
The invention has the beneficial effects that:
1. The invention provides a mixed attention strategy based on multi-task learning that extracts key information from the body text and matches it against the title, thereby detecting "headline party" articles and significantly improving detection precision and accuracy.
2. An RNN model forgets early input information as the input sequence grows, its computation cost is large, and, being a sequential model, it cannot be parallelized; by contrast, the attention mechanism used by this method computes the degree of association between each element and every other element in a single step, with little computation and high efficiency.
3. Text similarity computed from text embedding vectors can be disturbed by a large amount of noise; the detection method of the invention avoids the influence of such noise, adapts to the changing naming strategies of "headline party" articles, breaks through the limitations of traditional similarity computation, achieves efficient computation and accurate classification, and eliminates a large amount of manual work.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The following detailed description is made with reference to the accompanying drawings.
For a long body and a short title, analyzed purely at the character level, most words of the body are irrelevant to the title, and text similarity computed from text embedding vectors is disturbed by a large amount of noise. This scheme therefore proposes a mixed attention strategy based on multi-task learning that extracts key information from the body and matches it against the title, thereby detecting "headline party" articles. In an RNN-based model, the memory cells forget information input earlier as the input sequence grows, and the computation cost is larger; moreover, an RNN is a sequential model and cannot be computed in parallel. The attention mechanism used by this scheme computes the degree of association between each element and every other element in a single step, with little computation and high efficiency.
The basic idea of the scheme is to identify "headline party" articles through the degree of matching between body and title.
The BERT used in the invention is an auto-encoder whose every output encodes the context of the text, i.e. each element of T and C already contains context information. The first output of BERT encodes full-text information: C_0 encodes the full-text information of the title and T_0 encodes the full-text information of the body. A similar effect could be achieved with an RNN-based bidirectional sequential model, but an RNN-based model requires the word embedding matrix to be obtained beforehand, whereas BERT automatically encodes a directly input text sequence.
The mixed attention strategy mixes the word embedding matrix of the body with the word embedding matrix of the title to compute two different feature matrices, serving respectively as Q and V, and as K, from which a mixed attention matrix is computed.
By comparing each element with every other element, the attention mechanism can flexibly capture global and local relations, and it does so in a single step; that is, it can pick out the emphasis within a long text sequence. Through training, the mixed attention strategy can therefore focus attention on the body content that matches the title.
The multi-task learning of the model is embodied in simultaneously performing, on the input text, classification learning of the original text category (classification task 1) and classification learning of whether the text is a "headline party" article (classification task 2). The model is jointly trained through multi-task learning, one task assisting the other in learning better parameters. The scheme adjusts the model parameters through back-propagation of classification task 1 so that classification task 2 achieves better performance. The embodiments are described in detail below with reference to the accompanying drawings.
Step 1: crawling different categories of "headline party" text data and normal text data to form a data set.
Step 2: cleaning the data set, removing interference characters such as web-page tags and network emoticons.
Step 3: labeling the title and the body of each text in the data set to generate classification data, where classification task 1 labels the title with the data's original category at crawling time, and classification task 2 labels the body as "headline party" text or not.
Step 4: performing word segmentation on the titles and bodies of the classification data obtained in step 3 to obtain text word sequences.
Step 5: processing each text word sequence to a preset fixed length, padding shorter sequences with 0 and truncating longer ones.
Step 6: randomly shuffling the labeled text data so that "headline party" texts and normal texts are fully mixed.
Step 7: dividing the mixed data set into batches of a given batch size; steps 2 to 7 are sketched below.
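As an illustration of steps 2 to 7, the following is a minimal Python sketch (the jieba segmenter, the cleaning patterns, the fixed length of 128, and the batch size of 32 are assumptions for illustration, not values fixed by the invention):

    import random
    import re

    import jieba  # assumed Chinese word segmenter

    FIXED_LEN = 128   # assumed preset fixed length (step 5)
    BATCH_SIZE = 32   # assumed batch size (step 7)

    def clean(text):
        # Step 2: remove web-page tags and emoticon-style interference characters.
        text = re.sub(r"<[^>]+>", "", text)         # web-page (HTML) tags
        text = re.sub(r"\[[^\]]{1,8}\]", "", text)  # bracketed emoticons such as "[doge]"
        return text

    def to_fixed_length(tokens, length=FIXED_LEN):
        # Step 5: pad shorter sequences with 0, truncate longer ones.
        if len(tokens) < length:
            return tokens + [0] * (length - len(tokens))
        return tokens[:length]

    def make_batches(samples, batch_size=BATCH_SIZE):
        # Step 6: shuffle so "headline party" and normal texts are fully mixed;
        # Step 7: split the mixed data set into batch-sized chunks.
        random.shuffle(samples)
        return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

    # Step 4: word segmentation of a cleaned title or body, then step 5.
    tokens = to_fixed_length(jieba.lcut(clean("<p>一个示例正文……</p>")))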
Step 8: inputting the batch data into the constructed text detection model for training, specifically comprising the following steps:
Step 8.1: inputting the title and the body of the batch data into the same BERT model, obtaining word embedding matrices of the body and the title, T = {t_1, t_2, t_3, …, t_n} and C = {c_1, c_2, c_3, …, c_m}, where T ∈ R^{n×300}, n is the word-sequence length of the body, C ∈ R^{m×300}, m is the word-sequence length of the title, and 300 is the word-vector dimension encoded by the standard BERT model; at the same time obtaining the first outputs of the BERT model, C_0 and T_0: by default C_0 encodes the global information of the title and T_0 encodes the global information of the body.
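A minimal sketch of step 8.1 using the Hugging Face transformers library (an assumed tool; note that the public bert-base-chinese checkpoint has hidden size 768, whereas the invention states a 300-dimensional word vector, so a 300-dimensional encoder or an added projection is assumed):

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")

    def encode(text, max_len):
        enc = tokenizer(text, return_tensors="pt", padding="max_length",
                        truncation=True, max_length=max_len)
        out = bert(**enc)
        hidden = out.last_hidden_state[0]  # (max_len, hidden): word embedding matrix
        first = hidden[0:1]                # (1, hidden): first output, global information
        return hidden, first

    C, C0 = encode("一个吸引眼球的标题", max_len=32)        # title matrix C, global vector C0
    T, T0 = encode("与标题关系不大的正文……", max_len=256)  # body matrix T, global vector T0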
Step 8.2: randomly initializing a shared parameter matrix W ∈ R300×nPerforming matrix transformation to obtain a feature matrix M epsilon R mixing text and title informationm×300The mathematical expression of the matrix transformation is as follows:
Mm×300=Cm×300×W300×n×Tn×300
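Step 8.2 amounts to a single bilinear mixing of the two embedding matrices through the shared matrix W; a sketch in PyTorch (the lengths m = 32 and n = 256 are assumptions):

    import torch

    d, n, m = 300, 256, 32                            # stated vector dim; assumed lengths
    W = torch.nn.Parameter(torch.randn(d, n) * 0.02)  # randomly initialized shared matrix

    C = torch.randn(m, d)  # title word embedding matrix
    T = torch.randn(n, d)  # body word embedding matrix
    M = C @ W @ T          # (m,d) @ (d,n) @ (n,d) -> (m,d) mixed feature matrix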
Step 8.3: performing a matrix transformation on C_0 and T_0 (an outer product of the two global vectors) to obtain a feature matrix F ∈ R^{300×300}; the matrix transformation is:
F_{300×300} = C_0^T × T_0
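The original expression for step 8.3 is rendered as an image; dimensionally, the only product of two 1×300 global vectors that yields a 300×300 matrix is an outer product, so the sketch below assumes F = C_0^T × T_0 (the operand order is an assumption):

    import torch

    d = 300
    C0 = torch.randn(1, d)       # title global vector (first BERT output)
    T0 = torch.randn(1, d)       # body global vector
    F = C0.transpose(0, 1) @ T0  # (d,1) @ (1,d) -> (d,d) mixed global feature matrix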
Step 8.4: taking M as Q and V and F as K, computing a mixed attention matrix A ∈ R^{m×300} as:
A = softmax(Q × K^T / √d_k) × V
where d_k is the second dimension of K. The attention mechanism consists of three parts: query (Q), key (K), and value (V).
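Step 8.4 follows the standard scaled dot-product attention form with Q = V = M and K = F. Since the original formula is rendered as an image, the sketch below makes one dimensional assumption: with Q, V ∈ R^{m×300} and K ∈ R^{300×300}, the softmaxed scores are combined with V element-wise, which yields the stated shape A ∈ R^{m×300}:

    import math

    import torch

    def mixed_attention(M, F):
        # Q = V = M with shape (m,d); K = F with shape (d,d); d_k = second dim of K.
        d_k = F.shape[1]
        scores = (M @ F.transpose(0, 1)) / math.sqrt(d_k)  # (m,d) association scores
        weights = torch.softmax(scores, dim=-1)            # computed in one step, in parallel
        return weights * M                                 # (m,d) mixed attention matrix A

    A = mixed_attention(torch.randn(32, 300), torch.randn(300, 300))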
Step 8.5: fully connecting the mixed attention matrix A to obtain a dimension-reduction matrix D ∈ R^{m×n}.
Step 8.6: randomly initializing a first weight matrix W_1 and fully connecting the dimension-reduction matrix D to obtain a one-dimensional matrix, then computing softmax as the output of classification task 1 of step 3; the dimension is R^{1×j}, where j is the number of original categories of the data.
Step 8.7: randomly initializing a second weight matrix W_2 and fully connecting the dimension-reduction matrix D to obtain a one-dimensional matrix, then computing softmax as the output of classification task 2 of step 3; the dimension is R^{1×2}, the two dimensions representing the probability of being or not being a "headline party" article.
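Steps 8.5 to 8.7 are three fully connected layers followed by softmax; a sketch with assumed shapes (the flattening of D before the task-specific layers and the category count j = 10 are assumptions):

    import torch
    import torch.nn as nn

    class TwoTaskHead(nn.Module):
        def __init__(self, m=32, n=256, d=300, j=10):
            super().__init__()
            self.reduce = nn.Linear(d, n)     # step 8.5: A (m,d) -> D (m,n)
            self.task1 = nn.Linear(m * n, j)  # step 8.6: first weight matrix W1
            self.task2 = nn.Linear(m * n, 2)  # step 8.7: second weight matrix W2

        def forward(self, A):
            D = self.reduce(A)                            # dimension-reduction matrix
            flat = D.reshape(-1)                          # one-dimensional matrix
            p1 = torch.softmax(self.task1(flat), dim=-1)  # j-way output of task 1
            p2 = torch.softmax(self.task2(flat), dim=-1)  # is / is not "headline party"
            return p1, p2

    p1, p2 = TwoTaskHead()(torch.randn(32, 300))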
Step 8.8: taking the maximum value in the results of step 8.6 and step 8.7 as p_i for the corresponding task, computing the cross entropy of each task, then summing and averaging; the mathematical expression is:
loss = -(1/n) Σ_{i=1}^{n} y_i · log(p_i)
where n is the batch size, y_i is the true label of the i-th piece of data, and p_i is the maximum probability computed by the model for the label of the i-th piece of data.
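A sketch of the joint loss of step 8.8. The text pairs the true label y_i with the model's maximum probability; the conventional reading of the cross-entropy formula, adopted below, takes p_i as the probability the model assigns to the true label of the i-th piece of data:

    import torch

    def joint_loss(p1, p2, y1, y2):
        # p1: (n, j) and p2: (n, 2) softmax outputs for a batch of size n;
        # y1, y2: (n,) integer labels for classification tasks 1 and 2.
        def task_ce(p, y):
            p_true = p[torch.arange(p.shape[0]), y]  # p_i for each piece of data
            return -torch.log(p_true).mean()         # -(1/n) * sum_i y_i * log(p_i)
        return (task_ce(p1, y1) + task_ce(p2, y2)) / 2  # sum the two tasks and average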
Step 8.9: back-propagating the result of step 8.8 as the error, for training the model parameters.
Step 8.10: setting an end condition, for example that the result has not improved within 1000 rounds; if the end condition is not reached, repeating steps 8.1 to 8.9 until it is met, then stopping training the model.
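The end condition of step 8.10 can be realized as patience-based early stopping; a sketch (model, batches, and train_step are hypothetical stand-ins for the pipeline of steps 8.1 to 8.8, and the Adam optimizer and learning rate are assumptions; the 1000-round patience follows the example in the text):

    import torch

    def train(model, batches, train_step, patience=1000):
        optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # assumed optimizer
        best, stale = float("inf"), 0
        while stale < patience:                  # step 8.10: end condition not yet met
            for batch in batches:
                loss = train_step(model, batch)  # steps 8.1 to 8.8 on one batch
                optimizer.zero_grad()
                loss.backward()                  # step 8.9: back-propagate the error
                optimizer.step()
                if loss.item() < best:
                    best, stale = loss.item(), 0
                else:
                    stale += 1
                if stale >= patience:            # no improvement within 1000 rounds
                    break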
The method of the invention further includes testing the trained text detection model, specifically including:
Step 9: for the trained model, executing step 1 to step 8.7, taking the index of the maximum value in the output of classification task 2 in step 8.7 as the final result, and no longer executing steps 8.8 to 8.10.
It should be noted that the above-mentioned embodiments are exemplary, and that those skilled in the art, having benefit of the present disclosure, may devise various arrangements that are within the scope of the present disclosure and that fall within the scope of the invention. It should be understood by those skilled in the art that the present specification and figures are illustrative only and are not limiting upon the claims. The scope of the invention is defined by the claims and their equivalents.

Claims (2)

1. A text title matching method based on a mixed attention mechanism of multi-task learning, characterized by comprising the following steps:
Step 1: crawling different categories of "headline party" text data and normal text data to form a data set;
Step 2: cleaning the data set, removing interference characters such as web-page tags and network emoticons;
Step 3: labeling the title and the body of each text in the data set to generate classification data, where the labels comprise classification task 1 and classification task 2: classification task 1 labels the title with the data's original category at crawling time, and classification task 2 labels the body as "headline party" text or not;
Step 4: performing word segmentation on the titles and bodies of the classification data obtained in step 3 to obtain text word sequences;
Step 5: processing each text word sequence to a preset fixed length, padding shorter sequences with 0 and truncating longer ones;
Step 6: randomly shuffling the labeled text data so that "headline party" texts and normal texts are fully mixed;
Step 7: dividing the mixed data set into batches of a given batch size;
Step 8: inputting the batch data into the constructed text detection model for training, specifically comprising the following steps:
Step 8.1: inputting the title and the body of the batch data into the same BERT model, obtaining word embedding matrices of the body and the title, T = {t_1, t_2, t_3, …, t_n} and C = {c_1, c_2, c_3, …, c_m}, where T ∈ R^{n×300}, n is the word-sequence length of the body, C ∈ R^{m×300}, m is the word-sequence length of the title, and 300 is the word-vector dimension encoded by the standard BERT model; and simultaneously obtaining the first outputs of the BERT model, C_0 for the title and T_0 for the body;
Step 8.2: randomly initializing a shared parameter matrix W ∈ R^{300×n} and performing a matrix transformation to obtain a feature matrix M ∈ R^{m×300} mixing body and title information; the matrix transformation is:
M_{m×300} = C_{m×300} × W_{300×n} × T_{n×300};
Step 8.3: performing a matrix transformation on C_0 and T_0 (an outer product of the two global vectors) to obtain a feature matrix F ∈ R^{300×300}; the matrix transformation is:
F_{300×300} = C_0^T × T_0;
Step 8.4: taking M as Q and V and F as K, computing a mixed attention matrix A ∈ R^{m×300} as:
A = softmax(Q × K^T / √d_k) × V,
where d_k is the second dimension of K;
Step 8.5: fully connecting the mixed attention matrix A to obtain a dimension-reduction matrix D ∈ R^{m×n};
Step 8.6: randomly initializing a first weight matrix W_1 and fully connecting the dimension-reduction matrix D to obtain a one-dimensional matrix, then computing softmax as the output of classification task 1 of step 3; the dimension is R^{1×j}, where j is the number of original categories of the data;
Step 8.7: randomly initializing a second weight matrix W_2 and fully connecting the dimension-reduction matrix D to obtain a one-dimensional matrix, then computing softmax as the output of classification task 2 of step 3; the dimension is R^{1×2}, the two dimensions representing the probability of being or not being a "headline party" article;
Step 8.8: taking the maximum value in the results of step 8.6 and step 8.7 as p_i for the corresponding task, computing the cross entropy of each task, then summing and averaging; the mathematical expression is:
loss = -(1/n) Σ_{i=1}^{n} y_i · log(p_i),
where n is the batch size, y_i is the true label of the i-th piece of data, and p_i is the maximum probability computed by the model for the label of the i-th piece of data;
Step 8.9: back-propagating the result of step 8.8 as the error, for training the model parameters;
Step 8.10: setting an end condition; if the end condition is not met, repeating steps 8.1 to 8.9 until it is met, then stopping training the model.
2. The text title matching method according to claim 1, characterized in that the method further comprises testing the trained text detection model, specifically comprising:
Step 9: for the trained model, executing step 1 to step 8.7, taking the index of the maximum value in the output of classification task 2 in step 8.7 as the final result, and no longer executing steps 8.8 to 8.10.
CN202110190612.1A (priority and filing date 2021-02-05): Mixed attention mechanism text title matching method based on multi-task learning. Status: Active. Granted publication: CN112966103B (en).

Priority Applications (1)

CN202110190612.1A (priority and filing date 2021-02-05): Mixed attention mechanism text title matching method based on multi-task learning (CN112966103B)

Publications (2)

CN112966103A (en), published 2021-06-15
CN112966103B (en), published 2022-04-19

Family

Family ID: 76285176

Family Applications (1)

CN202110190612.1A (Active): CN112966103B (en), Mixed attention mechanism text title matching method based on multi-task learning

Country Status (1)

CN: CN112966103B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688621B * 2021-09-01 2023-04-07 Sichuan University: Text matching method and device for texts of different lengths at different granularities
CN115357720B * 2022-10-20 2023-05-26 Jinan University: BERT-based multi-task news classification method and device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491436A * 2017-08-21 2017-12-19 Beijing Baidu Netcom Science and Technology Co., Ltd.: Clickbait title recognition method and device, server, and storage medium
CN108304379A * 2018-01-15 2018-07-20 Tencent Technology (Shenzhen) Co., Ltd.: Article recognition method, device, and storage medium
CN108429920A * 2018-02-06 2018-08-21 Beijing Qihoo Technology Co., Ltd.: Method and apparatus for processing clickbait videos
CN108491389A * 2018-03-23 2018-09-04 Hangzhou Langhe Technology Co., Ltd.: Clickbait title corpus recognition model training method and device
CN109614614A * 2018-12-03 2019-04-12 Focus Technology Co., Ltd.: BiLSTM-CRF product-name recognition method based on self-attention
CN109657055A * 2018-11-09 2019-04-19 Sun Yat-sen University: Clickbait article detection method and federated learning strategy based on a hierarchical hybrid network
CN109753567A * 2019-01-31 2019-05-14 Anhui University: Text classification method combining title and body attention mechanisms
CN110210022A * 2019-05-22 2019-09-06 Beijing Baidu Netcom Science and Technology Co., Ltd.: Header identification method and device
CN110598046A * 2019-09-17 2019-12-20 Tencent Technology (Shenzhen) Co., Ltd.: Artificial-intelligence-based clickbait identification method and related device
CN111199155A * 2018-10-30 2020-05-26 Feihu Information Technology (Tianjin) Co., Ltd.: Text classification method and device
CN111813932A * 2020-06-17 2020-10-23 Beijing Xiaomi Pinecone Electronics Co., Ltd.: Text data processing method, text data classification device, and readable storage medium
CN112287105A * 2020-09-30 2021-01-29 Kunming University of Science and Technology: Method for analyzing the relevance of law-related news by fusing bidirectional mutual attention between title and body

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881059B2 (en) * 2014-08-08 2018-01-30 Yahoo Holdings, Inc. Systems and methods for suggesting headlines


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Feng Ao et al., "Modeling Multi-Targets Sentiment Classification via Graph Convolutional Networks and Auxiliary Relation", Computers, Materials & Continua, 2020-12-31, pp. 909-923 *
Zhang Xiaochun, "Identifying Clickbait Headlines in Online News" (识别网络新闻标题党), Literature Education (Part I), 2018-02-05 (No. 02), pp. 164-165 *

Also Published As

CN112966103A (en), published 2021-06-15


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant