CN111639252A - False news identification method based on news-comment relevance analysis - Google Patents

Publication number
CN111639252A
Authority
CN
China
Legal status
Pending
Application number
CN202010420460.5A
Other languages
Chinese (zh)
Inventor
李玉华
张文杰
李瑞轩
辜希武
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN202010420460.5A
Publication of CN111639252A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of news detection and relates in particular to a false news identification method based on news-comment relevance analysis. The method comprises the following steps: constructing a two-dimensional news feature matrix from the content of each clause of the news text, constructing a one-dimensional feature vector for each comment from its content, and constructing a plurality of comment trees, each with an initial comment as the root node and its reply comments as child nodes; combining the feature vector of each node in each comment tree with the context-associated feature vector of its parent node, computing the context-associated feature vectors of all leaf nodes of the comment tree, and weighting them to obtain the comment tree feature vector, the feature vectors of all comment trees forming a two-dimensional comment feature matrix; and matching the relevance between the news feature matrix and the comment feature matrix to obtain a news feature vector and a comment feature vector, from which the authenticity of the news is judged. The method makes full use of both the news text and the information generated as the news spreads, achieves high accuracy, and is suitable for large-scale social networks.

Description

False news identification method based on news-comment relevance analysis
Technical Field
The invention belongs to the field of news detection and relates in particular to a false news identification method based on news-comment relevance analysis.
Background
Advances in network technology keep lowering the cost of acquiring information, and the ubiquity of that technology has laid the foundation for the rise of social networks. Users can easily obtain and publish information on social networks, and this convenience lowers the barrier to creating and spreading false news. Exploiting delays in official information disclosure, false news that spreads unchecked through social networks can generate severe public-opinion pressure and social panic. Because false news seriously degrades the social network environment and causes collective anxiety, identifying it effectively on social networks is a pressing problem.
Identification of false news has mainly targeted the news text, in two ways: (1) extracting knowledge related to the news and comparing it against a knowledge base; and (2) analyzing the syntax of the text and judging whether its expressions frequently contain uncertain descriptions. With the rise of social networks, how to make reasonable use of social network information to improve news authenticity identification has become a central question. Recent methods have therefore shifted their emphasis to the propagation process or to the comment text: (1) analyzing the propagation process at macroscopic and microscopic levels and inferring news authenticity from the propagation scale; (2) rating user quality in the propagation network from the credibility of the users along the propagation path, and judging news authenticity from those ratings; and (3) analyzing news authenticity from the degree of conflict among the opinions in the comments: news that triggers intense discussion with conflicting viewpoints leads people to doubt the truth of the information, and simulating this human process of understanding information has achieved some success.
However, existing methods focus either only on the news text or only on the form of the propagation process. Methods that depend too heavily on news content adapt poorly to emerging domains for which little prior knowledge exists. Methods that rely on propagation are vulnerable to social bots: bots interfere with the construction of the propagation network, and their increased exposure amplifies the spreading behavior of users across the whole network, so approaches that discard the news content and consider only the propagation process have clear limitations.
Disclosure of Invention
The invention provides a false news identification method based on news-comment relevance analysis, which solves the technical problem of low identification accuracy in existing false news identification caused by relying on only one source of evidence, either the news text or the propagation network.
The technical solution adopted to solve the above technical problem is as follows: a false news identification method based on news-comment relevance analysis, comprising the following steps:
S1, constructing a news feature matrix based on the content of the news to be identified, and constructing a feature vector for each comment based on its content; meanwhile, according to the reply relations among the comments, constructing a plurality of comment trees, each taking an initial comment as the root node and reply comments as child nodes;
S2, for each comment tree, combining the feature vector of each node with the context-associated feature vector of its parent node, obtaining the context-associated feature vectors of all leaf nodes through recursive calculation, and weighting them to obtain the feature vector of the comment tree;
S3, matching the relevance between the news feature matrix and the feature vectors of all comment trees to obtain comment-aware attention weights over the news clauses, with which the vectors of all clauses in the news feature matrix are weighted to obtain a news feature vector, and news-aware attention weights over the comment trees, with which the feature vectors of all comment trees are weighted to obtain a comment feature vector; and judging the authenticity of the news based on the news feature vector and the comment feature vector.
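The comment-tree construction in S1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the input format (dicts with hypothetical keys `id`, `text` and `reply_to`) and the choice to treat a reply whose target comment is missing as a new root are assumptions. The `leaves` helper collects the leaf nodes, each marking the end of one discussion thread, that S2 later pools over.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CommentNode:
    comment_id: str
    text: str
    children: List["CommentNode"] = field(default_factory=list)


def build_comment_trees(comments: List[dict]) -> List[CommentNode]:
    """Group comments into trees using their reply relations.

    Each dict uses the hypothetical keys "id", "text" and "reply_to"
    (None for an initial comment).
    """
    nodes: Dict[str, CommentNode] = {
        c["id"]: CommentNode(c["id"], c["text"]) for c in comments
    }
    roots: List[CommentNode] = []
    for c in comments:
        parent_id: Optional[str] = c["reply_to"]
        if parent_id is None or parent_id not in nodes:
            # Initial comments (and, by assumption, orphaned replies) become roots.
            roots.append(nodes[c["id"]])
        else:
            # A reply becomes a child of the comment it replies to.
            nodes[parent_id].children.append(nodes[c["id"]])
    return roots


def leaves(node: CommentNode) -> List[CommentNode]:
    """Return the leaf nodes of a comment tree, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Building the node index before linking means a reply may appear in the input before the comment it answers.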
The beneficial effects of the invention are as follows. The method uses both the discussion-provoking content of the news and the comment information as key evidence for identifying news authenticity, and infers the authenticity of the news text from how well the core viewpoints of the news match those of the comments. A comment tree is built for each initial comment, with the initial comment as the root node and reply comments as child nodes. Because each comment depends on the context carried by its parent node, the context-associated feature vector of each node is computed by combining the node's own feature vector with the context-associated feature vector of its parent. Because each leaf node marks the end of one discussion, the context-associated feature vectors of all leaf nodes in a comment tree are weighted to produce a one-dimensional feature vector for the tree (that is, for its initial comment). This vector fully fuses the key information of the discussion, so information utilization is high and the accuracy of news judgment is ensured. In addition, the method matches the relevance between the news feature matrix and the feature vectors of all comment trees, generating comment-aware attention weights over the news clauses and news-aware attention weights over the comment trees, so that the resulting news feature vector and comment feature vector can be used effectively for news identification.
The method overcomes the one-sided reliance of the prior art on either the news text or the propagation network. It incorporates key information from the comments, especially the additional key information introduced as replies extend a discussion, achieves high news judgment accuracy, and can be applied to false news identification in large-scale social networks.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the news feature matrix is constructed as follows:
acquiring the text content of the news to be identified, splitting it into clauses and then into words, and converting each segmented word into a word vector; converting all word vectors into hidden state vectors carrying associated context information by means of a recurrent neural network; and, for each clause, weighting all of its hidden state vectors with an attention mechanism to represent the clause as a one-dimensional feature vector, the feature vectors of all clauses forming the two-dimensional news feature matrix of the news to be identified.
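The clause and word splitting that precedes the word-vector lookup can be sketched as below. The punctuation set used for clause boundaries is an assumption, and whitespace splitting is only a stand-in for a real word segmentation tool: Chinese text has no spaces between words, so a dedicated segmenter would be needed in practice. Including `.` in the set also splits decimals and abbreviations, which a production splitter would have to handle.

```python
import re
from typing import List

# Assumed clause delimiters: Chinese and ASCII sentence/clause punctuation.
_CLAUSE_DELIMS = re.compile(r"[。！？；!?;.]+")


def split_clauses(text: str) -> List[str]:
    """Split news text into clauses at punctuation, dropping empty pieces."""
    return [part.strip() for part in _CLAUSE_DELIMS.split(text) if part.strip()]


def tokenize(clause: str) -> List[str]:
    """Stand-in word segmenter: splits on whitespace only."""
    return clause.split()


def segment_news(text: str) -> List[List[str]]:
    """Return a list of clauses, each a list of words (n clauses x m_i words)."""
    return [tokenize(clause) for clause in split_clauses(text)]
```

Padding each clause to a common length m would turn this nested list into the clause-by-word-by-dimension tensor once words are mapped to vectors.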
A further beneficial effect is that the recurrent neural network iteratively retains context information, so that each word is associated with the words around it. In semantic understanding, different parts of a text sequence matter to different degrees; the attention mechanism can examine a long sequence from different angles, find its most important information, and assign that information a higher weight so that it plays a larger role in the subsequent representation vector. Combining the recurrent neural network with the attention mechanism therefore captures the information expressed in the text more accurately and improves the prediction performance of the model.
Further, constructing the one-dimensional feature vector of each comment based on its content specifically comprises:
acquiring the text content of each comment, segmenting it into words, and converting the segmented words into word vectors; converting all word vectors into hidden state vectors carrying associated context information by means of a recurrent neural network; and weighting all hidden state vectors with an attention mechanism to represent the comment as a one-dimensional feature vector.
A further beneficial effect is that, because comments are short relative to the news text, they are not split into clauses; each comment is treated directly as a single sentence and converted into a vector representation for the subsequent association between news and comments.
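The attention weighting that collapses a sequence of hidden states into one comment (or clause) vector can be sketched as follows. The scoring form $u_j = \tanh(W h_j + b)$, $e_j = u_j^{\top} u_w$, followed by a softmax over positions, is an assumption in the style of hierarchical attention networks; the text only says that the hidden states are weighted by an attention mechanism. `W`, `b` and `u_w` stand for learned parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H: List[Vector], W: Matrix, b: Vector, u_w: Vector) -> Tuple[Vector, Vector]:
    """Weight hidden states H (one per word) into a single vector.

    Assumed scoring: u_j = tanh(W h_j + b), e_j = u_j . u_w, alpha = softmax(e).
    Returns (pooled vector, attention weights).
    """
    scores = []
    for h in H:
        u = [math.tanh(a + bi) for a, bi in zip(matvec(W, h), b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, u_w)))
    alpha = softmax(scores)
    dim = len(H[0])
    pooled = [sum(alpha[j] * H[j][k] for j in range(len(H))) for k in range(dim)]
    return pooled, alpha
```

The context vector `u_w` acts as a learned query: hidden states whose projection aligns with it receive larger weights.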
Further, in S1, all of the recurrent neural networks are bidirectional long short-term memory (BiLSTM) networks.
A further beneficial effect is that the BiLSTM effectively captures context information and, through selective memory and selective forgetting, better retains key context over long distances. When training on long input sequences, the LSTM alleviates the vanishing-gradient problem and achieves better training results, which allows the method to be applied to false news identification in large-scale social networks.
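A bidirectional LSTM of the kind named for S1 can be sketched in plain Python as below. It uses the standard LSTM gate equations with small random, untrained weights; the seeds, dimensions and initialization range are illustrative assumptions. The backward LSTM reads the sequence in reverse, and its outputs are re-aligned so that each position's hidden state is the concatenation of its forward and backward states.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """Single-direction LSTM with randomly initialised (untrained) weights."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One (W, U, b) triple per gate: input, forget, output, candidate.
        self.params: Dict[str, tuple] = {}
        for gate in ("i", "f", "o", "g"):
            W = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)] for _ in range(hidden_dim)]
            U = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
            self.params[gate] = (W, U, [0.0] * hidden_dim)

    def _affine(self, gate: str, x: Vector, h: Vector) -> Vector:
        """W x + U h + b for one gate."""
        W, U, b = self.params[gate]
        return [
            sum(w * xi for w, xi in zip(W[k], x)) + sum(u * hi for u, hi in zip(U[k], h)) + b[k]
            for k in range(self.hidden_dim)
        ]

    def run(self, xs: List[Vector]) -> List[Vector]:
        """Return the hidden state after each input vector."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            # All four gates read the previous hidden state h.
            i = [_sigmoid(v) for v in self._affine("i", x, h)]
            f = [_sigmoid(v) for v in self._affine("f", x, h)]
            o = [_sigmoid(v) for v in self._affine("o", x, h)]
            g = [math.tanh(v) for v in self._affine("g", x, h)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # selective forget/remember
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def bilstm(xs: List[Vector], fwd: LSTM, bwd: LSTM) -> List[Vector]:
    """h_j = [forward h_j ; backward h_j]."""
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # re-align backward states with positions
    return [hf + hb for hf, hb in zip(forward, backward)]
```

The forward and backward directions use separate parameters, which is why two `LSTM` instances are passed in.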
Further, in S2, a gated recurrent unit (GRU) is used to obtain the context-associated feature vectors of all leaf nodes through recursive computation.
A further beneficial effect is that, compared with other recurrent neural networks, the GRU uses a reset gate and an update gate to alleviate the vanishing-gradient problem during training when the tree is deep, that is, when there is a large amount of discussion, so that the method remains applicable to false news identification in large-scale social networks. At the same time, the two gates extract the useful discussion information in the comment tree while keeping the number of model parameters small, which speeds up training.
Further, in S2, the feature vector of each comment tree is constructed as follows:
traversing each comment tree from top to bottom and, at each node, using the GRU to combine the feature vector of the current node with the hidden state vector of its parent node: computing a reset gate that retains part of the parent node's hidden state information and an update gate that adjusts the proportion of the parent node's hidden state information that is retained, and computing the hidden state vectors of all nodes in the comment tree recursively; and processing the hidden state vectors of all leaf nodes of the comment tree with a pooling method to obtain the feature vector of the comment tree.
A further beneficial effect is that the reset gate, which retains part of the parent node's hidden state, and the update gate, which adjusts how much of it is retained, together control how strongly each node is fused with its parent, yielding more reasonable and accurate context-associated feature vectors for each node; pooling then provides a simple, normalized weighting of the leaf nodes.
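The top-down recursion and leaf pooling can be sketched as follows. The node update is passed in as a function with the hypothetical signature `(c_i, h_parent) -> h_i`, so that the traversal can be shown independently of the cell; a GRU cell would be plugged in here. Using a zero vector as the root's parent state and mean pooling over the leaves (max pooling would be an equally plausible reading of "a pooling method") are assumptions.

```python
from typing import Callable, Dict, List

Vector = List[float]
Cell = Callable[[Vector, Vector], Vector]  # (c_i, h_parent) -> h_i


def tree_feature(
    children: Dict[str, List[str]],
    features: Dict[str, Vector],
    root: str,
    cell: Cell,
    hidden_dim: int,
) -> Vector:
    """Top-down pass h_i = cell(c_i, h_p(i)), then mean-pool the leaf states."""
    leaf_states: List[Vector] = []
    # Explicit stack of (node, parent hidden state); the root's parent state is zeros.
    stack = [(root, [0.0] * hidden_dim)]
    while stack:
        node, h_parent = stack.pop()
        h = cell(features[node], h_parent)
        kids = children.get(node, [])
        if not kids:
            leaf_states.append(h)  # a leaf ends one discussion thread
        for kid in kids:
            stack.append((kid, h))
    n = len(leaf_states)
    return [sum(vals) / n for vals in zip(*leaf_states)]
```

An explicit stack avoids Python's recursion limit on very deep discussion threads, and because mean pooling is order-insensitive the visiting order does not affect the result.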
Further, the reset gate $r_i$ is calculated as $r_i = \sigma(W_r c_i + U_r h_{p(i)})$ and the update gate $z_i$ as $z_i = \sigma(W_z c_i + U_z h_{p(i)})$, where $W_r$ and $W_z$ are parameter matrices applied to the feature vector $c_i$ of the $i$-th node, $U_r$ and $U_z$ are parameter matrices applied to the parent hidden state, $\sigma$ is the activation function (the sigmoid in a standard GRU), and $h_{p(i)}$ is the hidden state vector of the parent node $p(i)$ of the $i$-th node.
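A single top-down GRU step can be sketched as below. The reset and update gates follow the two formulas above, without bias terms as written there. The candidate state $\tilde{h}_i = \tanh(W_h c_i + U_h (r_i \odot h_{p(i)}))$ and the update $h_i = (1 - z_i) \odot h_{p(i)} + z_i \odot \tilde{h}_i$ are not given in the text; they are assumed from the standard GRU, as are the extra parameters `W_h` and `U_h`.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tree_gru_cell(
    c_i: Vector, h_parent: Vector,
    W_r: Matrix, U_r: Matrix, W_z: Matrix, U_z: Matrix,
    W_h: Matrix, U_h: Matrix,
) -> Vector:
    """Compute h_i from the node's comment vector c_i and its parent's state h_p(i)."""
    # Reset and update gates, as in the formulas above.
    r = [_sigmoid(a + b) for a, b in zip(_matvec(W_r, c_i), _matvec(U_r, h_parent))]
    z = [_sigmoid(a + b) for a, b in zip(_matvec(W_z, c_i), _matvec(U_z, h_parent))]
    # Assumed standard GRU candidate: the reset gate masks the parent state.
    gated = [rk * hk for rk, hk in zip(r, h_parent)]
    h_tilde = [math.tanh(a + b) for a, b in zip(_matvec(W_h, c_i), _matvec(U_h, gated))]
    # Assumed update: interpolate between the parent state and the candidate.
    return [(1 - zk) * hk + zk * tk for zk, hk, tk in zip(z, h_parent, h_tilde)]
```

With the other parameters bound (for example via `functools.partial`), this function fits the `(c_i, h_parent) -> h_i` cell shape used by a top-down traversal.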
Further, S3 comprises:
matching the relevance between the news feature matrix and the comment feature matrix with a co-attention network to construct a similarity matrix, the comment feature matrix being formed by the feature vectors of all comment trees;
using the similarity matrix to associate the news feature matrix with the comment feature matrix and update both, obtaining a new news feature matrix fused with comment information and a new comment feature matrix fused with news information;
computing co-attention weights over the news clauses from the new news feature matrix, and co-attention weights over the comment trees from the new comment feature matrix;
weighting the vectors of all clauses in the news feature matrix before updating with the clause co-attention weights to obtain the news feature vector, and weighting the feature vectors of all comment trees in the comment feature matrix before updating with the comment-tree co-attention weights to obtain the comment feature vector;
and feeding the news feature vector and the comment feature vector into a fully connected layer to judge the authenticity of the news.
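The last two steps of S3 can be sketched as follows, assuming the co-attention weights have already been computed. Concatenating the two vectors before the fully connected layer, the softmax output and the two-class layout are assumptions; the text only says that the news and comment feature vectors are fully connected to judge authenticity.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def weighted_sum(columns: List[Vector], weights: Vector) -> Vector:
    """Sum of feature vectors (clauses or comment trees) scaled by attention weights."""
    return [sum(w * col[k] for w, col in zip(weights, columns)) for k in range(len(columns[0]))]


def classify(S_cols: List[Vector], a_s: Vector, C_cols: List[Vector], a_c: Vector,
             W_fc: Matrix, b_fc: Vector) -> Vector:
    """Weight the pre-update clause and comment-tree vectors, then apply a dense layer."""
    s_hat = weighted_sum(S_cols, a_s)  # news feature vector
    c_hat = weighted_sum(C_cols, a_c)  # comment feature vector
    x = s_hat + c_hat                  # list concatenation [s_hat ; c_hat]
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W_fc, b_fc)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]  # assumed layout: [P(real), P(fake)]
```

Note that the weights are applied to the clause and comment-tree vectors from before the co-attention update, as the steps above specify.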
A further beneficial effect is that the co-attention network associates the two matrices to compute clause co-attention weights that incorporate the comments and comment-tree co-attention weights that incorporate the news, which gives the judgment high reliability.
Further, the news feature matrix is updated as $H_s = \tanh(W_s S + (W_c C) F)$ and the comment feature matrix as $H_c = \tanh(W_c C + (W_s S) F^{T})$, where $H_s$ is the updated news feature matrix, $H_c$ is the updated comment feature matrix, $S$ is the news feature matrix before updating, $C$ is the comment feature matrix before updating, $F$ is the similarity matrix, and $W_c$ and $W_s$ are parameter matrices.
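The two update formulas can be sketched in plain Python as below, with $S$ of shape $d \times N$ (one column per clause) and $C$ of shape $d \times T$ (one column per comment tree), so that $(W_c C) F$ is $k \times N$ and $(W_s S) F^{T}$ is $k \times T$. The text does not define how the similarity matrix $F$ or the attention weights are computed; here $F = \tanh(C^{\top} W_l S)$, $a_s = \mathrm{softmax}(w_{hs}^{\top} H_s)$ and $a_c = \mathrm{softmax}(w_{hc}^{\top} H_c)$ are assumed, following a common co-attention formulation, and `W_l`, `w_hs` and `w_hc` are hypothetical parameters.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def add(A: Matrix, B: Matrix) -> Matrix:
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def tanh_m(A: Matrix) -> Matrix:
    return [[math.tanh(a) for a in row] for row in A]


def softmax(v: List[float]) -> List[float]:
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def co_attention(S, C, W_l, W_s, W_c, w_hs, w_hc):
    """S: d x N clause columns; C: d x T comment-tree columns."""
    F = tanh_m(matmul(matmul(transpose(C), W_l), S))                          # T x N (assumed form)
    H_s = tanh_m(add(matmul(W_s, S), matmul(matmul(W_c, C), F)))              # k x N
    H_c = tanh_m(add(matmul(W_c, C), matmul(matmul(W_s, S), transpose(F))))   # k x T
    a_s = softmax(matmul([w_hs], H_s)[0])  # weights over the N clauses (assumed)
    a_c = softmax(matmul([w_hc], H_c)[0])  # weights over the T comment trees (assumed)
    return H_s, H_c, a_s, a_c
```

In this arrangement each entry $F_{tn}$ scores how strongly comment tree $t$ relates to clause $n$, which is what lets each side's update draw on the other.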
The present invention also provides a machine-readable storage medium having stored thereon machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement any of the above-described false news identification methods based on news-comment relevance analysis.
Drawings
Fig. 1 is a flowchart of a false news identification method based on news-comment relevance analysis according to an embodiment of the present invention;
fig. 2 is a schematic diagram of false news identification based on news-comment relevance analysis according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example one
A false news identification method 100 based on news-comment relevance analysis, as shown in Fig. 1, comprises:
step 110, constructing a news feature matrix based on the content of the news to be identified, and constructing a feature vector for each comment based on its content; meanwhile, according to the reply relations among the comments, constructing a plurality of comment trees, each taking an initial comment as the root node and reply comments as child nodes;
step 120, for each comment tree, combining the feature vector of each node with the context-associated feature vector of its parent node, obtaining the context-associated feature vectors of all leaf nodes through recursive calculation, and weighting them to obtain the feature vector of the comment tree;
and step 130, matching the relevance between the news feature matrix and the feature vectors of all comment trees to obtain comment-aware attention weights over the news clauses, with which the vectors of all clauses in the news feature matrix are weighted to obtain a news feature vector, and news-aware attention weights over the comment trees, with which the feature vectors of all comment trees are weighted to obtain a comment feature vector; and judging the authenticity of the news based on the news feature vector and the comment feature vector.
In this method, a comment tree is built for each initial comment, with the initial comment as the root node and reply comments as child nodes. Since each comment depends on the context carried by its parent node, the context-associated feature vector of each node is computed by combining the node's feature vector with that of its parent; since each leaf node marks the end of one discussion, the context-associated feature vectors of all leaf nodes in a tree are weighted to obtain the one-dimensional feature vector of the tree (that is, of its initial comment), which fully fuses the key information of the discussion, uses the available information efficiently, and ensures the accuracy of news judgment. The method further matches the relevance between the news feature matrix and the feature vectors of all comment trees to generate comment-aware attention weights over the news clauses and news-aware attention weights over the comment trees, so that the resulting news feature vector and comment feature vector can be used effectively for news identification.
The method is thus a new approach to false news identification in social networks, consisting of five processes: data collection and processing, news text processing, comment text processing, news-comment co-processing, and relevance result analysis. It judges authenticity from a new angle, the similarity of the key content in the news and in its comments, making full use of both the news text and the information generated as the news spreads through the social network, and it overcomes the one-sided reliance of the prior art on either the news text or the propagation network. It mitigates the one-sidedness caused by depending too heavily on the news text, draws on key information in the comments (especially the additional information introduced as replies extend a discussion) to support the authenticity judgment, can be applied to false news identification in large-scale social networks, and addresses the difficulty of automatically verifying news content.
Preferably, in step 110, a recurrent neural network and an attention mechanism are used to construct the two-dimensional news feature matrix from the content of each clause of the news to be identified, and the one-dimensional feature vector of each comment from its content. In step 120, a recurrent neural network combines the feature vector of the current node in each comment tree with the hidden state vector of its parent node to compute the hidden state vector of the current node, and the hidden state vectors of all leaf nodes of the tree are pooled to obtain the feature vector of the comment tree. In step 130, a co-attention network matches the relevance between the news feature matrix and the comment feature matrix formed by the feature vectors of all comment trees, yielding co-attention weights over the news clauses and over the comment trees.
The method first obtains the news text content and builds a vector representation of the whole text: a recurrent neural network and an attention mechanism are applied to the word-level vectors to obtain a feature representation of each sentence, and a recurrent neural network is then applied to the sentence-level feature vectors so that each sentence acquires context from its neighboring sentences. This hierarchical attention model converts the key feature information in the news text into a feature vector representation.
In addition, the comment text content is obtained, and a word-level recurrent neural network and attention mechanism produce a feature representation of each comment. Because comments are related to one another, a tree-shaped comment structure (a comment tree) is constructed from their reply relations; the tree links each reply with the comment it replies to, so that the context of each comment can be understood more fully.
A tree neural network produces the vector representation of each comment tree: the initial comment is the root node, and each reply to a node is a child of that node. Because of this tree structure, each comment depends on the context carried by its parent node, and each leaf node marks the end of one discussion. The tree is processed top-down with a recurrent neural network: the hidden state vector $h_{p(i)}$ of the parent node is combined with the comment information of the current node (the comment's one-dimensional feature vector) $c_i$ to compute the hidden state vector $h_i$ of the current node.
The feature representation of the news text and the vector representations of the comment trees are fed into the co-attention network, which combines the relevance between the comments and the text to generate co-attention weights over the sentences of the news and weights the news text accordingly; it likewise generates a weight for each comment tree and weights the comment trees. The correlation between the news text and the comment trees thus yields a joint news text-comment tree vector, which is input into a fully connected layer to predict the authenticity label of the news.
The news text and comment information must first be vectorized. A word segmentation tool splits the text of the relevant domain into individual words. After the words are sorted by frequency of occurrence, vocabulary-to-index and index-to-vocabulary mappings are built. A co-occurrence matrix is then constructed from the words and their positions within a context window, and word vector representations $w$ are obtained through iterative training that relates word similarity to the co-occurrence matrix. Such pre-trained vectors encode the relatedness and similarity between words and so capture some semantic features, which makes lexical information convenient to exploit through vector operations.
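The vocabulary indexing and co-occurrence counting can be sketched as follows. Breaking frequency ties alphabetically, the window size and the GloVe-style $1/\text{distance}$ weighting of each word pair are assumptions, and the iterative training that fits word vectors to these counts is omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_vocab(docs: List[List[str]]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Index words by descending frequency (ties broken alphabetically)."""
    counts = Counter(w for doc in docs for w in doc)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    word2idx = {w: i for i, w in enumerate(ordered)}
    idx2word = {i: w for w, i in word2idx.items()}
    return word2idx, idx2word


def cooccurrence(docs: List[List[str]], word2idx: Dict[str, int],
                 window: int = 2) -> Dict[Tuple[int, int], float]:
    """Symmetric co-occurrence counts, each pair weighted by 1 / distance."""
    X: Dict[Tuple[int, int], float] = defaultdict(float)
    for doc in docs:
        ids = [word2idx[w] for w in doc]
        for pos, center in enumerate(ids):
            for off in range(1, window + 1):
                if pos + off >= len(ids):
                    break
                ctx = ids[pos + off]
                X[(center, ctx)] += 1.0 / off
                X[(ctx, center)] += 1.0 / off
    return dict(X)
```

Storing only non-zero pairs in a dictionary keeps the matrix sparse, which matters because most word pairs never co-occur.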
Note that the news text information is mainly the body of the news, and hyperlinks in the text are replaced uniformly during processing. The news comment information is obtained by searching for the news title on the social network to collect the text of related comments; the tree structure of the comments is then derived from their mutual replies, and this structure carries some information about the propagation network.
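Uniform hyperlink replacement can be sketched with a regular expression. The URL pattern and the `<URL>` placeholder token are assumptions, since the text specifies neither. Note that `\S+` also swallows any non-space characters attached directly to a link, such as Chinese text that follows without an intervening space.

```python
import re

# Assumed pattern: http(s) links and bare www. links up to the next whitespace.
_URL = re.compile(r"https?://\S+|www\.\S+")


def normalize_links(text: str, token: str = "<URL>") -> str:
    """Replace every hyperlink in the text with one placeholder token."""
    return _URL.sub(token, text)
```

Mapping every link to one token keeps URLs from inflating the vocabulary while still recording that a link was present.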
Specifically, as shown in fig. 2, the news text is split into clauses at punctuation marks, each clause is split into individual words with a word segmentation tool, and the segmented words are converted into word vectors. Concatenating all vectors gives the vector matrix
$$S = s_1 \oplus s_2 \oplus \cdots \oplus s_n$$
where $\oplus$ denotes concatenation, the news text $S$ consists of $n$ clauses, and $s_i$ is the $i$-th clause of the news text. Each clause
$$s_i = w_1 \oplus w_2 \oplus \cdots \oplus w_m$$
consists of $m$ words, where $w_j$ is the vector representation of the $j$-th word in the clause; the news text is thus converted into a three-dimensional representation (clauses × words × embedding dimensions). This representation is fed into a bidirectional long short-term memory network (BiLSTM) to obtain the hidden state of each word:
$$h_{ij} = [\overrightarrow{h}_{ij}; \overleftarrow{h}_{ij}]$$
where
$$\overrightarrow{h}_{ij} = \overrightarrow{\mathrm{LSTM}}(w_j),\ j \in \{1, \dots, m\}, \qquad \overleftarrow{h}_{ij} = \overleftarrow{\mathrm{LSTM}}(w_j),\ j \in \{m, \dots, 1\}$$
Here $h_{ij}$, the hidden state of the $j$-th word of the $i$-th clause, concatenates the outputs of the forward and backward LSTMs, so it represents each word together with its context in the clause. The clause vector is obtained by combining the hidden states of all words in the clause with their attention weights. The word attention weights are computed as
$$\alpha_{ij} = \frac{\exp(u_{ij}^{\top} u_w)}{\sum_{k=1}^{m} \exp(u_{ik}^{\top} u_w)}$$
where
$$u_{ij} = \tanh(W_w h_{ij} + b_w)$$
and $u_w$ is a learned word-level context vector. Combining the weights with the hidden states gives the representation of the clause:
$$v_i = \sum_{j=1}^{m} \alpha_{ij} h_{ij}$$
All clause vectors of the text are then fed into a second BiLSTM to obtain the hidden state of each clause:
$$s_i = [\overrightarrow{s}_i; \overleftarrow{s}_i]$$
where
$$\overrightarrow{s}_i = \overrightarrow{\mathrm{LSTM}}(v_i),\ i \in \{1, \dots, n\}, \qquad \overleftarrow{s}_i = \overleftarrow{\mathrm{LSTM}}(v_i),\ i \in \{n, \dots, 1\}$$
The symbol $s_i$ now denotes the hidden state of the $i$-th clause; it concatenates the forward and backward LSTM outputs and represents each clause together with its context in the text. These clause hidden states form the news feature matrix.
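The word-level attention step above can be sketched in NumPy as below. This is a hedged, HAN-style illustration, not the patent's code: `attention_pool` is a hypothetical name, random arrays stand in for trained BiLSTM outputs and weights, and the clause-level BiLSTM that would follow is omitted.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_pool(H, W_w, b_w, u_w):
    """Word-level attention over BiLSTM hidden states.

    H:   (m, d) hidden states h_ij of the m words in one clause
    W_w: (a, d) projection, b_w: (a,) bias, u_w: (a,) word context vector
    Returns the clause vector v_i with shape (d,) and the weights alpha (m,).
    """
    U = np.tanh(H @ W_w.T + b_w)  # u_ij = tanh(W_w h_ij + b_w), shape (m, a)
    alpha = softmax(U @ u_w)      # alpha_ij proportional to exp(u_ij^T u_w), shape (m,)
    v = alpha @ H                 # v_i = sum_j alpha_ij h_ij, shape (d,)
    return v, alpha


rng = np.random.default_rng(0)
d, a = 8, 4  # BiLSTM output size (2 x units per direction) and attention size
W_w, b_w, u_w = rng.normal(size=(a, d)), np.zeros(a), rng.normal(size=a)
# Stand-ins for the BiLSTM outputs of three clauses with 5, 3 and 6 words.
clauses = [rng.normal(size=(m, d)) for m in (5, 3, 6)]
V = np.stack([attention_pool(H, W_w, b_w, u_w)[0] for H in clauses])
print(V.shape)  # (3, 8): one row per clause
```

Each row of `V` plays the role of a clause vector $v_i$; in the method these rows would next pass through the clause-level BiLSTM.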
Preferably, the one-dimensional feature vector of each comment of the news to be identified is constructed from the content of the comment, specifically as follows:
acquiring the text content of each comment, segmenting it into words, and converting the segmented words into word vectors; converting all word vectors into hidden state vectors carrying contextual information with a recurrent neural network; and weighting all hidden state vectors with an attention mechanism to represent the comment as a one-dimensional feature vector.
Each comment and its replies form a comment tree. Because the tree structure links every reply to the comment it replies to, the context of each comment can be understood more fully. On this basis, each comment is split into individual words with a word segmentation tool, and the segmented words are converted into word vectors. A comment tree is written as
$$t = c_1 \odot c_2 \odot \cdots \odot c_i \odot \cdots \odot c_p$$
where the comment tree $t$ consists of $p$ comments, $\odot$ denotes the operation that links comments into a tree through their reply relations, and $c_i$ is the $i$-th comment in the tree. Each comment
$$c_i = w_1 \oplus w_2 \oplus \cdots \oplus w_q$$
consists of $q$ words, where $w_j$ is the $j$-th word vector of the comment. Because comments are shorter than news texts, they are not split into sentences; each comment is converted directly into a sequence of word vectors. This sequence is fed into a BiLSTM to obtain the hidden state of each word:
$$h_{ij} = [\overrightarrow{h}_{ij}; \overleftarrow{h}_{ij}]$$
where
$$\overrightarrow{h}_{ij} = \overrightarrow{\mathrm{LSTM}}(w_j),\ j \in \{1, \dots, q\}, \qquad \overleftarrow{h}_{ij} = \overleftarrow{\mathrm{LSTM}}(w_j),\ j \in \{q, \dots, 1\}$$
Here $h_{ij}$, the hidden state of the $j$-th word of the $i$-th comment, concatenates the outputs of the forward and backward LSTMs and represents each word together with its context in the comment. The comment vector is obtained by combining the hidden states of all words in the comment with their attention weights. The word attention weights are computed as
$$\alpha_{ij} = \frac{\exp(u_{ij}^{\top} u_{cw})}{\sum_{k=1}^{q} \exp(u_{ik}^{\top} u_{cw})}$$
where
$$u_{ij} = \tanh(W_{cw} h_{ij} + b_{cw})$$
and $u_{cw}$ is a learned word-level context vector for comments. Combining the weights with the hidden states gives the representation of the $i$-th comment:
$$c_i = \sum_{j=1}^{q} \alpha_{ij} h_{ij}$$
From here on, $c_i$ denotes this feature vector of the $i$-th comment.
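The bidirectional encoding $h_{ij} = [\overrightarrow{h}_{ij}; \overleftarrow{h}_{ij}]$ used above can be illustrated with a minimal NumPy LSTM. This is a sketch under assumptions: the helper names, the stacked gate order (input, forget, candidate, output), zero initial states, and random untrained parameters are illustrative choices, not the patent's configuration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(X, W, U, b):
    """Run a single-layer LSTM over the rows of X (q, e); returns hidden states (q, k).

    W: (4k, e), U: (4k, k), b: (4k,) stack the input, forget, candidate and
    output gate parameters in that (assumed) order. Initial states are zero.
    """
    k = U.shape[1]
    h, c = np.zeros(k), np.zeros(k)
    out = []
    for x in X:
        z = W @ x + U @ h + b
        i, f = sigmoid(z[:k]), sigmoid(z[k:2 * k])
        g, o = np.tanh(z[2 * k:3 * k]), sigmoid(z[3 * k:])
        c = f * c + i * g   # cell state update
        h = o * np.tanh(c)  # hidden state
        out.append(h)
    return np.stack(out)


def bilstm(X, fwd_params, bwd_params):
    """h_ij = [forward h_ij ; backward h_ij] for every word j of one comment."""
    h_fwd = lstm_pass(X, *fwd_params)              # reads w_1 ... w_q
    h_bwd = lstm_pass(X[::-1], *bwd_params)[::-1]  # reads w_q ... w_1, re-aligned to word order
    return np.concatenate([h_fwd, h_bwd], axis=1)


def make_params(rng, e, k):
    """Random (untrained) LSTM parameters for one direction."""
    return rng.normal(size=(4 * k, e)), rng.normal(size=(4 * k, k)), np.zeros(4 * k)


rng = np.random.default_rng(1)
e, k, q = 6, 4, 7  # embedding size, LSTM units per direction, words in the comment
fwd, bwd = make_params(rng, e, k), make_params(rng, e, k)
X = rng.normal(size=(q, e))  # stand-in word vectors w_1 ... w_q
H = bilstm(X, fwd, bwd)
print(H.shape)  # (7, 8)
```

The rows of `H` are the $h_{ij}$ that the word attention then weights to produce the comment vector $c_i$.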
Preferably, in step 110, all recurrent neural networks are bidirectional long short-term memory networks.
Preferably, the recurrent neural network in step 120 is a gated recurrent unit (GRU).
Preferably, in step 120, the feature vector of each comment tree is constructed as follows:
based on a gated recurrent unit, traversing each comment tree from top to bottom and combining the feature vector of the current node with the hidden state vector of its parent node; computing, for the node, a reset gate that retains part of the parent's hidden state information and an update gate that adjusts the proportion of the parent's hidden state information that is retained; computing the hidden state vectors of all nodes in the comment tree recursively; and pooling the hidden state vectors of all leaf nodes of the comment tree to obtain the feature vector of the comment tree.
Specifically, the initial comment of each comment tree is taken as the root node, and replies are taken as child nodes. The comment tree is processed with a method based on the gated recurrent unit (GRU). Let $p(i)$ denote the parent of node $i$. First the reset gate
$$r_i = \sigma(W_r c_i + U_r h_{p(i)})$$
and then the update gate
$$z_i = \sigma(W_z c_i + U_z h_{p(i)})$$
are computed. The reset gate retains part of the parent's hidden state information, and the update gate adjusts the proportion of parent information that is retained:
$$\tilde{h}_i = \tanh(W_h c_i + U_h (h_{p(i)} \odot r_i))$$
$$h_i = (1 - z_i) \odot h_{p(i)} + z_i \odot \tilde{h}_i$$
where $\odot$ here denotes element-wise multiplication, $W_*$ and $U_*$ are parameter matrices and parameter vectors respectively, and $\sigma$ is the activation function. After the comment tree has been processed recursively, the hidden states $h_i$ of all leaf nodes are pooled to obtain the feature representation $t_i$ of each comment tree.
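A minimal NumPy sketch of the top-down tree GRU and leaf pooling described above follows. Several details are assumptions rather than statements of the patent: the root's parent state is a zero vector, max pooling is used as the pooling method, all `U_*` are treated as square matrices so the dimensions work, and `top_down_tree_gru` is a hypothetical name.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def top_down_tree_gru(children, root, feats, P):
    """Top-down GRU over one comment tree; returns the tree vector t_i.

    children: dict node -> list of child nodes; feats: dict node -> c_i (e,)
    P: dict with W_r, W_z, W_h of shape (k, e) and U_r, U_z, U_h of shape (k, k).
    """
    k = P["U_r"].shape[0]
    hidden = {}
    stack = [(root, np.zeros(k))]  # (node, parent hidden state); zero state for the root (assumed)
    while stack:
        node, h_parent = stack.pop()
        c = feats[node]
        r = sigmoid(P["W_r"] @ c + P["U_r"] @ h_parent)              # reset gate r_i
        z = sigmoid(P["W_z"] @ c + P["U_z"] @ h_parent)              # update gate z_i
        h_tilde = np.tanh(P["W_h"] @ c + P["U_h"] @ (h_parent * r))  # candidate state
        h = (1 - z) * h_parent + z * h_tilde                         # node hidden state h_i
        hidden[node] = h
        stack.extend((child, h) for child in children.get(node, []))
    leaf_states = [hidden[n] for n in hidden if not children.get(n)]
    # Max pooling over leaf hidden states (the pooling type is an assumption).
    return np.max(np.stack(leaf_states), axis=0)


rng = np.random.default_rng(3)
e, k = 5, 3
P = {name: rng.normal(size=(k, e) if name.startswith("W") else (k, k))
     for name in ("W_r", "U_r", "W_z", "U_z", "W_h", "U_h")}
children = {1: [2, 3], 2: [4], 3: [], 4: []}  # comment 1 is the root; 3 and 4 are leaves
feats = {n: rng.normal(size=e) for n in children}
t = top_down_tree_gru(children, 1, feats, P)
print(t.shape)  # (3,)
```

Processing from the root downward lets every leaf state summarize one root-to-leaf reply path, which is why only the leaves are pooled.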
Preferably, step 130 includes:
matching the relevance between the news feature matrix and the comment feature matrix with a co-attention network to construct a similarity matrix; using the similarity matrix to associate the news feature matrix with the comment feature matrix so as to update both, obtaining a new news feature matrix fused with comment information and a new comment feature matrix fused with news information; computing co-attention weights over the news clauses from the new news feature matrix, and co-attention weights over the comment trees from the new comment feature matrix; weighting the vectors of all text clauses in the news feature matrix before updating with the clause co-attention weights to obtain a news feature vector, and weighting the feature vectors of all comment trees in the comment feature matrix before updating with the comment-tree co-attention weights to obtain a comment feature vector; and fully connecting the news feature vector with the comment feature vector to judge the authenticity of the news.
In other words, the news co-attention weights are applied to the feature matrix of the news text to obtain the news representation, the comment co-attention weights are applied to the comment feature matrix to obtain the comment representation, and the two representations are fully connected to predict the authenticity label of the news.
Specifically, a co-attention mechanism matches the text vectors of each news item against its comment vectors, captures the key matched information, and constructs a similarity matrix. With the text $S = \{s_1, \dots, s_N\}$ and the comments $C = \{t_1, \dots, t_P\}$, the similarity matrix is
$$F = \tanh(C^{\top} W_l S)$$
The similarity matrix links the news text with the comments, yielding news information fused with the comments and comment information fused with the news:
$$H_s = \tanh(W_s S + (W_c C) F), \qquad H_c = \tanh(W_c C + (W_s S) F^{\top})$$
Finally, the co-attention weights of the news are
$$a^s = \mathrm{softmax}(w_{hs}^{\top} H_s)$$
and the co-attention weights of the comments are
$$a^c = \mathrm{softmax}(w_{hc}^{\top} H_c)$$
where $W_*$ and $w_*$ are parameter matrices. Applying the news co-attention weights to the news representation vectors obtained in S1.4 gives the news representation
$$\hat{s} = \sum_{i=1}^{N} a_i^s s_i$$
and applying the comment co-attention weights to the comment tree representation vectors obtained in S2.4 gives the comment representation
$$\hat{c} = \sum_{j=1}^{P} a_j^c t_j$$
The news representation and the comment representation are fully connected, and
$$\hat{y} = \mathrm{softmax}([\hat{s}, \hat{c}] W_f + b_f)$$
produces a vector of size 1 × 2 whose two values are the model's predicted probabilities that the news is true and false, respectively.
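The co-attention and prediction formulas above can be checked end to end with the NumPy sketch below. It assumes the clause states $s_i$ and tree vectors $t_j$ are stored as columns of `S` (d × N) and `C` (d × P), and it uses random, untrained parameters; `co_attention_predict` and the parameter-dictionary keys are hypothetical names.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def co_attention_predict(S, C, p):
    """Co-attention between news clauses S (d, N) and comment trees C (d, P).

    p holds W_l (d, d), W_s and W_c (k, d), w_hs and w_hc (k,),
    W_f (2d, 2) and b_f (2,). Returns (y_hat, a_s, a_c).
    """
    F = np.tanh(C.T @ p["W_l"] @ S)                     # (P, N) similarity matrix
    H_s = np.tanh(p["W_s"] @ S + (p["W_c"] @ C) @ F)    # (k, N) news fused with comments
    H_c = np.tanh(p["W_c"] @ C + (p["W_s"] @ S) @ F.T)  # (k, P) comments fused with news
    a_s = softmax(p["w_hs"] @ H_s)                      # (N,) weights over clauses
    a_c = softmax(p["w_hc"] @ H_c)                      # (P,) weights over comment trees
    s_hat = S @ a_s                                     # sum_i a_i^s s_i, shape (d,)
    c_hat = C @ a_c                                     # sum_j a_j^c t_j, shape (d,)
    y_hat = softmax(np.concatenate([s_hat, c_hat]) @ p["W_f"] + p["b_f"])  # (2,)
    return y_hat, a_s, a_c


rng = np.random.default_rng(2)
d, k, n_clauses, n_trees = 6, 4, 5, 3
params = {
    "W_l": rng.normal(size=(d, d)),
    "W_s": rng.normal(size=(k, d)),
    "W_c": rng.normal(size=(k, d)),
    "w_hs": rng.normal(size=k),
    "w_hc": rng.normal(size=k),
    "W_f": rng.normal(size=(2 * d, 2)),
    "b_f": np.zeros(2),
}
S = rng.normal(size=(d, n_clauses))  # stand-in clause hidden states s_i as columns
C = rng.normal(size=(d, n_trees))    # stand-in comment tree vectors t_j as columns
y_hat, a_s, a_c = co_attention_predict(S, C, params)
print(y_hat.shape)  # (2,)
```

Note that the attention weights are computed from the fused matrices $H_s$ and $H_c$ but applied to the original, pre-update $S$ and $C$, matching the description.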
These steps yield one prediction. The weight matrices $W_*$ and bias parameters $b_*$ are learned by the neural network: the network is first initialized randomly and then learns a suitable parameter configuration through repeated training iterations over the training set. Normalizing the output with the softmax function makes the network's judgment of news authenticity directly readable as probabilities.
Example two
A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement a false news identification method based on news-comment relevance analysis as described in embodiment one above.
The related technical details are the same as in embodiment one and are not repeated here.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A false news identification method based on news-comment relevance analysis is characterized by comprising the following steps:
S1, constructing a news feature matrix based on the content of the news to be identified, and constructing a feature vector of each comment of the news to be identified based on the content of that comment; meanwhile, according to the reply relations among the comments, constructing a plurality of comment trees, each with an initial comment as the root node and the reply comments as child nodes;
S2, associating the feature vector of each node in each comment tree with the context-associated feature vector of the node's parent, obtaining the context-associated feature vectors of all leaf nodes of the comment tree through recursive computation, and performing a weighted computation on them to obtain the feature vector of the comment tree;
S3, matching the relevance between the news feature matrix and the feature vectors of all comment trees to obtain comment-aware attention weights over the news clauses, with which the vectors of all text clauses in the news feature matrix are weighted to obtain a news feature vector, and news-aware attention weights over the comment trees, with which the feature vectors of all comment trees are weighted to obtain a comment feature vector; and judging the authenticity of the news based on the news feature vector and the comment feature vector.
2. The false news identification method based on news-comment relevance analysis according to claim 1, wherein the news feature matrix is specifically constructed by:
acquiring the text content of the news to be identified, splitting it into clauses and words, and converting the segmented words into word vectors; converting all word vectors into hidden state vectors carrying contextual information with a recurrent neural network; and, for each clause, weighting all of its hidden state vectors with an attention mechanism to represent the clause as a one-dimensional feature vector, the feature vectors of all clauses forming the two-dimensional news feature matrix of the news to be identified.
3. The false news identification method based on news-comment relevance analysis according to claim 2, wherein the one-dimensional feature vector of each comment of news to be identified is constructed according to the content of the comment, and specifically comprises:
acquiring text content of each comment, segmenting the text content into words, and performing word vector conversion on the segmented words; converting all the word vectors into hidden state vectors of associated context information by adopting a recurrent neural network; all the hidden state vectors are weighted by an attention mechanism, and the comment is expressed as a one-dimensional feature vector.
4. The method for identifying false news based on news-comment relevance analysis according to claim 3, wherein all the recurrent neural networks in S1 are bidirectional long short-term memory networks.
5. The false news identification method based on news-comment relevance analysis according to any one of claims 1 to 4, wherein in S2, a gated recurrent unit is used to obtain the context-associated feature vectors of all leaf nodes through recursive computation.
6. The method for identifying false news based on news-comment relevance analysis according to claim 5, wherein in the step S2, the feature vector construction method of each comment tree is as follows:
based on a gated recurrent unit, combining the feature vector of the current node with the hidden state vector of its parent node from top to bottom in each comment tree; computing, for the node, a reset gate used to retain part of the parent's hidden state information and an update gate used to adjust the proportion of the parent's hidden state information that is retained; computing the hidden state vectors of all nodes in the comment tree through recursive processing; and pooling the hidden state vectors of all leaf nodes of the comment tree to obtain the feature vector of the comment tree.
7. The method for identifying false news based on news-comment relevance analysis according to claim 6, wherein the reset gate $r_i$ is calculated as $r_i = \sigma(W_r c_i + U_r h_{p(i)})$ and the update gate $z_i$ is calculated as $z_i = \sigma(W_z c_i + U_z h_{p(i)})$, where $W_r$ and $W_z$ are parameter matrices, $U_r$ and $U_z$ are parameter vectors, $\sigma$ is the activation function, and $h_{p(i)}$ is the hidden state vector of the parent node of the $i$-th node.
8. A false news identification method based on news-comment relevance analysis according to any one of claims 1 to 4, wherein the S3 includes:
matching the relevance between the news feature matrix and the comment feature matrix with a co-attention network to construct a similarity matrix, wherein the comment feature matrix is formed by the feature vectors of all comment trees;
using the similarity matrix to associate the news feature matrix with the comment feature matrix so as to update both, obtaining a new news feature matrix fused with comment information and a new comment feature matrix fused with news information;
computing co-attention weights over the news clauses based on the new news feature matrix, and co-attention weights over the comment trees based on the new comment feature matrix;
weighting the vectors of all text clauses in the news feature matrix before updating with the co-attention weights over the news clauses to obtain a news feature vector, and weighting the feature vectors of all comment trees in the comment feature matrix before updating with the co-attention weights over the comment trees to obtain a comment feature vector;
and fully connecting the news characteristic vector with the comment characteristic vector to judge the authenticity of the news.
9. The method for identifying false news based on news-comment relevance analysis according to claim 8, wherein the news feature matrix is updated as $H_s = \tanh(W_s S + (W_c C) F)$ and the comment feature matrix is updated as $H_c = \tanh(W_c C + (W_s S) F^{\top})$, where $H_s$ is the updated news feature matrix, $H_c$ is the updated comment feature matrix, $S$ is the news feature matrix before updating, $C$ is the comment feature matrix before updating, $F$ is the similarity matrix, and $W_c$ and $W_s$ are parameter matrices.
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement a method of false news identification based on news-comment relevance analysis as claimed in any one of claims 1 to 9.
CN202010420460.5A 2020-05-18 2020-05-18 False news identification method based on news-comment relevance analysis Pending CN111639252A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010420460.5A CN111639252A (en) 2020-05-18 2020-05-18 False news identification method based on news-comment relevance analysis

Publications (1)

Publication Number Publication Date
CN111639252A true CN111639252A (en) 2020-09-08

Family

ID=72329621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010420460.5A Pending CN111639252A (en) 2020-05-18 2020-05-18 False news identification method based on news-comment relevance analysis

Country Status (1)

Country Link
CN (1) CN111639252A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180365562A1 (en) * 2017-06-20 2018-12-20 Battelle Memorial Institute Prediction of social media postings as trusted news or as types of suspicious news
CN110210016A (en) * 2019-04-25 2019-09-06 中国科学院计算技术研究所 Bilinearity neural network Deceptive news detection method and system based on style guidance
WO2019183191A1 (en) * 2018-03-22 2019-09-26 Michael Bronstein Method of news evaluation in social media networks
WO2020061578A1 (en) * 2018-09-21 2020-03-26 Arizona Board Of Regents On Behalf Of Arizona State University Method and apparatus for collecting, detecting and visualizing fake news

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JING MA ET AL.: "Rumor detection on Twitter with tree-structured recursive neural networks" *
KAI SHU ET AL.: "dEFEND: Explainable Fake News Detection" *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429403A (en) * 2020-10-14 2022-05-03 国际商业机器公司 Mediating between social network and payment curation content producers in false positive content mitigation
CN112241456A (en) * 2020-12-18 2021-01-19 成都晓多科技有限公司 False news prediction method based on relationship network and attention mechanism
CN112765313A (en) * 2020-12-31 2021-05-07 太原理工大学 False information detection method based on original text and comment information analysis algorithm
CN112819645A (en) * 2021-03-23 2021-05-18 大连民族大学 Social network false information propagation detection method based on motif degree
CN113032525A (en) * 2021-03-23 2021-06-25 深圳大学 False news detection method and device, electronic equipment and storage medium
CN112819645B (en) * 2021-03-23 2024-03-29 大连民族大学 Social network false information propagation detection method based on degree of motif
CN113254864B (en) * 2021-04-29 2024-05-28 中科计算技术创新研究院 Dynamic subgraph generation method and dispute detection method based on node characteristics and reply paths
CN113254864A (en) * 2021-04-29 2021-08-13 中国科学院计算技术研究所数字经济产业研究院 Dynamic subgraph generation method and dispute detection method based on node characteristics and reply path
CN113158082A (en) * 2021-05-13 2021-07-23 聂佼颖 Artificial intelligence-based media content reality degree analysis method
CN113158082B (en) * 2021-05-13 2023-01-17 和鸿广科技(上海)有限公司 Artificial intelligence-based media content reality degree analysis method
CN113177164A (en) * 2021-05-13 2021-07-27 聂佼颖 Multi-platform collaborative new media content monitoring and management system based on big data
CN113392334B (en) * 2021-06-29 2024-03-08 长沙理工大学 False comment detection method in cold start environment
CN113392334A (en) * 2021-06-29 2021-09-14 长沙理工大学 False comment detection method in cold start environment
CN114840771A (en) * 2022-03-04 2022-08-02 北京中科睿鉴科技有限公司 False news detection method based on news environment information modeling
CN114841147A (en) * 2022-04-20 2022-08-02 中国人民武装警察部队工程大学 Rumor detection method and device based on multi-pointer cooperative attention
CN114841147B (en) * 2022-04-20 2024-04-19 中国人民武装警察部队工程大学 Rumor detection method and device based on multi-pointer cooperative attention
CN117332084A (en) * 2023-09-22 2024-01-02 北京远禾科技有限公司 Machine learning method suitable for detecting malicious comments and false news simultaneously
CN117332084B (en) * 2023-09-22 2024-05-03 北京远禾科技有限公司 Machine learning method suitable for detecting malicious comments and false news simultaneously

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination