CN112650851A - False news identification system and method based on multilevel interactive evidence generation - Google Patents


Info

Publication number
CN112650851A
CN112650851A (application CN202011587811.8A)
Authority
CN
China
Prior art keywords
sequence
news
attention
false
generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011587811.8A
Other languages
Chinese (zh)
Other versions
CN112650851B (en)
Inventor
饶元
吴连伟
孙菱
郝哲
贺王卜
兰玉乾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202011587811.8A priority Critical patent/CN112650851B/en
Publication of CN112650851A publication Critical patent/CN112650851A/en
Application granted granted Critical
Publication of CN112650851B publication Critical patent/CN112650851B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a false news identification system and method based on multilevel interactive evidence generation, in which two progressive encoding-decoding levels are designed to generate the truth behind false news as an explanation of the verification result. The inference generation of the invention uses local inference to promote deep understanding of the false parts of the news and of the conflicts between the news and related articles, focusing on how to reveal the real falsehood behind the false news. The invention is detachable: its three generation modules can be decoupled and trained separately, giving the model both generalization capability and stage-wise task training capability. Experiments on two published, widely used fake news data sets show that the invention achieves better performance than previous state-of-the-art methods.

Description

False news identification system and method based on multilevel interactive evidence generation
Technical Field
The invention relates to an interpretable false news identification system and method based on multi-level interactive refined evidence generation.
Background
Currently, social media has become an indispensable part of people's lives: people can freely express themselves, acquire knowledge and interact on it. Thanks to the convenience of speech and the low cost of publishing information, social networks not only bring "collective intelligence" but also cause the diffusion and flooding of a large amount of false or unverified information. Especially during major emergencies, false information spreads easily, disturbing the order of daily life and causing social panic. The abuse of fake news seriously affects people's lives, social stability and national security. How to quickly identify the credibility of information in a social network, and make the identification result interpretable for users, has become one of the major problems facing academia and industry.
The application of data mining and machine learning has driven the development of fake news identification research. The classical approach extracts text features from the content of fake news (such as N-gram features and bag-of-words features) and identifies the authenticity of information using supervised learning algorithms (such as random forests and support vector machines). NLP researchers have also focused on deeper linguistic features, such as mining factual/assertive verbs, subjective words, and writing styles. Although these methods achieve some false news detection performance, they have difficulty providing users with a reasonable interpretation of the detection results. To overcome these drawbacks, recent research tends to explore interpretable false news detection methods, which mainly explain the false parts of fake news by developing interaction models that capture evidence segments from reliable sources, often focusing on word-level salient evidence semantics and sentence-level consistency semantics. However, although these interaction models reflect some degree of interpretability, the word-level and sentence-level evidence they capture may simply be conflicts between the news and related articles, which are difficult to interpret as the truth behind the fake news. In other words, current interaction models capture a variety of coarse-grained conflicts in related articles, and the truth behind the false news may need to be continuously refined from these conflicts.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a false news identification system and method based on multi-level interactive evidence generation. The invention not only improves the identification performance of the fake news, but also provides reasonable and transparent interpretable evidence for the identification result.
In order to achieve the purpose, the invention adopts the following technical scheme to realize the purpose:
the false news identification method based on multilevel interactive evidence generation comprises the following steps:
step 1, taking a news sequence C and a related article sequence R as input characteristics;
step 2: for any news sequence C and related article sequence R, learning the dependency relationship between any two words and the structural characteristics in the sequences by adopting a self-attention network as an encoder of a conflict generator and a false part generator;
Step 3: linearly projecting the query, key and value of the news sequence C or the related article sequence R h times by means of different linear projections, and then executing scaled dot-product attention in parallel; the attention results are concatenated and projected to obtain a new representation, as follows:

head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V) (1)

H = MultiHead(Q, K, V) = Concat(head_1, head_2, …, head_h)·W^O (2)

wherein W_i^Q, W_i^K, W_i^V ∈ R^{d×(d/h)} and W^O ∈ R^{d×d} are trainable parameters; H^C and H^R are the two outputs of the false part generator module; H^{r_1}, H^{r_i} and H^{r_|R|} are the outputs of the conflict generator for the first, i-th and last related articles;
Step 4: the cross-attention network, composed of self-attention networks, lets the outputs of the encoders of the conflict generator and the false part generator interact with each other as the input of the decoder, as follows:

H_claim = attention(Q, K, V) = attention(H^R, H^C, H^C) (4)

H_allRA = attention(Q, K, V) = attention(H^C, H^R, H^R) (5)

wherein H_claim and H_allRA represent the outputs of the cross-attention layer for the news and for the related articles, respectively;
Step 6: using linear interpolation as the fusion function to obtain:

O = λ·H_claim + (1−λ)·H_allRA (6)

wherein λ is a hyper-parameter controlling how much information from the other task should be absorbed, 0 < λ < 1;
Step 7: applying a feed-forward network to the fused result; the feed-forward network adds nonlinear and scale-invariant features and comprises a hidden layer with ReLU:

O^F = ReLU(O·W_1 + b_1)·W_2 + b_2 (7)

wherein W_1, W_2, b_1 and b_2 are trainable parameters, and O^F is the long-context attention representation of the decoder;
Step 8: acquiring the word probabilities of the generation process with a softmax layer; the log-likelihood estimate of the correspondingly generated false part sequence Y^F = {y_1, y_2, …, y_T} is expressed as:

L(Y^F) = Σ_{t=1}^{T} log P(y_t | C, y_1, y_2, …, y_{t−1}; θ) (8)

Step 9: the false part generator module predicts the word y_t based on the contextual attention representation O^F of the feed-forward network:

P(y_t | C, y_1, y_2, …, y_{t−1}; θ) = P(y_t | O^F; θ) = softmax(W_s·O^F) (9)

wherein W_s is a trainable parameter;
Step 10: in the cross-attention layer, H'^{r_i} = attention(H^{r_i}, H^R, H^R) represents the interaction between all related articles and the i-th article;

in the fusion layer, the interactions of all related articles are fused, namely:

O = λ_1·H'^{r_1} + λ_2·H'^{r_2} + … + λ_n·H'^{r_n}

wherein λ_1 + λ_2 + … + λ_n = 1;

in the feed-forward network layer, the output of the conflict generation module is the conflict sequence O^C, and the sequence generated by the conflict generation module is Y^C;
Step 11, capturing the generated sequence Y by using a local reasoning unitFAnd YCAnd incorporate it into a Y-based basisCY of (A) isFIn the new representation of (a);
firstly, a common attention moment array is calculated
Figure BDA0002866353190000044
To capture the correlation between two sequences, each element E in a common attention matrixi,jRepresents YFSequence ith word and YCThe correlation between the jth word of the sequence; the common attention matrix is:
Figure BDA0002866353190000045
wherein W and P represent trainable parameters, an element dot-product operation;
for YFY of (A) isCDirected attention vector:
Figure BDA0002866353190000046
Figure BDA0002866353190000051
fusing original vectors using absolute differences and element dot multiplication
Figure BDA0002866353190000052
And
Figure BDA0002866353190000053
Figure BDA0002866353190000054
Figure BDA0002866353190000055
to obtain a catalyst containing YFBy YCNew representation of reasoning information for guidance:
Figure BDA0002866353190000056
Figure BDA0002866353190000057
where LayerNorm (-) is layer regularization, the result is
Figure BDA0002866353190000058
Is a 2-dimensional and YFSimilar shaped tensions;
Step 11: obtaining the generated inference sequence Y^E through the above generation process; the inference sequence can explain why the fake news is wrong, because the generated inference sequence reasons out the false part of the news and the corresponding evidence;
Step 12: integrating the three sequences in different proportions to absorb their context representations, obtaining the integrated feature F:

F = e(Y^E) + γ_1·e(Y^F) + γ_2·e(Y^C) (17)

wherein e(·) is the representation of a word sequence, and γ_1 and γ_2 are hyper-parameters;
Step 13: based on the integrated feature F, a multi-layer perceptron (MLP) classifier is used to predict the distributed label; task learning adopts the probability distribution of a softmax function, and the real training sample label y is used to minimize the error of the model under the global loss function:

v = ReLU(W_f·F + b_f) (18)

p = softmax(W_p·v + b_p) (19)

loss = −Σ y·log p (20)

wherein W_p, W_f, b_f and b_p are trainable parameters.
A false news identification system based on multi-level interactive evidence generation, comprising:
an encoding module for capturing context representations from the input sequences of the generative models, and for learning and encoding the dependencies within the input sequences and their internal structural features;
an interactive learning decoding module for exploring the parts of the fake news where errors may occur and the conflict semantics existing between related articles;
an interpretable evidence generation module for generating an inference sequence as an explanation of why the fake news is wrong;
and a task learning module for integrating the three generated sequences to enhance false news identification performance.
A false news identification terminal device based on multi-level interactive evidence generation comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the method.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method as described above.
Compared with the prior art, the invention has the following beneficial effects:
the invention discloses a virtual and fake news identification system and method with interpretability and based on multi-level interactive refined evidence generation.
The invention designs two progressive encoding-decoding levels to generate the truth behind false news as an explanation of the verification result. The inference generation of the invention uses local inference to promote deep understanding of the false parts of the news and of the conflicts between the news and related articles, focusing on how to reveal the real falsehood behind the false news. The method is detachable: its three generation modules can be decoupled and trained separately, giving the model generalization capability and stage-wise task training capability. Experiments on two published, widely used fake news data sets show that the invention achieves better performance than previous state-of-the-art methods.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings may be obtained according to these drawings without inventive effort.
FIG. 1 is an architectural diagram of the present invention;
FIG. 2 is a graph of the experimental performance of the invention on the Snopes and PolitiFact data sets;
FIG. 3 is a graph of the ablation performance of the invention's module components on the Snopes and PolitiFact data sets.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the embodiments of the present invention, it should be noted that if the terms "upper", "lower", "horizontal", "inner", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings or the orientation or positional relationship which is usually arranged when the product of the present invention is used, the description is merely for convenience and simplicity, and the indication or suggestion that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, cannot be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Furthermore, the term "horizontal", if present, does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should be further noted that unless otherwise explicitly stated or limited, the terms "disposed," "mounted," "connected," and "connected" should be interpreted broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be connected internally or indirectly. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to specific situations.
The invention is described in further detail below with reference to the accompanying drawings:
referring to fig. 1, an embodiment of the present invention discloses a false news recognition system based on multi-level interactive evidence generation, including:
and the coding module takes fake news and a series of related articles as input of the generative model, and learns and codes the dependency and internal structural features between input sequences by adopting a self-attention model in order to capture context representation from the input sequences of the generative model. In particular, the first two generative models of the present invention have the same self-attention network as the encoder structure.
The interactive learning decoding module develops an interactive learning model that lets the news interact with the related articles and lets the related articles interact with one another, so as to explore, respectively, the parts of the fake news where errors may occur and the conflict semantics existing among the related articles.
The interpretable evidence generation module adds a local inference network on top of a conventional decoder, so that the erroneous parts and conflict semantics of the fake news obtained by the interactive learning decoding module undergo a global reasoning process, generating a refined inference sequence that serves as an explanation of why the fake news is wrong.
The task learning module integrates the three generated sequences by linear combination to enhance false news identification performance.
The embodiment of the invention discloses a false news identification method based on multi-level interactive evidence generation, which comprises the following steps:
Stage 0: data initialization
Step 0: given a news sequence C = {c_1, c_2, …, c_|C|}, where c_i denotes the embedding of the i-th word, and a series of related article sequences R = <r_1; r_2; …; r_|R|>, where r_i denotes the i-th related article, ';' denotes the concatenation operation, and r_i^k denotes the embedded representation of the k-th word in the i-th related article; in addition, |C|, |R| and |r_i| represent the word length of the news sequence, the number of related articles, and the word length of the i-th related article, respectively; y represents the true-or-false binary label.
Stage 1: construction of the encoder
Step 1: taking the news sequence and the related article sequence as input characteristics of the model;
Step 2: for the context representation of the model input features, the invention adopts a self-attention network as the encoder of the two generators to implicitly learn the dependency between any two words and the structural features inside the sequence. Taking the false part generator as an example, the encoder can be expressed as:

Attention(Q, K, V) = softmax(Q·K^T / √d)·V (1)

where Q, K and V are the query matrix, the key matrix and the value matrix, respectively, and d is the dimension of the key matrix. In the configuration of this embodiment, Q = K = V = C for the news-sequence module and Q = K = V = R for all related-article sequence modules; in the encoder of the conflict generator, Q = K = V = r_i for encoding the i-th related article.
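The scaled dot-product self-attention used by the encoders can be sketched in numpy as follows; the toy sequence length and dimension are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V.
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ V

# Toy 4-word news sequence C with d = 8; in the false part generator's
# encoder, Q = K = V = C.
rng = np.random.default_rng(0)
C = rng.standard_normal((4, 8))
H = attention(C, C, C)  # one self-attended representation per word
```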
Step 3: to enhance the parallelism of self-attention and improve the efficiency of the model, multi-head attention first linearly projects the query, key and value h times by means of different linear projections, and then performs scaled dot-product attention in parallel. Finally, the attention results are concatenated and projected to obtain a new representation. The process can be formulated as:

head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V) (2)

H = MultiHead(Q, K, V) = Concat(head_1, head_2, …, head_h)·W^O (3)

where W_i^Q, W_i^K, W_i^V ∈ R^{d×(d/h)} and W^O ∈ R^{d×d} are trainable parameters. In particular, H^C and H^R are the two outputs of the false part generator module; H^{r_1}, H^{r_i} and H^{r_|R|} are the outputs of the conflict generator for the first, i-th and last related articles.
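The multi-head variant of step 3 can be sketched as follows; the head count and dimensions are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head(Q, K, V, Wq, Wk, Wv, Wo):
    # Project Q, K, V once per head, run scaled dot-product attention
    # in parallel, then concatenate the heads and project with Wo.
    heads = []
    for Wqi, Wki, Wvi in zip(Wq, Wk, Wv):
        q, k, v = Q @ Wqi, K @ Wki, V @ Wvi
        heads.append(softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1) @ v)
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(1)
n, d, h = 5, 8, 2
X = rng.standard_normal((n, d))           # toy encoder input (Q = K = V)
Wq = rng.standard_normal((h, d, d // h))  # per-head query projections
Wk = rng.standard_normal((h, d, d // h))  # per-head key projections
Wv = rng.standard_normal((h, d, d // h))  # per-head value projections
Wo = rng.standard_normal((d, d))          # output projection
HC = multi_head(X, X, X, Wq, Wk, Wv, Wo)
```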
Stage 2: construction of the interactive learning decoder
Step 4: in order to explore the possibly wrong parts of the news to be verified, an interactive learning decoder is designed to let the news interact with the related articles. The interaction module involves three layers: a cross-attention layer, a fusion layer, and a feed-forward network layer.
Step 5: in order to make the interaction between the news to be verified and the related articles more sufficient, a cross-attention network composed of self-attention networks lets the outputs of the two encoders interact with each other as the input of the decoder. The interaction process can be described as follows:

H_claim = attention(Q, K, V) = attention(H^R, H^C, H^C) (4)

H_allRA = attention(Q, K, V) = attention(H^C, H^R, H^R) (5)

where H_claim and H_allRA denote the outputs of the cross-attention layer for the news and for the related articles, respectively.
Step 6: to fuse the news into the related articles and, more importantly, to absorb the high-level representations of the news semantics, linear interpolation is used as the fusion function, which can be calculated as:

O = λ·H_claim + (1−λ)·H_allRA (6)

where λ (0 < λ < 1) is a hyper-parameter controlling how much information from the other task should be absorbed.
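Steps 5 and 6 can be sketched together in numpy. This is a minimal illustration under stated assumptions: the two toy encoder outputs are given the same length so the linear interpolation is shape-aligned, and the λ value is arbitrary:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention, reused as the cross-attention layer.
    return softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1) @ V

rng = np.random.default_rng(2)
n, d = 4, 8
HC = rng.standard_normal((n, d))  # encoder output for the news sequence
HR = rng.standard_normal((n, d))  # encoder output for the related articles

# The two encoder outputs attend to each other.
H_claim = attention(HR, HC, HC)   # related articles query the news
H_allRA = attention(HC, HR, HR)   # news queries the related articles

# Linear interpolation as the fusion function.
lam = 0.6                         # hyper-parameter, 0 < lam < 1 (illustrative)
O = lam * H_claim + (1.0 - lam) * H_allRA
```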
Step 7: next, a feed-forward network is applied to the fused result; it adds nonlinear and scale-invariant features and includes a hidden layer with ReLU:

O^F = ReLU(O·W_1 + b_1)·W_2 + b_2 (7)

where W_1, W_2, b_1 and b_2 are trainable parameters, and O^F is the long-context attention representation of the decoder.
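The feed-forward network of step 7, sketched with illustrative dimensions:

```python
import numpy as np

def feed_forward(O, W1, b1, W2, b2):
    # ReLU hidden layer followed by a linear projection back to d.
    return np.maximum(O @ W1 + b1, 0.0) @ W2 + b2

rng = np.random.default_rng(3)
n, d, d_ff = 4, 8, 16                   # toy sizes (assumptions)
O = rng.standard_normal((n, d))         # fused decoder input
W1 = rng.standard_normal((d, d_ff)); b1 = np.zeros(d_ff)
W2 = rng.standard_normal((d_ff, d)); b2 = np.zeros(d)
OF = feed_forward(O, W1, b1, W2, b2)    # long-context attention representation
```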
Step 8: finally, the word probabilities of the generation process are obtained with the softmax layer. Formally, the log-likelihood estimate of the correspondingly generated false part sequence Y^F = {y_1, y_2, …, y_T} can be expressed as:

L(Y^F) = Σ_{t=1}^{T} log P(y_t | C, y_1, y_2, …, y_{t−1}; θ) (8)
and step 9: the error portion generation module generates a context representation O based on a feed-forward networkFPredicting word ytCan be expressed as:
P(yt|C,y1,y2,…,yt-1;θ)=P(yt-1|OF;θ)=softmax(WsOF) (9)
wherein, WsAre trainable parameters.
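Steps 8 and 9 can be sketched as a single decoding step; the vocabulary size and the greedy word choice are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(4)
d, vocab = 8, 20                      # toy sizes (assumptions)
OF = rng.standard_normal(d)           # decoder representation at step t
Ws = rng.standard_normal((vocab, d))  # trainable projection to the vocabulary

# P(y_t | .) = softmax(Ws OF): a distribution over the vocabulary.
p = softmax(Ws @ OF)

# The generation log-likelihood accumulates log P(y_t | .) over the
# emitted words; here, a single greedy step.
y_t = int(p.argmax())
log_likelihood_step = float(np.log(p[y_t]))
```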
Step 10: in particular, the decoders of the conflict generation module and of the false part generation module are similar: both are interactive learning decoders that allow all related articles to interact with each related article, thereby capturing suspicious or conflicting semantics from the related articles. In the cross-attention layer, H'^{r_i} = attention(H^{r_i}, H^R, H^R) represents the interaction between all related articles and the i-th article. In the fusion layer, the interactions of all related articles are fused, i.e.

O = λ_1·H'^{r_1} + λ_2·H'^{r_2} + … + λ_n·H'^{r_n}

where λ_1 + λ_2 + … + λ_n = 1. In the feed-forward network layer, the output of the conflict generation module is the conflict sequence O^C, and the sequence generated by the module is Y^C.
Stage 3: generation of interpretable evidence
Step 11: in order to find the truth behind the fake news, this embodiment performs inference generation by means of a local inference unit, thereby realizing a general reasoning process. The local inference unit captures the local inference information between the generated sequences Y^F and Y^C and incorporates it into a new representation of Y^F guided by Y^C. Specifically, a co-attention matrix E ∈ R^{|Y^F|×|Y^C|} is first calculated to capture the correlation between the two sequences; each element E_{i,j} of the co-attention matrix represents the correlation between the i-th word of sequence Y^F and the j-th word of sequence Y^C. Formally, the co-attention matrix can be calculated as:

E = (Y^F·W)·(Y^C·P)^T (10)

where W and P are trainable parameters and ⊙ denotes the element-wise product.
Step 12: the Y^C-directed attention vectors for Y^F are obtained as:

A = softmax(E), applied row-wise (11)

Ỹ^C = A·Y^C (12)
step 13: to more fully integrate YFAnd YCFusing the original vector by absolute difference and element dot multiplication
Figure BDA0002866353190000128
And
Figure BDA0002866353190000129
Figure BDA00028663531900001210
Figure BDA00028663531900001211
step 14: obtain a catalyst containing YFBy YCNew representation of reasoning information for guidance:
Figure BDA0002866353190000131
Figure BDA0002866353190000132
where LayerNorm (-) is layer regularization, the result is
Figure BDA0002866353190000133
Is a 2-dimensional and YFSimilar shaped tensions.
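The local inference unit of steps 11 to 14 can be sketched as follows; since the exact parameterization is not fully specified here, the bilinear co-attention scoring and the output projection (W, P, Wr, br) are assumptions in the spirit of ESIM-style local inference:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    # Per-row layer normalization (no learned gain/bias for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def local_inference(YF, YC, W, P, Wr, br):
    # Co-attention matrix: E[i, j] scores the relevance of the i-th
    # word of YF to the j-th word of YC.
    E = (YF @ W) @ (YC @ P).T
    YC_tilde = softmax(E, axis=-1) @ YC   # YC-directed attention for YF
    diff = np.abs(YF - YC_tilde)          # absolute difference
    prod = YF * YC_tilde                  # element-wise product
    m = np.concatenate([YF, YC_tilde, diff, prod], axis=-1)
    return layer_norm(m @ Wr + br)        # new YF representation guided by YC

rng = np.random.default_rng(5)
d = 8
YF = rng.standard_normal((5, d))          # generated false part sequence
YC = rng.standard_normal((7, d))          # generated conflict sequence
W = rng.standard_normal((d, d))
P = rng.standard_normal((d, d))
Wr = rng.standard_normal((4 * d, d)); br = np.zeros(d)
YF_hat = local_inference(YF, YC, W, P, Wr, br)
```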
Step 15: the generated inference sequence Y^E is obtained through the generation process (step 8 and step 9). The inference sequence can explain why the fake news is wrong, because it reasons out the false part of the news and the corresponding evidence.
Stage 4: task learning
Step 16: in order to make full use of the three generated sequences to improve false news identification performance, the three sequences are integrated in different proportions to absorb their context representations:

F = e(Y^E) + γ_1·e(Y^F) + γ_2·e(Y^C) (17)

where e(·) is the representation of a word sequence, and γ_1 and γ_2 are hyper-parameters.
Step 17: based on the integrated feature F, a multi-layer perceptron (MLP) classifier is used to predict the distributed label; task learning adopts the probability distribution predicted by the softmax function, and the real training sample label y is used to minimize the error of the model under the global loss function:

v = ReLU(W_f·F + b_f) (18)

p = softmax(W_p·v + b_p) (19)

loss = −Σ y·log p (20)

where W_p, W_f, b_f and b_p are trainable parameters.
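Steps 16 and 17 can be sketched end to end; the hyper-parameter values, dimensions and one-hot label are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(6)
d = 8
# Integrate the three generated-sequence representations:
# F = e(YE) + g1 * e(YF) + g2 * e(YC).
eYE, eYF, eYC = rng.standard_normal((3, d))
g1, g2 = 0.5, 0.5                        # hyper-parameters (illustrative)
F = eYE + g1 * eYF + g2 * eYC

# MLP classifier with softmax output and cross-entropy loss.
Wf = rng.standard_normal((16, d)); bf = np.zeros(16)
Wp = rng.standard_normal((2, 16)); bp = np.zeros(2)
v = np.maximum(Wf @ F + bf, 0.0)         # ReLU hidden layer
p = softmax(Wp @ v + bp)                 # predicted label distribution
y = np.array([1.0, 0.0])                 # one-hot true-or-false label
loss = -np.sum(y * np.log(np.clip(p, 1e-12, 1.0)))  # cross-entropy
```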
The device provided by the embodiment of the invention comprises: a processor, a memory, and a computer program stored in the memory and executable on the processor. The processor realizes the steps of the above method embodiments when executing the computer program; alternatively, the processor realizes the functions of each module/unit of the above device embodiments when executing the computer program.
The computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer and a cloud server. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
The memory can be used for storing the computer program and/or the module, and the processor can realize various functions of the terminal equipment by running or executing the computer program and/or the module stored in the memory and calling data stored in the memory.
If the integrated modules/units of the terminal device are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased in accordance with the requirements of legislative and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislative and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. The false news identification method based on multilevel interactive evidence generation is characterized by comprising the following steps:
step 1: take a news sequence C and a related article sequence R as input features;
step 2: for any news sequence C and related article sequence R, adopt a self-attention network as the encoder of the conflict generator and the false part generator to learn the dependency between any two words and the structural features within the sequences;
step 3: linearly project the query, key and value of the news sequence C or the related article sequence R h times by means of different linear projections, and then perform scaled dot-product attention in parallel; the attention results are concatenated and projected to obtain a new representation as follows:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (1)
H = MultiHead(Q, K, V) = Concat(head_1, head_2, …, head_h)W^O (2)
wherein W_i^Q, W_i^K, W_i^V and W^O are trainable parameters; H_C and H_R are the two outputs of the false part generator module; H_R^1, …, H_R^i, …, H_R^n (3) are the outputs of the conflict generator for the first, i-th, and last related articles;
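The multi-head self-attention encoder of step 3 (equations (1)–(2)) can be sketched in NumPy as follows. The toy dimensions (5 words, model size 8, 2 heads) and the list-of-projections interface are illustrative assumptions, not from the patent.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V

def multi_head(Q, K, V, proj_q, proj_k, proj_v, W_o):
    # Eq. (1)-(2): project h times, attend in parallel, concatenate, project
    heads = [scaled_dot_product_attention(Q @ Wq, K @ Wk, V @ Wv)
             for Wq, Wk, Wv in zip(proj_q, proj_k, proj_v)]
    return np.concatenate(heads, axis=-1) @ W_o
```

Calling `multi_head(X, X, X, …)` with Q = K = V = X is exactly the self-attention case used by the encoders of both generators.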
step 4: the cross-attention network, built from the self-attention network, lets the encoder outputs of the conflict generator and the false part generator interact with each other as the decoder's input, as follows:
H_claim = attention(Q, K, V) = attention(H_R, H_C, H_C) (4)
H_allRA = attention(Q, K, V) = attention(H_C, H_R, H_R) (5)
wherein H_claim and H_allRA represent the outputs of the cross-attention layer for the news and for the related articles, respectively;
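The Q/K/V swap in equations (4)–(5) is the essence of the cross-attention interaction, and can be shown with a single-head NumPy sketch; the toy encodings and dimensions are illustrative assumptions, not from the patent.

```python
import numpy as np

def attention(Q, K, V):
    # single-head scaled dot-product attention
    d_k = Q.shape[-1]
    s = Q @ K.T / np.sqrt(d_k)
    s = s - s.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(s) / np.exp(s).sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(0)
H_C = rng.normal(size=(6, 8))   # toy news encoding (6 words)
H_R = rng.normal(size=(9, 8))   # toy related-article encoding (9 words)

H_claim = attention(H_R, H_C, H_C)   # Eq. (4): news content addressed by article queries
H_allRA = attention(H_C, H_R, H_R)   # Eq. (5): article content addressed by news queries
```

Each output keeps the query side's length but summarizes the other sequence's content, which is how the two generators' encoders "interact with each other" before decoding.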
step 6: using linear interpolation as a fusion function, obtain:
O = λ·H_claim + (1 − λ)·H_allRA (6)
wherein λ is a hyper-parameter controlling how much information from the other task should be absorbed, 0 < λ < 1;
step 7: apply a feed-forward network to the fused result to add non-linear and scale-invariant characteristics; it contains a hidden layer with ReLU:
O_F = W_2·ReLU(W_1·O + b_1) + b_2 (7)
wherein W_1, W_2, b_1 and b_2 are trainable parameters, and O_F is the long-context attention representation of the decoder;
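Steps 6–7 (interpolation fusion followed by a ReLU feed-forward layer) can be sketched together in NumPy; which two tensors are interpolated, and the hidden size, are assumptions here, since the patent's formula images are not machine-readable.

```python
import numpy as np

def fuse_ffn(H_claim, H_allRA, W1, b1, W2, b2, lam=0.5):
    # Eq. (6): linear interpolation fusion with hyper-parameter lam (0 < lam < 1);
    # the operands of the interpolation are an assumption
    O = lam * H_claim + (1.0 - lam) * H_allRA
    # Eq. (7): feed-forward network with one ReLU hidden layer
    O_F = np.maximum(0.0, O @ W1 + b1) @ W2 + b2
    return O_F
```

With same-shaped inputs of shape (n, d) and a hidden width h, the output O_F keeps shape (n, d), so it can feed the softmax layer of step 8 directly.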
step 8: obtain the word probability of the generation process using a softmax layer; the log-likelihood estimate of the correspondingly generated false part sequence Y_F is expressed as:
log P(Y_F | C; θ) = Σ_t log P(y_t | C, y_1, y_2, …, y_{t−1}; θ) (8)
step 9: the predicted word y_t of the false part generator module, based on the contextual attention representation O_F of the feed-forward network, is expressed as:
P(y_t | C, y_1, y_2, …, y_{t−1}; θ) = P(y_t | O_F; θ) = softmax(W_s·O_F) (9)
wherein W_s is a trainable parameter;
step 10: in the cross-attention layer, H_i^RA represents the interaction of the news with the i-th related article; in the fusion layer, the interactions of all related articles are fused, namely:
O = λ_1·H_1^RA + λ_2·H_2^RA + … + λ_n·H_n^RA (10)
wherein λ_1 + λ_2 + … + λ_n = 1;
at the feed-forward network layer, the output of the conflict generation module is the conflict representation O_C, and the sequence generated by the conflict generator module is Y_C;
step 11: use a local inference unit to capture the semantic interaction between the generated sequences Y_F and Y_C, and incorporate it into a new representation of Y_F guided by Y_C;
first, compute a co-attention matrix E to capture the correlation between the two sequences, wherein each element E_{i,j} of the co-attention matrix represents the correlation between the i-th word of sequence Y_F and the j-th word of sequence Y_C; the co-attention matrix is:
E_{i,j} = (Y_i^F·W)·(Y_j^C·P)^T (11)
wherein W and P represent trainable parameters, and ⊙ is the element-wise dot-product operation;
the Y_C-directed attention vector for Y_F:
Ŷ_i^C = Σ_j softmax(E_{i,j})·Y_j^C (12)
the original vector Y_i^F and Ŷ_i^C are fused by absolute difference and element-wise dot product:
m_i^d = |Y_i^F − Ŷ_i^C| (13)
m_i^p = Y_i^F ⊙ Ŷ_i^C (14)
to obtain a new representation containing Y_F guided by the inference information of Y_C:
G_i = [Y_i^F; Ŷ_i^C; m_i^d; m_i^p] (15)
Ȳ^F = LayerNorm(W_g·G + b_g) (16)
wherein LayerNorm(·) is layer regularization, and the result Ȳ^F is a 2-dimensional tensor with a shape similar to Y_F;
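A hedged NumPy sketch of the local inference unit of step 11 (equations (11)–(16)) follows. The exact bilinear form of the co-attention matrix, the projection W_g, and all dimensions are assumptions, since the patent's formula images are not machine-readable; the sketch only illustrates the co-attend / absolute-difference / element-product / layer-normalize pipeline.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # normalize each row to zero mean and unit variance
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def local_inference(Y_F, Y_C, W, P, W_g):
    # Eq. (11): co-attention matrix; the bilinear form is an assumption
    E = (Y_F @ W) @ (Y_C @ P).T                  # shape (len(Y_F), len(Y_C))
    # Eq. (12): Y_C-directed attention summary for each word of Y_F
    A = np.exp(E - E.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)         # row-wise softmax
    Y_hat = A @ Y_C
    # Eq. (13)-(14): fuse by absolute difference and element-wise product
    m_diff = np.abs(Y_F - Y_hat)
    m_prod = Y_F * Y_hat
    # Eq. (15)-(16): concatenate the fused features, project, layer-normalize
    G = np.concatenate([Y_F, Y_hat, m_diff, m_prod], axis=-1)
    return layer_norm(G @ W_g)
```

The output has the same shape as Y_F, i.e. a new representation of Y_F enriched with Y_C's inference information.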
step 11: obtain the generated inference sequence Y_E through the generation process; the inference sequence can explain why the news is false, because the generated inference sequence reasons out the false part of the news and the corresponding evidence;
step 12: integrate the three sequences in different proportions to absorb their context representations, obtaining the integrated feature F:
F = e(Y_E) + γ_1·e(Y_F) + γ_2·e(Y_C) (17)
wherein e(·) is the representation of a word sequence, and γ_1 and γ_2 are hyper-parameters;
step 13: based on the integrated feature F, a multi-layer perceptron (MLP) classifier is used to predict the label distribution; task learning adopts the probability distribution predicted by a softmax function, and the ground-truth training label y is used to minimize the global loss of the model:
v = ReLU(W_f·F + b_f) (18)
p = softmax(W_p·F + b_p) (19)
loss = −Σ y·log p (20)
wherein W_p, W_f, b_f and b_p are trainable parameters.
2. A false news recognition system based on multi-level interactive evidence generation, comprising:
an encoding module for capturing context representations from the input sequences of the generative model, learning and encoding the dependencies between the input sequences and their internal structural features;
an interactive learning decoding module for exploring the possibly false parts of the fake news and the conflicting semantics existing among related articles;
an interpretable evidence generation module for generating an inference sequence as an explanation of why the news is false;
and a task learning module for integrating the three generated sequences to enhance fake news identification performance.
3. False news recognition terminal device based on multi-level interactive evidence generation, comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor, when executing said computer program, implements the steps of the method according to claim 1.
4. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth in claim 1.
CN202011587811.8A 2020-12-28 2020-12-28 False news identification system and method based on multilevel interactive evidence generation Active CN112650851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011587811.8A CN112650851B (en) 2020-12-28 2020-12-28 False news identification system and method based on multilevel interactive evidence generation


Publications (2)

Publication Number Publication Date
CN112650851A true CN112650851A (en) 2021-04-13
CN112650851B CN112650851B (en) 2023-04-07

Family

ID=75363650

Country Status (1)

Country Link
CN (1) CN112650851B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849599A (en) * 2021-09-03 2021-12-28 北京中科睿鉴科技有限公司 Joint false news detection method based on mode information and fact information

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018237098A1 (en) * 2017-06-20 2018-12-27 Graphika, Inc. Methods and systems for identifying markers of coordinated activity in social media movements
WO2020061578A1 (en) * 2018-09-21 2020-03-26 Arizona Board Of Regents On Behalf Of Arizona State University Method and apparatus for collecting, detecting and visualizing fake news
CN111177554A (en) * 2019-12-27 2020-05-19 西安交通大学 False news identification system and method capable of explaining exploration based on generation of confrontation learning
CN111581980A (en) * 2020-05-06 2020-08-25 西安交通大学 False news detection system and method based on decision tree and common attention cooperation
CN111581979A (en) * 2020-05-06 2020-08-25 西安交通大学 False news detection system and method based on evidence perception layered interactive attention network
CN112035759A (en) * 2020-09-02 2020-12-04 胡煜昊 False news detection method for English news media reports


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WU LIANWEI: "A Multi-semantics Classification Method Based on Deep Learning for Incredible Messages on Social Media", 《CHINESE JOURNAL OF ELECTRONICS》 *
HE HANSEN et al.: "Fake news content detection model based on feature aggregation", Journal of Computer Applications (《计算机应用》) *


Also Published As

Publication number Publication date
CN112650851B (en) 2023-04-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant