CN114067381A - Deep forgery identification method and device based on multi-feature fusion - Google Patents


Info

Publication number
CN114067381A
CN114067381A
Authority
CN
China
Prior art keywords
srm
learnable
input stream
video
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110473432.4A
Other languages
Chinese (zh)
Inventor
操晓春
韩冰
韩晓光
张华
李京知
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Shenzhen Research Institute of Big Data SRIBD
Original Assignee
Institute of Information Engineering of CAS
Shenzhen Research Institute of Big Data SRIBD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS, Shenzhen Research Institute of Big Data SRIBD filed Critical Institute of Information Engineering of CAS
Priority to CN202110473432.4A
Publication of CN114067381A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep forgery identification method and device based on multi-feature fusion. The method mainly comprises the following steps: (1) performing segmented frame extraction on a video and face-alignment preprocessing; (2) processing the video frames with an RGB input stream and a learnable SRM input stream; (3) extracting features from the video frames with the RGB input stream and performing inter-frame fusion; (4) removing the non-differentiable parts of the classical SRM algorithm from the learnable SRM input stream, replacing the hyper-parameter q with thirty learnable 5×5 matrices, and initializing them; (5) converting the 30 preset SRM filters of the classical SRM algorithm into learnable SRM convolution kernels and inserting them into the identification network of step (3) to form a learnable SRM network; finally, fusing the outputs of the RGB stream and the learnable SRM stream to obtain the final recognition result. The invention effectively improves deep forgery identification on low-definition video.

Description

Deep forgery identification method and device based on multi-feature fusion
Technical Field
The invention belongs to the field of computer-vision deep forgery identification, and particularly relates to a deep forgery identification method and device based on multi-feature fusion.
Background
The term "deep forgery" derives from the face-swapping software Deepfakes and has since been extended to refer to all AI face-swapping technologies realized by computer graphics or deep learning. The abuse of deep forgery technology has brought many negative impacts to society in recent years, making effective identification of deep-forgery videos important. The general flow of deep forgery identification is to first perform face detection on a given video, then extract features from the detected faces, and finally judge from the extracted features whether the given video is deep-forged.
Currently common deep forgery recognition algorithms include FWA (Y. Li and S. Lyu, "Exposing deepfake videos by detecting face warping artifacts," in IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Computer Vision Foundation/IEEE, 2019, pp. 46-52) and Xception (F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251-1258). FWA mainly detects the splicing traces produced when the real face is replaced by the fake face in the last step of deep-forgery video generation, while Xception detects the forgery traces produced throughout the deep-forgery generation process.
One difficulty of deep forgery identification is that when the definition of a forged video is low, forgery traces such as splicing traces at face edges, inconsistencies between video frames, and generation traces of the forged face become much harder to find, which greatly increases the difficulty of identification. Current deep forgery identification methods cannot achieve good results on low-definition video.
Disclosure of Invention
The invention mainly solves the technical problem of providing a deep forgery identification method and device, overcoming the poor performance of existing identification methods on low-definition deep-forgery video.
In order to solve the above technical problem, the invention provides a deep forgery identification method based on multi-feature fusion, comprising the following steps:
evenly dividing an input video into a plurality of video segments, randomly sampling a plurality of video frames from each segment, and performing face detection and face alignment on each selected frame to obtain the input video frames;
processing the input video frames with an RGB input stream and a learnable SRM input stream respectively, wherein the RGB input stream extracts semantic features of suspected forged regions in the video frames and obtains a deep forgery prediction from the semantic features, and the learnable SRM input stream fits noise features of the suspected forged regions and obtains a deep forgery prediction from the noise features;
and fusing the prediction of the RGB input stream with the prediction of the learnable SRM input stream to obtain the final deep forgery identification result.
Further, a plurality of video frames are randomly sampled from each video segment, and the PNG format is used where possible when extracting frames, so as to reduce the influence of image compression on tampering traces.
Further, the RGB input stream extracting semantic features of suspected forged regions in the video frames and obtaining a deep forgery prediction from the semantic features comprises:
for the RGB input stream, performing feature extraction on each face-aligned video frame with an Xception network, which extracts the semantic features of suspected forged regions in each frame; finally averaging all extracted features and activating them with a Softmax function to obtain the output of the RGB input stream, the Xception network sharing parameters throughout the process.
Further, the learnable SRM input stream fitting the noise features of suspected forged regions in the video frames and obtaining a deep forgery prediction from the noise features comprises:
for the learnable SRM input stream, first removing the non-differentiable parts of the classical SRM algorithm, namely the round function and the truncate function, then replacing the hyper-parameter q with thirty learnable 5×5 matrices corresponding to the 30 SRM filters of the classical SRM algorithm, each matrix being initialized with all elements equal to the maximum absolute value of the elements in its corresponding SRM filter;
dividing the 30 SRM filters element-wise by their corresponding learnable matrices to obtain a learnable matrix of dimension 30×5×5, expanding it into SRM convolution kernels of dimension 30×3×5×5, inserting these kernels into the original Xception architecture as the first layer of the neural network, and fine-tuning the first layer of the original Xception network to form the learnable SRM network;
for the learnable SRM input stream, performing feature extraction on each face-aligned video frame with the learnable SRM network, which fits and analyzes the noise features of suspected tampered regions in the K×T frames; finally averaging all extracted features and activating them with a Softmax function to obtain the output of the learnable SRM input stream, the learnable SRM network sharing parameters throughout the process.
Further, within each of the RGB input stream and the learnable SRM input stream the network shares parameters, while the networks of the two streams are trained independently.
Furthermore, the learnable matrix is set so that network training does not destroy the essential characteristics of the SRM filters: the learnable matrix preserves the zero-valued elements of the 30 preset SRM filters, thereby preserving their noise-extraction character; and initializing the learnable matrix as above guarantees that all parameters of the SRM convolution kernels are initialized to values in [-1, 1].
Further, the 30×5×5 learnable matrix is expanded into 30×3×5×5 convolution kernels by replicating each 5×5 matrix 3 times along the second (channel) dimension.
Based on the same inventive concept, the invention also provides a deep forgery identification device based on multi-feature fusion using the above method, comprising:
a preprocessing module for evenly dividing the input video into a plurality of video segments, randomly sampling a plurality of video frames from each segment, and performing face detection and face alignment on each selected frame to obtain the input video frames;
a dual-input-stream processing module for processing the input video frames with an RGB input stream and a learnable SRM input stream respectively, wherein the RGB input stream extracts semantic features of suspected forged regions in the video frames and obtains a deep forgery prediction from the semantic features, and the learnable SRM input stream fits noise features of the suspected forged regions and obtains a deep forgery prediction from the noise features;
and a fusion module for fusing the prediction of the RGB input stream with the prediction of the learnable SRM input stream to obtain the final deep forgery identification result.
The invention has the characteristics and beneficial effects that:
the invention adopts a network based on multi-feature fusion to identify the depth-forged video, can simultaneously fit forged traces of the input video on semantic features and noise features, and effectively improves the effect of the existing depth-forged identification method on the low-definition video.
Drawings
FIG. 1: network framework architecture diagram.
FIG. 2: Visualization results for several SRM-stream computation modes.
Detailed Description
The invention provides a deep forgery identification method based on multi-feature fusion, addressing the unsatisfactory performance of existing deep forgery identification algorithms on low-definition deep-forgery video; the overall framework is shown in FIG. 1. Experiments illustrating the effectiveness of the invention are described below.
The experimental data use the lowest-definition version of the FaceForensics++ deep-forgery dataset, which contains 1000 real videos; each real video has 3 corresponding forged videos generated by the Deepfakes, FaceSwap, and Face2Face methods respectively, i.e., 3000 forged videos in total.
The experimental procedure was as follows:
(1) First, the input video V is evenly divided into K segments {v_1, v_2, …, v_K}, and T frames are randomly sampled from each segment v_i. Finally, face detection and face alignment are performed on the selected K×T frames using Dlib (A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics: A large-scale video dataset for forgery detection in human faces," 2018):
I_k = A(v_k)
where I_k denotes an input fed to the identification network; k ∈ [1, K] indexes the K video segments, each containing T frames; and A denotes the face alignment operation.
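By way of illustration, a minimal sketch of this segmentation-and-sampling step, assuming OpenCV for frame decoding; the `align_face` helper is a hypothetical stand-in for the Dlib-based face detection and alignment A:

```python
import random

import cv2  # assumed: OpenCV for frame decoding


def sample_aligned_frames(video_path, K=4, T=4, align_face=None):
    """Evenly split the video into K segments {v_1, ..., v_K}, randomly
    sample T frames from each, and return the K*T face-aligned frames
    I_k = A(v_k). Assumes each segment holds at least T frames."""
    cap = cv2.VideoCapture(video_path)
    n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for k in range(K):
        lo, hi = k * n // K, (k + 1) * n // K          # segment v_k
        for idx in sorted(random.sample(range(lo, hi), T)):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if ok:
                # A(.): face detection + alignment (Dlib in the text above)
                frames.append(align_face(frame) if align_face else frame)
    cap.release()
    return frames
```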
(2) The input video frames are processed separately by an RGB input stream and a learnable SRM input stream based on the classical SRM algorithm (J. Fridrich and J. Kodovsky, "Rich models for steganalysis of digital images," IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 868-882, 2012). The RGB input stream takes the aligned faces as input and aims to extract semantic features from the face video frames, while the learnable SRM input stream takes as input the noise maps obtained by applying the SRM filters to the faces, and focuses on fitting the noise features of the face video frames:
I_k^R = I_k
I_k^S = S(I_k)
where I_k^R is the input of the k-th video segment in the RGB stream; I_k^S is the input of the k-th video segment in the learnable SRM stream; and S denotes the learnable SRM filter operation.
(3) For the RGB input stream, features are extracted from each of the K×T face-aligned frames by an Xception network, which extracts the semantic features of suspected forged regions; all extracted features are then averaged and activated by a Softmax function to obtain the output of the RGB stream, i.e., the segment fusion of the RGB stream. The Xception network shares parameters throughout the process:
F_R = Avg(W_R * I^R)
P_R = σ(F_R)
where F_R is the feature of the RGB stream; Avg is the averaging operation; W_R denotes the network parameters of the RGB stream; * denotes the convolution operation; σ is the Softmax operation; and P_R is the prediction vector of the RGB stream.
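A schematic PyTorch rendering of this segment-fusion step; using `timm` as the source of an Xception backbone is an assumption (any Xception implementation would do), and the feature F_R is simplified here to the backbone's final logits:

```python
import timm  # assumed source of an Xception backbone
import torch
import torch.nn as nn


class RGBStream(nn.Module):
    """RGB stream: one Xception with shared parameters processes all
    K*T aligned frames; frame outputs are averaged (F_R) and passed
    through Softmax (P_R)."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.backbone = timm.create_model(
            "xception", pretrained=True, num_classes=num_classes)

    def forward(self, x):                  # x: (K*T, 3, H, W)
        logits = self.backbone(x)          # shared weights across frames
        f_r = logits.mean(dim=0)           # Avg over the K*T frames
        return torch.softmax(f_r, dim=0)   # P_R = sigma(F_R)
```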
(4) For the learnable SRM input stream, because the classical SRM algorithm cannot achieve good results on the deep forgery task, a learnable SRM filter is introduced to better fit the face data. To achieve learnability, the non-differentiable parts of the classical SRM algorithm, namely the round function and the truncate function, are first removed. The truncate function is used in the classical SRM algorithm mainly for computing the co-occurrence matrix, which this task does not need, and the round function is no longer important once learnability is introduced, so both non-differentiable parts are eliminated to implement a learnable SRM filter. In order to constrain the learning process of the SRM filter while introducing learnability, a learnable matrix Q is introduced to replace the hyper-parameter q, and the values of the 30 filters of the classical SRM algorithm are kept unchanged. The hyper-parameter q is replaced with thirty learnable 5×5 matrices Q corresponding to the 30 SRM filters, each initialized with all elements equal to the maximum absolute value of the elements in its corresponding SRM filter.
The purpose of the learnable matrix in step (4) is to ensure that network training does not destroy the essential characteristics of the SRM filters: the learnable matrix preserves the zero-valued elements of the 30 preset filters, thereby preserving their noise-extraction character.
This initialization is chosen to ensure that all parameters of the SRM convolution kernels are initialized to values in [-1, 1].
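A minimal sketch of this constrained initialization, where `srm_filters` is assumed to hold the 30 fixed 5×5 kernel value matrices W of the classical SRM algorithm (the constant values themselves are omitted):

```python
import torch
import torch.nn as nn


def init_q(srm_filters: torch.Tensor) -> nn.Parameter:
    """srm_filters: (30, 5, 5) tensor of the fixed classical SRM kernel
    values W. Each learnable matrix Q starts with all elements equal to
    max|W| of its own filter, so every initial weight W/Q lies in [-1, 1]
    and zero entries of W stay zero regardless of Q."""
    q0 = srm_filters.abs().amax(dim=(1, 2), keepdim=True)  # (30, 1, 1)
    return nn.Parameter(q0.expand(-1, 5, 5).clone())       # (30, 5, 5)
```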
(5) The 30 SRM filters are divided element-wise by their corresponding learnable matrices to obtain a learnable matrix of dimension 30×5×5. This matrix is then expanded into an SRM convolution kernel of dimension 30×3×5×5 and inserted into the original Xception architecture as the first layer of the neural network, forming the learnable SRM network:
R = W * X
where R is the output of the SRM filter; X is the input of the SRM input stream; * denotes the convolution operation; and W is the filter matrix of the classical SRM algorithm. The learnable SRM filter is implemented by converting the classical SRM algorithm into a convolution operation and removing the non-differentiable parts.
The 30×5×5 learnable matrix is expanded into the 30×3×5×5 convolution kernel by replicating each 5×5 matrix 3 times along the second (channel) dimension.
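Continuing the sketch above (same hypothetical `srm_filters` constants and the `init_q` helper), the W/Q kernel construction and its channel-wise replication might look as follows; wiring the 30-channel output into an Xception backbone additionally requires fine-tuning its first layer, which is omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LearnableSRMConv(nn.Module):
    """First layer of the learnable SRM network: weight = W / Q, with
    each 5x5 matrix replicated over the 3 input channels."""

    def __init__(self, srm_filters):             # (30, 5, 5) constants
        super().__init__()
        self.register_buffer("w", srm_filters)   # fixed SRM values W
        self.q = init_q(srm_filters)              # learnable matrices Q

    def forward(self, x):                         # x: (N, 3, H, W)
        # W/Q keeps zero-valued SRM elements at zero for any Q.
        kernel = (self.w / self.q).unsqueeze(1)   # (30, 1, 5, 5)
        kernel = kernel.expand(-1, 3, -1, -1).contiguous()  # (30, 3, 5, 5)
        return F.conv2d(x, kernel, padding=2)     # noise maps: (N, 30, H, W)
```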
(6) The learnable SRM input stream is processed analogously to step (3), with the original Xception network of (3) replaced by the learnable SRM network obtained in (5) to fit and analyze the noise features of the suspected forged regions in the K×T frames, yielding the prediction result P_S of the learnable SRM input stream, i.e., the segment fusion of the SRM stream. The learnable SRM network shares parameters throughout the process:
F_S = Avg(W_S * I^S)
P_S = σ(F_S)
where F_S is the feature of the learnable SRM stream; Avg is the averaging operation; W_S denotes the network parameters of the SRM stream; * denotes the convolution operation; σ is the Softmax operation; and P_S is the prediction vector of the learnable SRM stream.
(7) The outputs of (3) and (6) are fused by a learnable linear function to obtain the final prediction:
P = H(P_R, P_S)
where P is the final prediction (i.e., the deep forgery identification result of the video), P_R and P_S are the predictions of the RGB stream and the learnable SRM stream respectively, and H is a linear function.
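The fusion H can be read as a learnable linear map over the two prediction vectors; a minimal sketch (the concatenation-based form is an assumption, as the text specifies only that H is a learnable linear function):

```python
import torch
import torch.nn as nn


class LinearFusion(nn.Module):
    """P = H(P_R, P_S): a learnable linear function over the two
    stream prediction vectors."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.h = nn.Linear(2 * num_classes, num_classes)

    def forward(self, p_r, p_s):
        return self.h(torch.cat([p_r, p_s], dim=-1))
```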
The following tests were carried out, with identification accuracy as the evaluation metric:
Acc = (TP + TN) / (TP + TN + FP + FN)
where TP denotes true positives, FP false positives, FN false negatives, and TN true negatives.
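In code, the metric is simply:

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    # Acc = (TP + TN) / (TP + TN + FP + FN)
    return (tp + tn) / (tp + tn + fp + fn)
```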
For the data, videos numbered 1-720 in FaceForensics++ (720 real videos and their corresponding 2160 forged videos) serve as the training set, videos numbered 721-960 as the validation set, and videos numbered 961-1000 as the test set.
Training was run for 120 epochs following the experimental procedure above, and the model performing best on the validation set was selected for testing. Evaluated on the test set, it achieved an accuracy of 90.36%.
To demonstrate the effectiveness of the method, two comparison experiments were conducted. The first verifies whether learnability in the SRM stream improves model accuracy. Since the method also imposes a constraint on the learning process of the SRM filter when introducing learnability, three configurations are compared: non-learnable, unconstrained learnable, and constrained learnable. This experiment uses Xception as the feature extraction network and reports the accuracy of the SRM stream alone. The results: the non-learnable SRM stream achieves 78.21% accuracy, the unconstrained learnable SRM stream 85.71%, and the constrained learnable SRM stream 90.00%.
The second comparison experiment verifies whether multi-feature fusion improves identification accuracy, and whether the fusion scheme is effective for various feature extraction networks. ResNet-101, LightCNN, and Xception are selected as feature extraction networks and tested with single RGB features and with multi-feature fusion. With single RGB features, ResNet-101 achieves 85.71% accuracy, LightCNN 86.43%, and Xception 87.86%; with multi-feature fusion, ResNet-101 achieves 88.21% (+2.50), LightCNN 87.86% (+1.43), and Xception 90.36% (+2.50).
The output of the learnable SRM stream can also be visualized. The visualization confirms that the noise maps generated by the learnable SRM filter accurately reflect the forged regions of the input video frames, and that the constrained learnable SRM stream proposed by the invention generates better noise maps than the non-learnable and unconstrained variants. The visualization results are shown in FIG. 2.
In other embodiments of the present invention, the Xception network selected in step (3) and step (5) may be replaced by other identification networks.
Based on the same inventive concept, another embodiment of the present invention provides a deep forgery identification apparatus based on multi-feature fusion using the above method, comprising:
a preprocessing module for evenly dividing the input video into a plurality of video segments, randomly sampling a plurality of video frames from each segment, and performing face detection and face alignment on each selected frame to obtain the input video frames;
a dual-input-stream processing module for processing the input video frames with an RGB input stream and a learnable SRM input stream respectively, wherein the RGB input stream extracts semantic features of suspected forged regions in the video frames and obtains a deep forgery prediction from the semantic features, and the learnable SRM input stream fits noise features of the suspected forged regions and obtains a deep forgery prediction from the noise features;
and a fusion module for fusing the prediction of the RGB input stream with the prediction of the learnable SRM input stream to obtain the final deep forgery identification result.
For the specific implementation of each module, refer to the description of the method above.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program which, when executed by a computer, performs the steps of the method of the invention.
The above description is only an embodiment of the present invention and is not intended to limit the scope of the invention; all equivalent structures or equivalent flow transformations made using the contents of this specification and drawings, whether applied directly or indirectly in other related arts, are included within the claimed scope of the present invention.

Claims (10)

1. A deep forgery identification method based on multi-feature fusion, characterized by comprising the following steps:
evenly dividing an input video into a plurality of video segments, randomly sampling a plurality of video frames from each segment, and performing face detection and face alignment on each selected frame to obtain the input video frames;
processing the input video frames with an RGB input stream and a learnable SRM input stream respectively, wherein the RGB input stream extracts semantic features of suspected forged regions in the video frames and obtains a deep forgery prediction from the semantic features, and the learnable SRM input stream fits noise features of the suspected forged regions and obtains a deep forgery prediction from the noise features;
and fusing the prediction of the RGB input stream with the prediction of the learnable SRM input stream to obtain the final deep forgery identification result.
2. The method according to claim 1, wherein a plurality of video frames are randomly sampled from each video segment, and the PNG format is used where possible when extracting the frames, so as to reduce the influence of image compression on tampering traces.
3. The deep forgery identification method based on multi-feature fusion of claim 1, wherein the RGB input stream extracting semantic features of suspected forged regions in the video frames and obtaining a deep forgery prediction from the semantic features comprises:
for the RGB input stream, performing feature extraction on each face-aligned video frame with an Xception network, which extracts the semantic features of suspected forged regions in each frame; finally averaging all extracted features and activating them with a Softmax function to obtain the output of the RGB input stream, the Xception network sharing parameters throughout the process.
4. The deep forgery identification method based on multi-feature fusion of claim 1, wherein the learnable SRM input stream fitting the noise features of suspected forged regions in the video frames and obtaining a deep forgery prediction from the noise features comprises:
for the learnable SRM input stream, first removing the non-differentiable parts of the classical SRM algorithm, namely the round function and the truncate function, then replacing the hyper-parameter q with thirty learnable 5×5 matrices corresponding to the 30 SRM filters of the classical SRM algorithm, each matrix being initialized with all elements equal to the maximum absolute value of the elements in its corresponding SRM filter;
dividing the 30 SRM filters element-wise by their corresponding learnable matrices to obtain a learnable matrix of dimension 30×5×5, expanding it into SRM convolution kernels of dimension 30×3×5×5, inserting these kernels into the original Xception architecture as the first layer of the neural network, and fine-tuning the first layer of the original Xception network to form the learnable SRM network; for the learnable SRM input stream, performing feature extraction on each face-aligned video frame with the learnable SRM network, which fits and analyzes the noise features of suspected tampered regions in the K×T frames; finally averaging all extracted features and activating them with a Softmax function to obtain the output of the learnable SRM input stream; the learnable SRM network shares parameters throughout the process.
5. The deep forgery identification method based on multi-feature fusion of claim 3 or 4, wherein within each of the RGB input stream and the learnable SRM input stream the network shares parameters, while the networks of the two streams are trained independently.
6. The deep forgery identification method based on multi-feature fusion of claim 4, wherein the learnable matrix is set so that network training does not destroy the essential characteristics of the SRM filters: the learnable matrix preserves the zero-valued elements of the 30 preset SRM filters, thereby preserving their noise-extraction character; and initializing the learnable matrix guarantees that all parameters of the SRM convolution kernels are initialized to values in [-1, 1].
7. The deep forgery identification method based on multi-feature fusion of claim 4, wherein the 30×5×5 learnable matrix is expanded into the 30×3×5×5 convolution kernels by replicating each 5×5 matrix 3 times along the second (channel) dimension.
8. A deep forgery identification device based on multi-feature fusion, using the method of any one of claims 1 to 7 and comprising:
a preprocessing module for evenly dividing the input video into a plurality of video segments, randomly sampling a plurality of video frames from each segment, and performing face detection and face alignment on each selected frame to obtain the input video frames;
a dual-input-stream processing module for processing the input video frames with an RGB input stream and a learnable SRM input stream respectively, wherein the RGB input stream extracts semantic features of suspected forged regions in the video frames and obtains a deep forgery prediction from the semantic features, and the learnable SRM input stream fits noise features of the suspected forged regions and obtains a deep forgery prediction from the noise features;
and a fusion module for fusing the prediction of the RGB input stream with the prediction of the learnable SRM input stream to obtain the final deep forgery identification result.
9. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 7.
CN202110473432.4A 2021-04-29 2021-04-29 Deep forgery identification method and device based on multi-feature fusion Pending CN114067381A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110473432.4A CN114067381A (en) 2021-04-29 2021-04-29 Deep forgery identification method and device based on multi-feature fusion

Publications (1)

Publication Number Publication Date
CN114067381A true CN114067381A (en) 2022-02-18

Family

ID=80233204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110473432.4A Pending CN114067381A (en) 2021-04-29 2021-04-29 Deep forgery identification method and device based on multi-feature fusion

Country Status (1)

Country Link
CN (1) CN114067381A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205763A (en) * 2022-09-09 2022-10-18 阿里巴巴(中国)有限公司 Video processing method and device
CN117014561A (en) * 2023-09-26 2023-11-07 荣耀终端有限公司 Information fusion method, training method of variable learning and electronic equipment
CN117014561B (en) * 2023-09-26 2023-12-15 荣耀终端有限公司 Information fusion method, training method of variable learning and electronic equipment

Similar Documents

Publication Publication Date Title
Li et al. Identification of deep network generated images using disparities in color components
CN111311563B (en) Image tampering detection method based on multi-domain feature fusion
Yuan et al. Fingerprint liveness detection using an improved CNN with image scale equalization
Kong et al. Detect and locate: Exposing face manipulation by semantic-and noise-level telltales
CN112907598B (en) Method for detecting falsification of document and certificate images based on attention CNN
CN114067381A (en) Deep forgery identification method and device based on multi-feature fusion
Zhu et al. Deepfake detection with clustering-based embedding regularization
Mazumdar et al. Universal image manipulation detection using deep siamese convolutional neural network
Fu et al. Robust GAN-face detection based on dual-channel CNN network
CN112434599A (en) Pedestrian re-identification method based on random shielding recovery of noise channel
Velliangira et al. A novel forgery detection in image frames of the videos using enhanced convolutional neural network in face images
Yin et al. Dynamic difference learning with spatio-temporal correlation for deepfake video detection
Le-Tien et al. Image forgery detection: A low computational-cost and effective data-driven model
Oraibi et al. Enhancement digital forensic approach for inter-frame video forgery detection using a deep learning technique
CN114842524A (en) Face false distinguishing method based on irregular significant pixel cluster
Jin et al. AMFNet: an adversarial network for median filtering detection
Singh et al. Performance analysis of ELA-CNN model for image forgery detection
CN111178204B (en) Video data editing and identifying method and device, intelligent terminal and storage medium
CN111127407B (en) Fourier transform-based style migration forged image detection device and method
Hammad et al. An secure and effective copy move detection based on pretrained model
Rahmati et al. Double JPEG compression detection and localization based on convolutional auto-encoder for image content removal
Liu et al. Adaptive Texture and Spectrum Clue Mining for Generalizable Face Forgery Detection
Chen et al. Identification of image global processing operator chain based on feature decoupling
CN113392786A (en) Cross-domain pedestrian re-identification method based on normalization and feature enhancement
Lebedev et al. Face detection algorithm based on a cascade of ensembles of decision trees

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination