CN117079354A - Deep forgery detection classification and positioning method based on noise inconsistency - Google Patents

Deep forgery detection classification and positioning method based on noise inconsistency

Info

Publication number
CN117079354A
CN117079354A
Authority
CN
China
Prior art keywords
noise
fake
face
picture
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310837170.4A
Other languages
Chinese (zh)
Inventor
Ling Hefei (凌贺飞)
Liu Boyuan (刘博元)
Li Ping (李平)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202310837170.4A priority Critical patent/CN117079354A/en
Publication of CN117079354A publication Critical patent/CN117079354A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/094 Adversarial learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep forgery detection classification and positioning method based on noise inconsistency, belonging to the field of computer vision and comprising the following steps: training a fake texture enhancement and positioning model with a training set whose samples include real faces, fake faces and AIM pictures, each labeled with a mask corresponding to the forged face region; the AIM pictures manufacture texture differences and serve as self-made fake samples in the training of the model; in the application stage, a target image to be detected is input into the trained fake texture enhancement and positioning model to obtain the authenticity detection result and the fake region positioning result corresponding to the target image. At the same time, the invention can manufacture fake pictures carrying GAN textures, which provide more training data in the training stage and effectively improve the detection accuracy of the model on the data to be detected, whether against known fake types or unknown fake methods.

Description

Deep forgery detection classification and positioning method based on noise inconsistency
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a deep forgery detection classification and positioning method based on noise inconsistency.
Background
With revolutionary advances in multimedia technology and artificial intelligence, AI-generated content has grown explosively. Deep generation technology can forge images, audio, video and other multimedia content depicting real people and things, producing digital media content that is extremely realistic and hard to distinguish from the genuine article. To curb the spread of false information, the academic community has explored both active and passive defenses against the diverse face forgery methods and face synthesis approaches. As one of the effective passive defense strategies, deep forgery detection still faces many challenges.
For example: because deep fake faces carry inherent forgery textures, a deep learning model may overfit to the existing forgery methods during training, so the finally trained model degrades markedly when migrated to an unknown dataset or to data of different picture quality, when facing an unknown forgery method, or when applied in a real scenario. Existing deep forgery detection methods can only classify the detection result and cannot simultaneously localize the forged region. Some existing methods employ GAN networks that perform attribute editing on a face (e.g., networks that change hair color or hairline) and use the resulting inconsistent region as the forgery label; however, this leaves GAN fingerprints over the whole picture, which is then used as a training sample. Such methods mainly address face attribute editing and cannot discriminate the authenticity of faces in a face-swapping task.
Disclosure of Invention
Aiming at the above defects and improvement needs of the prior art, the present invention provides a deep forgery detection classification and positioning method based on noise inconsistency, which aims to improve the prediction performance of the model.
To achieve the above object, according to a first aspect of the present invention, there is provided a deep forgery detection classification and positioning method based on noise inconsistency, comprising:
training phase: training a fake texture enhancement and positioning model with a training set; the training samples in the training set include: real faces, fake faces and AIM pictures, each labeled with a mask corresponding to the forged face region;
the AIM picture is obtained as follows:
taking a real face serving as the background and a face serving as the foreground as inputs, multiplying the foreground face by a mask to obtain the foreground output, and multiplying the background face by the inverse mask of the mask to obtain the background output; the foreground face is a real face, a fake face, or a reconstructed picture output by the fake texture enhancement and positioning model;
adding the background output and the foreground output to obtain the AIM picture and the corresponding AIM mask label; according to the source of the foreground face, the sample is labeled as a self-source picture, a source-changing picture or a reconstructed-source picture;
application stage: obtaining a target image to be detected, inputting it into the trained fake texture enhancement and positioning model, and obtaining the authenticity detection result and the fake region positioning result corresponding to the target image.
Further, the fake texture enhancement and positioning model includes:
a color encoder, used to extract the color features of each layer of the training sample and obtain the bottommost color features;
a spatial rich model (SRM) filter, used to filter the noise of the training sample in the horizontal and vertical directions to obtain a first noise image, and to noise-filter the color features of each layer in the horizontal and vertical directions along the channel dimension to obtain second noise images containing the color features of each layer;
a noise encoder, used to extract noise features of different layers from the first noise image and add them element-wise to the features of the second noise image of the corresponding layer, obtaining the bottommost noise features;
a classifier, configured to perform linear classification on the bottommost color features and the bottommost noise features to obtain the predicted detection result, whose classes include: real face, fake face, self-source picture, source-changing picture and reconstructed-source picture;
a color decoder, used to decode the bottommost color features to obtain the reconstructed picture;
and a noise decoder, used to decode the bottommost noise features to predict the fake region.
Further, during training, the loss constraints of the color decoder include: the pixel loss, the perceptual loss and the generative adversarial network (GAN) loss between the reconstructed picture and the corresponding label.
Further, during training, the loss of the classifier is the cross-entropy loss between the predicted detection result and the corresponding label;
the loss of the noise decoder is the feature loss between the predicted fake region and the corresponding label.
Further, the color encoder has the same structure as the noise encoder;
the color decoder has the same structure as the noise decoder;
the intermediate feature dimensions of the decoder match those of the encoder in reverse order, from shallow features to the bottommost features, wherein the encoder is the color encoder or the noise encoder and the decoder is the color decoder or the noise decoder correspondingly.
Further, before multiplying the foreground face by the mask, the method further includes:
performing random data enhancement on the foreground face with respect to picture resolution differences, noise pattern differences, color differences, facial feature (five sense organs) mismatches and blending traces.
Further, the mask includes the full-face region, the facial feature regions, and any combination thereof;
and/or the mask is a ring mask.
Further, the method also includes randomly transforming the shape and size of the mask;
the random transformation includes: random piecewise affine transformation and erosion and dilation with random kernel sizes.
According to a second aspect of the present invention, there is provided a deep forgery detection classification and positioning system based on noise inconsistency, comprising a computer readable storage medium and a processor;
the computer-readable storage medium is for storing executable instructions;
the processor is configured to read executable instructions stored in the computer readable storage medium to perform the method of any one of the first aspects.
According to a third aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method according to any of the first aspects.
In general, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) In the method, a real face, a fake face or the reconstructed picture output by the fake texture enhancement and positioning model is used as the foreground face and blended with a background face to obtain an AIM picture with manufactured texture differences. The generated AIM pictures are used as one kind of fake sample in the dataset to train the fake texture enhancement and positioning model, increasing the variety of fake samples. The self-made fake samples capture what forged images have in common, and adding them to training prevents the model from overfitting to known textures while providing more training data, which further improves the model's ability to recognize unknown forgery methods. Performing deep fake face detection classification and fake region positioning on this basis improves the prediction performance of the model.
(2) Further, the fake texture enhancement and positioning model is a dual encoder-decoder network. Training samples are fed to the color encoder to extract the color features of the image, the color decoder then reconstructs the original picture, and the reconstructed picture is reused as a foreground face to generate AIM pictures, constructing fake samples that participate in training and continuously enlarging the number of training samples. The noise information filtered by the spatial rich model filter is fed to the noise encoder; the intermediate multi-level features of the color encoder are also filtered by the spatial rich model with their tensor dimensions kept unchanged, so that the noise of the color features is extracted and fused with the corresponding-layer features extracted by the noise encoder, finally realizing deep forgery detection classification and fake region positioning. The model structure designed by the invention realizes multi-class detection and fake region prediction in the training stage, and the generated reconstructed pictures are used to construct fake pictures carrying characteristic textures again, expanding the training samples and further improving the recognition accuracy of the model.
(3) Further, the pixel loss, the perceptual loss and the generative adversarial network loss between the reconstructed picture and the corresponding label are used as constraints, so that the reconstructed picture output by the color decoder contains GAN fingerprints and inconsistent noise patterns. Reusing the reconstructed picture containing GAN fingerprints as input to the intra-mask enhancement module, the generated fake samples (AIM pictures) effectively simulate the texture information and inconsistent noise patterns of a GAN; feeding these fake samples back into training further improves the performance of the model and realizes authenticity recognition of faces in the face-swapping task.
(4) Further, the color encoder and the noise encoder have the same structure; the noise encoder outputs noise features with the same dimensions as the color features output by the color encoder, and the two kinds of features are concatenated and fed into the linear classifier, which outputs the multi-class detection result of the picture: real picture, fake picture, self-source picture, source-changing picture or reconstructed-source picture.
(5) Preferably, random data enhancement is applied to the foreground face, so that the generated AIM picture contains noise pattern differences, color differences, picture resolution differences, facial feature mismatches, blending traces and similar characteristics, fitting common artifact effects and forgery discrimination clues and increasing the texture differences of the self-made fake samples.
(6) Preferably, mask regions comprising the facial feature regions and combinations thereof are designed and used as supervision information to assist fake region positioning, and the masks are randomly transformed in shape and size to increase their diversity, which further improves the performance of the model when the generated AIM pictures are used as fake samples. In addition, a ring mask is designed to fit the artifact traces that appear at the edge regions of some fake face pictures.
In summary, the fake texture enhancement and positioning model of the invention is a dual encoder-decoder network structure that simultaneously realizes forgery classification, outputs the fake region, and manufactures fake pictures carrying GAN textures. It provides more training data in the training stage and effectively improves the detection accuracy of the model on the data to be detected, whether against known fake types or unknown fake methods.
Drawings
FIG. 1 is a flow diagram of intra-mask enhancement in an embodiment of the invention.
FIG. 2 is an overall block diagram of the training phase of the fake texture enhancement and positioning model of the present invention.
Fig. 3 shows example training samples generated by intra-mask enhancement when the foreground face is a fake face in the embodiment of the present invention, where (a)-(e) correspond to the background picture, the foreground picture, the mask, the intra-mask enhanced picture and the picture with the mask region highlighted, respectively.
Fig. 4 shows example training samples generated by intra-mask enhancement when the foreground face is a real face in the embodiment of the present invention, where (a)-(e) correspond to the background picture, the foreground picture, the mask, the intra-mask enhanced picture and the picture with the mask region highlighted, respectively.
Fig. 5 shows example training samples generated by intra-mask enhancement when the foreground face is a reconstructed picture in the embodiment of the present invention, where (a)-(e) correspond to the background picture, the foreground picture, the mask, the intra-mask enhanced picture and the picture with the mask region highlighted, respectively.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
In the present invention, the terms "first," "second," and the like in the description and in the drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
As shown in fig. 1, the deep forgery detection classification and positioning method based on noise inconsistency of the present invention includes:
training phase:
training a fake texture enhancement and positioning model with a training set; the training samples in the training set include real faces, fake faces and AIM pictures, each labeled with a mask corresponding to the forged face region;
the AIM picture is produced by an intra-mask enhancement (AIM) module, whose inputs are a real face serving as the background and a face serving as the foreground; the foreground face is a real face, a fake face, or a reconstructed picture output by the fake texture enhancement and positioning model;
after random data enhancement, the foreground face is multiplied by the designed mask M to obtain the foreground output;
the background face is multiplied by the inverse mask (1-M) of the mask M to obtain the background output;
adding the background output and the foreground output yields the AIM picture (namely the intra-mask enhanced picture) and the corresponding mask label. The AIM pictures are assigned to different categories according to the source of the foreground face: self-source picture, source-changing picture and reconstructed-source picture. When the foreground face is a real face, the sample is a self-source picture, i.e. intra-mask enhancement of the same source, recorded as category 1 in the embodiment of the invention; when the foreground face is a fake face, the sample is a source-changing picture, i.e. intra-mask enhancement of a replaced face, recorded as category 2; when the foreground face is a reconstructed picture output by the fake texture enhancement and positioning model, the sample is a reconstructed-source picture, i.e. intra-mask enhancement of a GAN picture, recorded as category 3. Together with the original real faces and fake faces in the dataset, there are five categories in total. In the training stage, categories 1-3 serve as additional supervision information and help the model distinguish deep forgeries that keep identity information unchanged, deep forgeries that replace identity, and deep forgeries that carry GAN texture information; in the application (inference) stage, the fake faces in categories 1-3 and the fake faces among the dataset samples are all treated uniformly as the fake class. A minimal sketch of this composition follows.
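For concreteness, the following is a minimal sketch of the intra-mask composition described above, written in Python with NumPy. The function and variable names are illustrative (the patent publishes no reference code), and the inputs are assumed to be aligned face crops in [0, 1].

```python
import numpy as np

def intra_mask_enhance(foreground, background, mask, source):
    """Blend a foreground face into a background face inside the mask.

    foreground, background: aligned HxWx3 float32 face crops in [0, 1].
    mask: HxWx1 float32 in [0, 1]; soft edges are allowed.
    source: where the foreground came from --
      'real'          -> category 1 (self-source picture)
      'fake'          -> category 2 (source-changing picture)
      'reconstructed' -> category 3 (reconstructed-source picture)
    """
    aim_picture = foreground * mask + background * (1.0 - mask)
    category = {'real': 1, 'fake': 2, 'reconstructed': 3}[source]
    # The mask itself doubles as the fake-region label for the noise decoder.
    return np.clip(aim_picture, 0.0, 1.0), mask, category
```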
Application stage: obtaining a target image to be detected, inputting it into the pre-trained fake texture enhancement and positioning model, and obtaining the authenticity detection result and the fake region positioning result corresponding to the target image.
Specifically, in the intra-mask enhancement module, the designed mask M includes the full-face region, the facial feature regions, and any combination thereof. To expand mask diversity, the shape and size of each sample's mask are randomly transformed, using random piecewise affine transformations and erosion and dilation with random kernel sizes, which effectively varies the shape and size of the mask image. Further, a ring mask is designed so that the artifact traces appearing at the edge regions of some fake face pictures can be fitted. An illustrative transform is sketched below.
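A plausible realization of the erosion/dilation step and the ring mask, using OpenCV, is shown below; the kernel-size ranges are assumptions, and the random piecewise affine step (available, e.g., as PiecewiseAffine in the albumentations library) is omitted for brevity.

```python
import cv2
import numpy as np

def randomize_mask(mask, rng=np.random):
    # mask: HxW uint8 array in {0, 255}. Kernel sizes are illustrative.
    k = 2 * rng.randint(1, 8) + 1  # random odd kernel size in [3, 15]
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (k, k))
    # Randomly erode (shrink) or dilate (grow) the masked region.
    return cv2.erode(mask, kernel) if rng.rand() < 0.5 else cv2.dilate(mask, kernel)

def ring_mask(mask, thickness=9):
    # Keep only a band around the region boundary, fitting the blending
    # artifacts that appear at the edge regions of some fake face pictures.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (thickness, thickness))
    return cv2.subtract(mask, cv2.erode(mask, kernel))
```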
Specifically, when choosing the random data enhancement of the foreground face, the common artifact effects and forgery discrimination clues of face counterfeiting are considered, such as picture resolution differences, noise pattern differences, color differences, facial feature mismatches and blending traces; rich combinations of data enhancement are selected for the foreground picture to fit these artifact effects and discrimination clues. Meanwhile, the relevant parameters of the foreground face are randomly varied within fixed ranges to guarantee diverse enhancement forms. Through random data enhancement of the foreground face, fake samples containing complex noise pattern differences can be designed; one plausible configuration is sketched below.
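One plausible configuration of such foreground augmentations, assuming the albumentations library (version 1.x API); the specific transforms and parameter ranges are illustrative stand-ins for the artifact effects named above, not values from the patent.

```python
import albumentations as A

# Each transform targets one of the discrimination clues listed above.
foreground_aug = A.Compose([
    A.Downscale(scale_min=0.5, scale_max=0.9, p=0.3),   # resolution difference
    A.GaussNoise(var_limit=(5.0, 30.0), p=0.3),         # noise pattern difference
    A.HueSaturationValue(hue_shift_limit=10, sat_shift_limit=20,
                         val_shift_limit=10, p=0.3),    # color difference
    A.ImageCompression(quality_lower=40,
                       quality_upper=90, p=0.3),        # compression/blending traces
    A.ShiftScaleRotate(shift_limit=0.02, scale_limit=0.05,
                       rotate_limit=5, p=0.3),          # facial feature mismatch
])

# Usage: augmented = foreground_aug(image=foreground_bgr)['image']
```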
Specifically, as shown in fig. 2, the fake texture enhancement and positioning model of the present invention includes: a color encoder, a color decoder, a classifier, a spatial rich model filter, a noise encoder, and a noise decoder;
the color encoder is used to extract color features of different layers from the training samples in the training set, obtaining the bottommost color features;
the spatial rich model filter is used to filter the noise of the training sample (the color images of real faces, fake faces and AIM pictures) in the horizontal and vertical directions to obtain the first noise image, and to noise-filter the intermediate features of each layer extracted in the color encoder in the horizontal and vertical directions along the channel dimension, obtaining second noise images containing the color features of each layer;
the noise encoder is used to extract noise features of different layers from the first noise image produced by the spatial rich model filter and to add them element-wise to the features of the second noise image containing the color features of the corresponding layer, obtaining the bottommost noise features; an illustrative stand-in for the directional filtering is sketched below.
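The horizontal/vertical noise filtering can be sketched as depthwise high-pass convolutions, as below in PyTorch. The exact spatial-rich-model kernels are not reproduced in this text, so simple first-order difference kernels stand in; what matters for the architecture is that the channel dimension is preserved.

```python
import torch
import torch.nn.functional as F

def directional_noise(x):
    # x: (B, C, H, W) color image or intermediate feature map.
    c = x.shape[1]
    # Depthwise (groups=C) filtering preserves the channel dimension, as the
    # fusion with the noise-encoder features requires.
    kh = torch.tensor([[-1.0, 1.0]]).view(1, 1, 1, 2).repeat(c, 1, 1, 1).to(x)
    kv = torch.tensor([[-1.0], [1.0]]).view(1, 1, 2, 1).repeat(c, 1, 1, 1).to(x)
    noise_h = F.conv2d(F.pad(x, (0, 1, 0, 0)), kh, groups=c)  # horizontal diffs
    noise_v = F.conv2d(F.pad(x, (0, 0, 0, 1)), kv, groups=c)  # vertical diffs
    return noise_h + noise_v
```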
the classifier is used for carrying out global average pooling on the color features of the bottommost layer and the noise features of the bottommost layer respectively, flattening the color features into one-dimensional feature vectors, carrying out feature connection, and then carrying out linear classification to obtain detection result classification; specifically, in the training process, the cross entropy loss between the result of the prediction output of the minimum classifier and the corresponding label is taken as a target, the model parameters are regulated, and when the loss converges, a multi-classification detection result is obtained: real pictures, fake pictures, self-source pictures, source-changing pictures and reconstructed source pictures are five categories. Wherein the fake picture is a fake picture from the training dataset; the self-source picture, the source-changing picture and the reconstructed source picture are all from the in-mask enhancement module and respectively correspond to the real face of the input foreground face of the in-mask enhancement module, the fake face from the data set and the reconstructed picture output by the color decoder. In the application reasoning stage, the classifier is regarded as a classification task, and the probability of the forged image output by the classifier is the sum of the prediction probabilities of the forged image, the self-source image, the source-changing image and the reconstructed source image of the data set.
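A sketch of this classification head under stated assumptions: the deepest feature widths (1792 matches EfficientNet-B4's final feature dimension, but is an assumption here) and the module names are illustrative.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, color_dim=1792, noise_dim=1792, num_classes=5):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # global average pooling
        self.fc = nn.Linear(color_dim + noise_dim, num_classes)

    def forward(self, color_feat, noise_feat):
        c = self.pool(color_feat).flatten(1)  # (B, color_dim)
        n = self.pool(noise_feat).flatten(1)  # (B, noise_dim)
        return self.fc(torch.cat([c, n], dim=1))

def fake_probability(logits):
    # Inference-time real/fake score: class 0 is real; the fake probability
    # is the sum over the dataset-fake, self-source, source-changing and
    # reconstructed-source classes.
    return torch.softmax(logits, dim=1)[:, 1:].sum(dim=1)
```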
The color decoder is used to decode the bottommost color features to obtain a reconstructed picture containing GAN fingerprint information and inconsistent noise patterns. Specifically, during training, reconstruction learning supervises the encoder to focus on and learn complete visual features, providing the decoder with sufficient color information and guiding the noise decoder to attend to multi-level noise information. The reconstruction is constrained by the pixel loss, the perceptual loss and the generative adversarial network loss between the reconstructed picture and the corresponding label, so that while the generation quality is guaranteed, the generated picture carries GAN fingerprint information and a unique noise pattern; this fingerprint information and the inconsistent noise pattern can be regarded as forgery marks. The generated reconstructed picture is used as an AIM foreground picture, processed by the intra-mask enhancement module, added to the dataset, and used again in the iterative training of the fake texture enhancement and positioning model. A hedged sketch of these constraints follows.
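A hedged sketch of the three reconstruction constraints in PyTorch; the VGG16 layer choice, the loss weights and the discriminator `disc` are placeholders, not values from the patent.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG16 features serve as the perceptual-loss feature extractor.
vgg_feat = vgg16(weights='IMAGENET1K_V1').features[:16].eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)

def reconstruction_loss(recon, target, disc, w_pix=1.0, w_per=1.0, w_adv=0.1):
    pixel = F.l1_loss(recon, target)                           # pixel loss
    perceptual = F.l1_loss(vgg_feat(recon), vgg_feat(target))  # perceptual loss
    d = disc(recon)                                            # adversarial (GAN) loss
    adversarial = F.binary_cross_entropy_with_logits(d, torch.ones_like(d))
    return w_pix * pixel + w_per * perceptual + w_adv * adversarial
```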
The noise decoder is used to decode the bottommost noise features to predict the fake region. Unlike the color decoder, the noise decoder must predict a fake region and is treated as a semantic segmentation task; in the embodiment of the invention, the fake region is predicted from the multi-level features of the first noise image combined with the features of the second noise images containing the color features corresponding to the intermediate features of each layer extracted in the color encoder. During training, the model parameters are adjusted to minimize the feature loss between the noise decoder's predicted output and the corresponding fake region label, and the predicted fake region is obtained when the loss converges. For an input real picture there is no fake region and the mask label image is all 0; for fake pictures in the input dataset, the face region is used as the fake region label; for fake pictures produced by the intra-mask enhancement module (AIM pictures), the module generates the corresponding matching fake region label. A sketch of these objectives follows.
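The remaining objectives can be sketched as follows; plain binary cross-entropy stands in for the "feature loss" on the predicted fake region, which is an assumption.

```python
import torch.nn.functional as F

def detection_losses(logits, class_labels, region_pred, region_mask):
    # Five-way cross-entropy for the classifier (real, fake, self-source,
    # source-changing, reconstructed-source).
    cls_loss = F.cross_entropy(logits, class_labels)
    # Per-pixel loss against the mask label (all zeros for real pictures,
    # face region for dataset fakes, AIM mask for self-made fakes).
    seg_loss = F.binary_cross_entropy(region_pred, region_mask)
    return cls_loss + seg_loss
```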
Specifically, in the application stage, a target image to be detected is obtained and input into the pre-trained fake texture enhancement and positioning model; the classifier outputs the authenticity detection result corresponding to the target image, and the noise decoder generates the fake region positioning result corresponding to the target image.
Specifically, in the embodiment of the invention, the color encoder and the noise encoder share the same network structure (with different parameters), an EfficientNet-B4 structure; other encoder structures may also be adopted in other embodiments. The color decoder and the noise decoder share the same network structure, the color decoder mirrors the color encoder and the noise decoder mirrors the noise encoder: the intermediate feature dimensions of the decoder match those of the encoder in reverse order, from shallow features to bottommost features, and the noise encoder also passes its extracted noise features to the noise decoder to predict the fake region. In the training stage, the color encoder receives the picture directly, extracts color features, and feeds the color decoder to restore the original image, guiding the model through reconstruction learning to learn global color visual features rather than concentrating on the characteristic textures of the training dataset; through the various loss-function constraints, the reconstructed picture carries various kinds of forged characteristic textures and can be regarded, in the intra-mask enhancement module, as a fake picture for expanding the training samples. A structural sketch is given below.
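A structural sketch of the dual encoder-decoder layout, assuming the timm library for the EfficientNet-B4 backbones. The decoders here are simple upsampling stand-ins whose channel widths mirror the encoder stages in reverse order; skip connections and the fusion of the second noise images are omitted for brevity.

```python
import timm
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Two EfficientNet-B4 backbones with identical structure but
        # independent parameters, each returning multi-level feature maps.
        self.color_enc = timm.create_model('efficientnet_b4', features_only=True)
        self.noise_enc = timm.create_model('efficientnet_b4', features_only=True)
        chs = self.color_enc.feature_info.channels()     # shallow -> deep
        self.color_dec = self._decoder(chs, out_ch=3)    # reconstructs the image
        self.noise_dec = self._decoder(chs, out_ch=1)    # predicts the fake region

    @staticmethod
    def _decoder(chs, out_ch):
        # Channel widths mirror the encoder stages in reverse order.
        dims, layers = list(reversed(chs)), []
        for i, (cin, cout) in enumerate(zip(dims, dims[1:] + [out_ch])):
            layers += [nn.Conv2d(cin, cout, 3, padding=1),
                       nn.Upsample(scale_factor=2, mode='bilinear',
                                   align_corners=False)]
            if i < len(dims) - 1:
                layers.append(nn.ReLU(inplace=True))
        return nn.Sequential(*layers)

    def forward(self, image, noise):
        color_feats = self.color_enc(image)              # list, shallow -> deep
        noise_feats = self.noise_enc(noise)
        recon = self.color_dec(color_feats[-1])
        region = torch.sigmoid(self.noise_dec(noise_feats[-1]))
        return color_feats, noise_feats, recon, region
```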
In the embodiment of the present invention, as shown in figs. 3 to 5, when the foreground face is a fake face, a real face or a reconstructed picture, the obtained sample is a source-changing picture, a self-source picture or a reconstructed-source picture, respectively.
According to a second aspect of the present invention, there is provided a deep forgery detection classification and positioning system based on noise inconsistency, comprising a computer readable storage medium and a processor;
the computer-readable storage medium is for storing executable instructions;
the processor is configured to read executable instructions stored in the computer readable storage medium to perform the method of any one of the first aspects.
According to a third aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as in any of the first aspects.
In the method, a sample picture with manufactured texture differences is produced by the designed intra-mask enhancement module: a real face, a fake face or the reconstructed picture output by the fake texture enhancement and positioning model is used as the foreground face and blended with a background face to obtain the AIM picture. The generated AIM pictures serve as one kind of fake sample in the dataset to train the fake texture enhancement and positioning model, increasing the variety of fake samples; the self-made fake samples capture the commonality of forged images, and adding them to training keeps the model from overfitting to known textures while providing more training data, further improving its recognition of unknown forgery methods. Performing deep fake face detection classification and fake region positioning on this basis improves the prediction performance of the model.
The fake texture enhancement and positioning model is a dual encoder-decoder network. Training samples are fed to the color encoder to extract the color features of the image, the color decoder then reconstructs the original picture, and the reconstructed picture is reused as a foreground face to generate AIM pictures, constructing fake samples that participate in the training of the model. The noise information filtered by the spatial rich model filter is fed to the noise encoder; the intermediate multi-level features of the color encoder are also filtered by the spatial rich model with their tensor dimensions kept unchanged, so that the noise of the color features is extracted and fused with the corresponding-layer features extracted by the noise encoder, finally realizing deep forgery detection classification and fake region positioning. The model structure designed by the invention realizes multi-class detection and fake region prediction in the training stage, and the generated reconstructed pictures are used to construct fake pictures carrying characteristic textures again, expanding the training samples.
The pixel loss, the perceptual loss and the generative adversarial network loss between the reconstructed picture and the corresponding label are used as constraints, so that the reconstructed picture output by the color decoder contains GAN fingerprints; reusing the reconstructed picture containing GAN fingerprints as input to the intra-mask enhancement module, the generated fake samples (AIM pictures) effectively simulate the texture information and inconsistent noise patterns of a GAN, and feeding these fake samples back into training improves the performance of the model and realizes authenticity recognition of faces in the face-swapping task.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A deep forgery detection classification and positioning method based on noise inconsistency, comprising:
training phase: training a fake texture enhancement and positioning model with a training set; the training samples in the training set include: real faces, fake faces and AIM pictures, each labeled with a mask corresponding to the forged face region;
the AIM picture is obtained as follows:
taking a real face serving as the background and a face serving as the foreground as inputs, multiplying the foreground face by a mask to obtain the foreground output, and multiplying the background face by the inverse mask of the mask to obtain the background output; the foreground face is a real face, a fake face, or a reconstructed picture output by the fake texture enhancement and positioning model;
adding the background output and the foreground output to obtain the AIM picture and the corresponding AIM mask label; according to the source of the foreground face, the sample is labeled as a self-source picture, a source-changing picture or a reconstructed-source picture;
application stage: obtaining a target image to be detected, inputting it into the trained fake texture enhancement and positioning model, and obtaining the authenticity detection result and the fake region positioning result corresponding to the target image.
2. The method of claim 1, wherein the fake texture enhancement and positioning model comprises:
a color encoder, used to extract the color features of each layer of the training sample and obtain the bottommost color features;
a spatial rich model filter, used to filter the noise of the training sample in the horizontal and vertical directions to obtain a first noise image, and to noise-filter the color features of each layer in the horizontal and vertical directions along the channel dimension to obtain second noise images containing the color features of each layer;
a noise encoder, used to extract noise features of different layers from the first noise image and add them element-wise to the features of the second noise image of the corresponding layer, obtaining the bottommost noise features;
a classifier, configured to perform linear classification on the bottommost color features and the bottommost noise features to obtain the predicted detection result, whose classes include: real face, fake face, self-source picture, source-changing picture and reconstructed-source picture;
a color decoder, used to decode the bottommost color features to obtain the reconstructed picture;
and a noise decoder, used to decode the bottommost noise features to predict the fake region.
3. The method of claim 2, wherein during training the loss constraints of the color decoder comprise: the pixel loss, the perceptual loss and the generative adversarial network loss between the reconstructed picture and the corresponding label.
4. The method according to claim 2 or 3, wherein during training the loss of the classifier is the cross-entropy loss between the predicted detection result and the corresponding label;
and the loss of the noise decoder is the feature loss between the predicted fake region and the corresponding label.
5. The method of claim 4, wherein the color encoder is the same structure as the noise encoder;
the color decoder has the same structure as the noise decoder;
the intermediate feature dimensions of the decoder match those of the encoder in reverse order, from shallow features to the bottommost features, wherein the encoder is the color encoder or the noise encoder and the decoder is the color decoder or the noise decoder correspondingly.
6. The method of claim 1, wherein before multiplying the foreground face by the mask, the method further comprises:
performing random data enhancement on the foreground face with respect to picture resolution differences, noise pattern differences, color differences, facial feature mismatches and blending traces.
7. The method of claim 1, wherein the mask comprises a full-face region, facial feature regions, and any combination thereof;
and/or the mask is a ring mask.
8. The method of claim 3, further comprising randomly transforming the shape and size of the mask;
the random transformation comprising: random piecewise affine transformation and erosion and dilation with random kernel sizes.
9. A deep forgery detection classification and positioning system based on noise inconsistency, comprising a computer readable storage medium and a processor;
the computer-readable storage medium is for storing executable instructions;
the processor is configured to read executable instructions stored in the computer readable storage medium to perform the method of any one of claims 1-8.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-8.
CN202310837170.4A 2023-07-10 2023-07-10 Deep forgery detection classification and positioning method based on noise inconsistency Pending CN117079354A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310837170.4A CN117079354A (en) 2023-07-10 2023-07-10 Deep forgery detection classification and positioning method based on noise inconsistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310837170.4A CN117079354A (en) 2023-07-10 2023-07-10 Deep forgery detection classification and positioning method based on noise inconsistency

Publications (1)

Publication Number Publication Date
CN117079354A

Family

ID=88703186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310837170.4A Pending CN117079354A (en) 2023-07-10 2023-07-10 Deep forgery detection classification and positioning method based on noise inconsistency

Country Status (1)

Country Link
CN (1) CN117079354A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593311A (en) * 2024-01-19 2024-02-23 浙江大学 Depth synthetic image detection enhancement method and device based on countermeasure generation network
CN118247493A (en) * 2024-05-23 2024-06-25 杭州海康威视数字技术股份有限公司 Fake picture detection and positioning method and device based on segmentation integrated learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination