CN116311434A - Face counterfeiting detection method and device, electronic equipment and storage medium - Google Patents

Face counterfeiting detection method and device, electronic equipment and storage medium

Info

Publication number
CN116311434A
CN116311434A (application number CN202310173749.5A)
Authority
CN
China
Prior art keywords: level, texture, domain fusion, content, features
Prior art date
Legal status
Pending
Application number
CN202310173749.5A
Other languages
Chinese (zh)
Inventor
陈晨
王源
彭思龙
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202310173749.5A
Publication of CN116311434A
Legal status: Pending

Classifications

    • G06V 40/161: Human faces; detection, localisation, normalisation
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/54: Extraction of image or video features relating to texture
    • G06V 10/761: Proximity, similarity or dissimilarity measures in feature spaces
    • G06V 10/806: Fusion of extracted features (sensor, preprocessing, feature extraction or classification level)
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 40/168: Human faces; feature extraction, face representation
    • G06V 40/172: Human faces; classification, e.g. identification


Abstract

The invention relates to the technical fields of machine learning and computer vision, and provides a face counterfeiting detection method and device, an electronic device, and a storage medium. The method first acquires a face image to be detected and extracts image features of the face image, the image features comprising content features and texture features; it then fuses the content features and the texture features separately to obtain a content domain fusion result and a texture domain fusion result, and fuses the content domain fusion result with the texture domain fusion result to obtain a feature cross-domain fusion result; finally, it determines a fake detection result for the face image to be detected based on the feature cross-domain fusion result. By extracting both content features and texture features, the method characterizes the fake evidence of the face in the image from multiple dimensions, which improves the accuracy of the fake detection result. In addition, the intra-domain and cross-domain fusion of the features further improves the accuracy and reliability of the fake detection result.

Description

Face counterfeiting detection method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of machine learning and computer vision technologies, and in particular, to a face counterfeit detection method, a face counterfeit detection device, an electronic device, and a storage medium.
Background
Face counterfeiting detection determines whether the face contained in a given picture has been forged.
At present, face counterfeiting detection techniques mainly fall into two categories. The first performs fake detection with manually designed high-level semantic features, such as the consistency of head poses or abnormal blink frequency. The second performs fake detection with data-driven facial defect features, such as inconsistent regional textures, abnormal generation artifacts, or abnormal spectral-domain distributions.
However, the above methods use only high-level semantic features or only facial defect features, which results in low face counterfeiting detection accuracy.
Disclosure of Invention
The invention provides a face counterfeiting detection method, a face counterfeiting detection device, electronic equipment and a storage medium, which are used for solving the defects in the prior art.
The invention provides a face counterfeiting detection method, which comprises the following steps:
acquiring a face image to be detected, and extracting image features of the face image to be detected, wherein the image features comprise content features and texture features;
respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result;
and determining a fake detection result of the face image to be detected based on the feature cross-domain fusion result.
According to the face counterfeiting detection method provided by the invention, the image features comprise a plurality of layers;
correspondingly, the fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result, which comprises the following steps:
fusing the content features of each level to obtain a content domain fusion result of each level, and fusing the texture features of each level to obtain a texture domain fusion result of each level;
and fusing the content domain fusion result and the texture domain fusion result of each level to obtain the characteristic cross-domain fusion result of each level.
According to the face counterfeiting detection method provided by the invention, the extracting of the image features of the face image to be detected comprises the following steps:
For the lowest level, determining image features of the lowest level based on initial features of the lowest level of the face image to be detected;
for any level other than the lowest level, determining image features of the any level based on feature cross-domain fusion results of a level preceding the any level and initial features of the any level.
According to the face counterfeiting detection method provided by the invention, the determining of the image features of any level based on the feature cross-domain fusion result of the previous level of any level and the initial features of any level comprises the following steps:
overlapping the image features of the previous level and the feature cross-domain fusion result element by element to obtain a feature overlapping result of the previous level;
and combining the feature superposition result of the previous level with the initial feature of any level in the channel dimension to obtain the image feature of any level.
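For illustration, the element-by-element superposition followed by channel-dimension combination described in the two steps above can be sketched as follows; the function and argument names are hypothetical, and NumPy stands in for the actual network framework:

```python
import numpy as np

def next_level_input(prev_feat, prev_cross_fusion, init_feat):
    """Build the image feature of a level from the previous level's outputs.

    prev_feat, prev_cross_fusion: (C, H, W) maps of the previous level;
    init_feat: (C2, H, W) initial feature of the current level.
    """
    assert prev_feat.shape == prev_cross_fusion.shape
    superposed = prev_feat + prev_cross_fusion          # element-by-element superposition
    # combine with the current level's initial feature in the channel dimension
    return np.concatenate([superposed, init_feat], axis=0)
```

The channel concatenation preserves both the propagated evidence and the fresh initial feature, so later layers can weigh them independently.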
According to the face counterfeiting detection method provided by the invention, the content features of each level are fused to obtain the content domain fusion result of each level, and the texture features of each level are fused to obtain the texture domain fusion result of each level, which comprises the following steps:
respectively fusing the content features and the texture features of each level based on the feature interaction vector corresponding to each level, to obtain a content domain fusion result and a texture domain fusion result of each level.
According to the face counterfeiting detection method provided by the invention, the fusion of the content domain fusion result and the texture domain fusion result of each level is carried out to obtain the characteristic cross-domain fusion result of each level, and the method comprises the following steps:
based on a bilinear pooling method, feature embedding is carried out on the content domain fusion result and the texture domain fusion result of each level, so as to obtain the content texture semantic relation of each level;
nonlinear normalization is carried out on the semantic relation of the content texture of each level, and similarity aggregation is carried out on the results obtained by normalization;
and determining a characteristic cross-domain fusion result of each level based on the results obtained by similarity aggregation.
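A minimal sketch of the bilinear pooling, nonlinear (signed-square-root) normalization, and similarity aggregation steps above, under the assumption that similarity aggregation reduces to L2 normalization of the flattened relation matrix; all names are illustrative, not the patent's implementation:

```python
import numpy as np

def cross_domain_relation(content, texture, eps=1e-8):
    """Bilinear-pooling sketch of the content-texture semantic relation.

    content, texture: (C, H, W) domain fusion results of one level.
    """
    C, H, W = content.shape
    x = content.reshape(C, -1)
    y = texture.reshape(C, -1)
    bilinear = (x @ y.T) / (H * W)        # (C, C) pairwise channel relations
    # nonlinear normalization via signed square root
    z = np.sign(bilinear) * np.sqrt(np.abs(bilinear))
    z = z.ravel()
    return z / (np.linalg.norm(z) + eps)  # aggregate to a unit relation vector
```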
According to the face counterfeiting detection method provided by the invention, the determining of the counterfeiting detection result of the face image to be detected based on the feature cross-domain fusion result comprises the following steps:
respectively overlapping the highest-level characteristic cross-domain fusion result with the highest-level content characteristic and the highest-level texture characteristic element by element to obtain a highest-level content overlapping result and a highest-level texture overlapping result;
and based on the content superposition result and the texture superposition result of the highest level, carrying out true and false classification on the face image to be detected to obtain the fake detection result.
According to the face counterfeiting detection method provided by the invention, the texture features are extracted based on a difference-by-difference convolution operator alone, or based on both the difference-by-difference convolution operator and a central difference convolution operator.
According to the face counterfeiting detection method provided by the invention, the difference-by-difference convolution operator is expressed based on the following formula:

f_SDC(F^l, k^2) = Σ_{(z_n, z_m) ∈ P} k^2_n · (F^l(z_n) − F^l(z_m))

wherein f_SDC(F^l, k^2) represents the difference-by-difference convolution operator, F^l represents the texture feature of level l, k^2 represents the difference-by-difference convolution kernel, z_n and z_m both represent positions in the current receptive field of k^2, n and m each represent a position index within the current receptive field of k^2, P represents the set of position pairs in the current receptive field of k^2, and k^2_n is the weight of k^2 at position i = n.
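Assuming the operator accumulates kernel-weighted differences between pairs of positions inside each receptive field, consistent with the variable definitions above, a naive NumPy sketch might look like this (the function name and looping strategy are illustrative, not the patent's implementation):

```python
import numpy as np

def sdc_conv(feature, kernel):
    """Difference-by-difference (pairwise-difference) convolution sketch.

    For every sliding window, accumulate kernel-weighted differences
    between pairs of positions inside the receptive field.
    """
    H, W = feature.shape
    k = kernel.shape[0]                  # assume a square k x k kernel
    pad = k // 2
    padded = np.pad(feature, pad, mode="edge")
    out = np.zeros_like(feature, dtype=float)
    positions = [(i, j) for i in range(k) for j in range(k)]
    for y in range(H):
        for x in range(W):
            window = padded[y:y + k, x:x + k]
            acc = 0.0
            # enumerate ordered position pairs (n, m) in the receptive field
            for (i1, j1) in positions:
                for (i2, j2) in positions:
                    if (i1, j1) == (i2, j2):
                        continue
                    # weight taken at position n, applied to the difference
                    acc += kernel[i1, j1] * (window[i1, j1] - window[i2, j2])
            out[y, x] = acc
    return out
```

Because the operator only sees differences, a constant (texture-free) input maps to an all-zero response, which is what makes it sensitive to fine texture rather than content.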
The invention also provides a face counterfeiting detection device, which comprises:
the feature extraction module is used for acquiring a face image to be detected and extracting image features of the face image to be detected, wherein the image features comprise content features and texture features;
the feature fusion module is used for respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result;
And the fake detection module is used for determining the fake detection result of the face image to be detected based on the characteristic cross-domain fusion result.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the face falsification detection method according to any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a face falsification detection method as described in any one of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a face falsification detection method as described in any one of the above.
The invention provides a face counterfeiting detection method and device, an electronic device, and a storage medium. The method first acquires a face image to be detected and extracts image features of the face image, the image features comprising content features and texture features; it then fuses the content features and the texture features separately to obtain a content domain fusion result and a texture domain fusion result, and fuses the two to obtain a feature cross-domain fusion result; finally, it determines a fake detection result for the face image based on the feature cross-domain fusion result. By extracting both content features and texture features, the method characterizes the fake evidence of the face from multiple dimensions, which improves the accuracy of the fake detection result. In addition, through intra-domain and cross-domain fusion of the features, the method further improves the accuracy and reliability of the fake detection result; it generalizes well across multiple datasets, counterfeiting types, and counterfeiting modes, is robust to high noise, strong compression, and other visual disturbances in real scenes, can be effectively deployed on terminal devices, and can be used for fake detection of media data or real-time face images on the Internet.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a face counterfeiting detection method provided by the invention;
FIG. 2 is a schematic diagram of the operation of a CDC operator in the face falsification detection method provided by the present invention;
FIG. 3 is a schematic diagram of the operation of the SDC operator in the face counterfeiting detection method provided by the invention;
FIG. 4 is a schematic diagram of the principle of the first-level Cross-GFI in the face counterfeit detection method provided by the invention;
FIG. 5 is a schematic diagram of a face-forgery detection model in the face-forgery detection method provided by the present invention;
FIG. 6 is a schematic structural diagram of a face counterfeiting detection device provided by the present invention;
FIG. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In recent years, with the development of deep learning, and in particular of the neural networks known as generative adversarial networks (Generative Adversarial Networks, GAN) used mainly for image generation and editing tasks, deep learning has begun to surpass humans in many visual tasks and can generate highly realistic images. Many fake-picture and fake-video generation methods are derived from generative methods such as generative adversarial networks, and the most attention-drawing and most harmful type is fake video in which a face has been replaced.
On the one hand, face deep-forgery techniques, represented by Deepfake, ZAO, and the like, can be used to make entertaining short-video applications, such as replacing a user's face into a movie clip with face-replacement techniques, or driving a static portrait with expression-replay techniques. On the other hand, because such forgeries are easy to generate, quick to produce, and highly realistic, face deep-forgery techniques are also easily abused, for example to produce fake films, fake news, and rumors, possibly combined with synthesized fake voices. Therefore, how to detect fake face videos on the Internet has become a research hotspot of the computer vision community and a problem to be solved urgently.
To minimize the influence of deep-forgery techniques and curb the spread of fake videos, academia and industry have begun to explore different deep-forgery detection techniques and have proposed a series of defense methods from different aspects, covering the spatial domain, the temporal domain, the frequency domain, and other modalities, and achieving a series of successes on some specific datasets. Furthermore, researchers have constructed datasets and carried out multi-angle studies of Deepfakes detection.
At present, face counterfeiting detection techniques mainly fall into two categories. The first performs fake detection with manually designed high-level semantic features, such as the consistency of head poses or abnormal blink frequency. The second performs fake detection with data-driven facial defect features, such as inconsistent regional textures, abnormal generation artifacts, or abnormal spectral-domain distributions.
However, the above methods use only high-level semantic features or only facial defect features, which results in low face counterfeiting detection accuracy.
Based on the above, the embodiment of the invention provides a face counterfeiting detection method.
Fig. 1 is a schematic flow chart of a face counterfeiting detection method provided in an embodiment of the present invention, as shown in fig. 1, the method includes:
S1, acquiring a face image to be detected, and extracting image features of the face image to be detected, wherein the image features comprise content features and texture features;
s2, respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result;
s3, determining a fake detection result of the face image to be detected based on the feature cross-domain fusion result.
Specifically, in the face counterfeiting detection method provided in the embodiment of the present invention, the execution subject is a face counterfeiting detection device. The device may be configured in a computer, which may be a local computer or a cloud computer; the local computer may be a personal computer, a tablet, or the like, and is not specifically limited herein.
Step S1 is executed first, and a face image to be detected is obtained. The face image to be detected refers to an image in which it is necessary to determine whether the face therein is a real face or a fake face. The face image to be detected may be a gray image or a color image, which is not particularly limited herein.
Inspired by image decomposition theory, a variational energy minimization model is used to decompose the face image I to be detected into a structured content component s and a fine-grained texture component t:

min_{s, t} E_content(s) + λ · E_texture(t), subject to I = s + t

where s models the homogeneous content of the image and t contains oscillatory patterns such as noise and texture.
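As a rough stand-in for the variational decomposition, a simple smoothing filter can split an image into a content part s and a texture residual t = I - s; the Gaussian filter here is an assumption chosen for illustration, not the energy model used by the invention:

```python
import numpy as np

def decompose(image, sigma=2.0, radius=4):
    """Split an image into content s (smooth structure) and texture t = I - s.

    A separable Gaussian blur stands in for variational energy minimization;
    any edge-preserving smoother could be substituted.
    """
    size = 2 * radius + 1
    ax = np.arange(size) - radius
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    g /= g.sum()
    padded = np.pad(image, radius, mode="reflect")
    # separable Gaussian: filter rows, then columns
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, padded)
    s = np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, rows)
    t = image - s
    return s, t
```

By construction s + t reconstructs the input exactly, so no forgery evidence is lost by the split.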
Content features alone are not sufficient to distinguish the subtle differences between real faces and fake faces. Therefore, in the embodiment of the invention, the image features of the face image to be detected are extracted. The image features may include content features and texture features: the content features may characterize the content semantic information of the face in the image to be detected, and the texture features may characterize the texture detail information of that face.
The image features may comprise a plurality of levels that differ in scale or granularity; that is, the content features and the texture features may each comprise a plurality of such levels. When the image features include a plurality of levels, the content features have the same number of levels as the texture features, and the content features of each level correspond one-to-one to the texture features of that level.
The content features may be extracted by a content feature extraction module, which may be a conventional feature extraction module, or a Content Attention Map Learning (CAML) module based on a channel attention mechanism and a spatial attention mechanism.
The texture features may be extracted by a texture feature extraction module. The texture feature extraction module may be a conventional feature extraction module or a texture attention map learning (Texture Attention Map learning, TAML) module based on a channel attention mechanism and a spatial attention mechanism.
The CAML module and the TAML module may be collectively referred to as a convolutional block attention module (Convolutional Block Attention Module, CBAM) based on a channel attention mechanism and a spatial attention mechanism.
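A CBAM-style channel-then-spatial attention pass can be sketched in NumPy as follows; the weight shapes and the reduction MLP are illustrative assumptions standing in for learned parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_attention(feat, w1, w2, w_spatial):
    """CBAM-style channel then spatial attention on a (C, H, W) feature map.

    w1, w2: shared-MLP weights for channel attention; w_spatial: weights of
    a 1x1 'conv' over the [avg, max] spatial descriptor.
    """
    C, H, W = feat.shape
    # channel attention from average- and max-pooled descriptors
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    ca = sigmoid(w2 @ np.maximum(0, w1 @ avg) + w2 @ np.maximum(0, w1 @ mx))
    feat = feat * ca[:, None, None]
    # spatial attention from per-position channel statistics
    desc = np.stack([feat.mean(axis=0), feat.max(axis=0)])           # (2, H, W)
    sa = sigmoid(np.tensordot(w_spatial, desc, axes=([0], [0])))     # (H, W)
    return feat * sa[None, :, :]
```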
The content feature extraction module and the texture feature extraction module can be introduced to realize independent mining of the content features and the texture features in the face image to be detected. Here, the content features and the texture features are features in two feature domains of the face image to be detected, and are used for characterizing the fake evidence of the two feature domains, respectively.
And then, executing step S2, respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, namely fusing the content features to obtain a content domain fusion result, and fusing the texture features to obtain a texture domain fusion result.
The above two fusion processes are implemented in two independent feature domains, namely a content domain and a texture domain, and can be implemented through a content feature interaction module and a texture feature interaction module respectively, and the fusion mode can be directly implemented in a self-convolution mode or can be implemented by introducing a feature interaction vector, which is not particularly limited herein.
The content feature interaction module and the texture feature interaction module may each be implemented as an attention-based intra-domain feature interaction module (Intra-domain Attention-based Feature Interaction, Intra-AFI).
And then, fusing the content domain fusion result and the texture domain fusion result to obtain a characteristic cross-domain fusion result. The fusion process is realized by crossing the feature domains, and can be realized by a cross-domain feature interaction module so as to grasp the high-order semantic relation between the features of the content domain and the texture domain. The obtained feature cross-domain fusion result is the correlation interaction feature between the two feature domains, namely the content domain and the texture domain, and is used for representing the fake evidence of the cross-feature domain.
Because graph convolutional neural networks have natural modeling advantages for correlation inference over multi-source heterogeneous data, the cross-domain feature interaction module can be constructed based on a graph convolutional neural network to mine high-order correlations between content features and texture features, explore their high-order semantic relations, and perform discriminative forgery-relation learning. That is, the cross-domain feature interaction module may be a graph-convolution-based cross-domain feature interaction module (Cross-domain Graph-based Feature Interaction, Cross-GFI), which implements cross-domain fusion of features through a graph convolution algorithm.
When the content features and the texture features have only one level, the content domain fusion result and the texture domain fusion result are fused directly; when they include a plurality of levels, the content domain fusion result and the texture domain fusion result of the same level are fused.
The feature cross-domain fusion result may include a feature cross-domain fusion result of the content domain and a feature cross-domain fusion result of the texture domain, where the feature cross-domain fusion result of the content domain refers to a fusion result obtained after the texture domain fusion result is fused to the content domain fusion result, and the feature cross-domain fusion result of the texture domain refers to a fusion result obtained after the content domain fusion result is fused to the texture domain fusion result.
And finally, executing step S3, and determining the fake detection result of the face image to be detected by utilizing the characteristic cross-domain fusion result. The process can be realized through the fake detection module, namely the characteristic cross-domain fusion result can be input to the fake detection module, and the fake detection module utilizes the characteristic cross-domain fusion result to carry out real face and fake face classification on the face image to be detected, so that the final fake detection result is obtained.
The fake detection module may include a Classifier (Classifier) to perform real and fake classification on the face image to be detected and output a fake detection result. The false detection result may include that the face image to be detected is a real face (i.e., true), and the face image to be detected is a false face (i.e., false).
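The true/false classification over the highest-level superposition results can be sketched as a pooled linear classifier with softmax; the weight shapes and the pooling choice are assumptions for illustration, not the patent's trained classifier:

```python
import numpy as np

def classify_real_fake(content_sup, texture_sup, w, b):
    """Two-way classifier head over the highest-level superposition results.

    content_sup, texture_sup: (C, H, W) maps; w: (2, 2C) weights, b: (2,).
    """
    # global-average-pool each domain, then concatenate
    pooled = np.concatenate([content_sup.mean(axis=(1, 2)),
                             texture_sup.mean(axis=(1, 2))])   # (2C,)
    logits = w @ pooled + b
    exp = np.exp(logits - logits.max())        # numerically stable softmax
    probs = exp / exp.sum()
    return {"real": float(probs[0]), "fake": float(probs[1])}
```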
In the embodiment of the invention, the steps S1-S3 can be realized by means of a human face fake detection model, wherein the human face fake detection model comprises a feature extraction module, a feature fusion module and a fake detection module, the feature extraction module comprises a content feature extraction module and a texture feature extraction module, and the feature fusion module comprises a content feature interaction module, a texture feature interaction module and a cross-domain feature interaction module.
The face counterfeiting detection method provided by the embodiment of the invention first acquires a face image to be detected and extracts image features of the face image, the image features comprising content features and texture features; it then fuses the content features and the texture features separately to obtain a content domain fusion result and a texture domain fusion result, and fuses the two to obtain a feature cross-domain fusion result; finally, it determines a fake detection result for the face image based on the feature cross-domain fusion result. By extracting both content features and texture features, the method characterizes the fake evidence of the face from multiple dimensions, which improves the accuracy of the fake detection result. In addition, through intra-domain and cross-domain fusion of the features, the method further improves the accuracy and reliability of the fake detection result; it generalizes well across multiple datasets, counterfeiting types, and counterfeiting modes, is robust to high noise, strong compression, and other visual disturbances in real scenes, can be effectively deployed on terminal devices, and can be used for fake detection of media data or real-time face images on the Internet.
On the basis of the above embodiment, the face counterfeiting detection method provided in the embodiment of the present invention, the image features include a plurality of levels;
correspondingly, the fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result, which comprises the following steps:
fusing the content features of each level to obtain a content domain fusion result of each level, and fusing the texture features of each level to obtain a texture domain fusion result of each level;
and fusing the content domain fusion result and the texture domain fusion result of each level to obtain the characteristic cross-domain fusion result of each level.
Specifically, the image features include a plurality of levels, that is, the content features and the texture features each include a plurality of levels, and feature granularities of the respective levels are different, for example, three levels or more may be included, which is not particularly limited herein.
Furthermore, the content feature extraction module and the texture feature extraction module are both hierarchical feature extraction modules, and the feature fusion module is likewise a hierarchical feature fusion module.
The content feature extraction module may include a content feature extraction module of each level, which is configured to extract content features of each level, where the content features of each level may be represented by a content feature map. The texture feature extraction module may include a texture feature extraction module of each level, which is configured to extract texture features of each level, where the texture features of each level may be represented by a texture feature map.
With hierarchical semantic embedding, rich content information and texture information with different receptive fields can be obtained, and counterfeit evidence based on content perception and texture perception can be enhanced hierarchically.
The content feature interaction module can comprise content feature interaction modules of all levels, which are respectively used for fusing the content features of all levels to obtain a content domain fusion result of all levels; the texture feature interaction module can comprise texture feature interaction modules of all levels, and the texture feature interaction modules are used for respectively fusing texture features of all levels to obtain texture domain fusion results of all levels; the cross-domain feature interaction module may include a cross-domain feature interaction module of each level, which is configured to fuse a content domain fusion result and a texture domain fusion result of each level, respectively, so as to obtain a feature cross-domain fusion result of each level.
It can be seen that, when the content features and the texture features each include multiple levels, the CAML module and the TAML module in the embodiments of the present invention may be collectively referred to as a hierarchical attention map learning (Hierarchical Attention Map Learning, HAML) module. The face fake detection model is a hierarchical content-texture correlation learning (HCTER) network framework, so that the generalization and robustness of the face forgery detection method can be greatly improved.
At this time, when the content features and the texture features are respectively fused to obtain a content domain fusion result and a texture domain fusion result, and the content domain fusion result and the texture domain fusion result are fused to obtain a feature cross-domain fusion result, the content features of each level are fused by using the content feature interaction module of each level to obtain a content domain fusion result of each level, and the texture features of each level are fused by using the texture feature interaction module of each level to obtain a texture domain fusion result of each level. And then, fusing the content domain fusion result and the texture domain fusion result of each level by utilizing the cross-domain feature interaction module of each level to obtain the feature cross-domain fusion result of each level.
In the embodiment of the invention, the image features comprise a plurality of levels, so that the generalization and robustness of the face forgery detection method are greatly improved. Moreover, image feature fusion is carried out level by level, with feature intra-domain fusion and feature cross-domain fusion performed at each level, so that fine fusion of the content features and the texture features can be realized, the fusion precision is improved, and the accuracy of the fake detection result is improved.
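As a minimal sketch of the level-by-level fusion flow described above (the per-level interaction modules are stood in by arbitrary callables — hypothetical placeholders, not the patented modules):

```python
def hierarchical_fusion(content_feats, texture_feats, intra_c, intra_t, cross):
    """Per-level intra-domain fusion followed by cross-domain fusion.

    content_feats / texture_feats: per-level features, lowest level first.
    intra_c, intra_t: stand-ins for the content / texture feature
    interaction modules; cross: stand-in for the cross-domain feature
    interaction module. All three are placeholder callables.
    """
    results = []
    for C_l, T_l in zip(content_feats, texture_feats):
        C_hat = intra_c(C_l)                 # content domain fusion result
        T_hat = intra_t(T_l)                 # texture domain fusion result
        results.append(cross(C_hat, T_hat))  # feature cross-domain fusion
    return results
```

Each level thus yields one feature cross-domain fusion result, with the highest level's result ultimately feeding the fake detection module.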
On the basis of the above embodiment, the texture features are extracted based on a difference-by-difference convolution operator, or based on a difference-by-difference convolution operator together with a central difference convolution operator.
Specifically, in the embodiment of the invention, the texture features of the face image to be detected are extracted by utilizing a difference-by-difference convolution (Successive Difference Convolution, SDC) operator, or by utilizing an SDC operator together with a central difference convolution (Central Difference Convolution, CDC) operator. Owing to the SDC operator, continuous pixel-level details can be captured to obtain potentially discriminative evidence of forgery.
The SDC operator and CDC operator may be collectively referred to herein as the extended difference convolution (Extended Difference Convolution, EDC) operator. Further, the texture features may be extracted by a texture feature extraction module integrated with the SDC operator or the EDC operator. For example, the texture feature extraction module may be a TAML module integrated with SDC or EDC operators. The texture feature extraction module can mine fine-grained texture features that are robust to changing, complex scenes, which is beneficial for capturing the intrinsic forgery patterns of face images and can thus remarkably improve the generalization capability of the face forgery detection method.
On the basis of the above embodiment, in the face falsification detection method provided in the embodiment of the present invention, the CDC operator is expressed based on the following formula:

f_CDC(F_l, k_1) = Σ_{z_n ∈ S} k_1(z_n) · (F_l(z_0 + z_n) − F_l(z_0))

wherein f_CDC(F_l, k_1) represents the CDC operator, F_l represents the texture features of level l, k_1 represents the central difference convolution kernel, z_0 represents the current position of the current receptive field of k_1, z_n represents the positions in the current receptive field of k_1 other than z_0, n represents the position index of the current receptive field of k_1, and S is the set of positions in the current receptive field of k_1, indexed by n.
As shown in FIG. 2, which is a schematic diagram illustrating the operation of the CDC operator, in the embodiment of the present invention, the texture features at the 8 positions surrounding the current position are selected, the difference between each of them and the texture feature at the current position is calculated, and each difference is then multiplied by the parameter at the corresponding position of k_1, so as to obtain the CDC feature map after the operation of the CDC operator.
Since the CDC operator may generate additional gradient level texture information, it is advantageous for deep forgery detection compared to normal convolution.
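For illustration only, a central difference convolution on a single-channel map might be sketched as follows (a sketch assuming a 3×3 receptive field and edge padding; not the patented implementation):

```python
import numpy as np

def cdc_2d(feat, kernel):
    """Central difference convolution on one channel.

    feat: (H, W) texture feature map; kernel: (3, 3) weights k_1.
    Each output position sums kernel-weighted differences between the 8
    surrounding positions and the current (center) position.
    """
    H, W = feat.shape
    padded = np.pad(feat, 1, mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3]
            diff = patch - feat[i, j]   # F(z0 + zn) - F(z0)
            diff[1, 1] = 0.0            # exclude the center position z0
            out[i, j] = np.sum(kernel * diff)
    return out
```

On a constant feature map every difference vanishes, so the CDC response is zero everywhere — exactly the gradient-level behavior that ordinary convolution lacks.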
On the basis of the above embodiment, in the face falsification detection method provided in the embodiment of the present invention, the SDC operator is expressed based on the following formula:

f_SDC(F_l, k_2) = Σ_{(z_n, z_m) ∈ P} k_2(z_n) · (F_l(z_0 + z_n) − F_l(z_0 + z_m))

wherein f_SDC(F_l, k_2) represents the SDC operator, F_l represents the texture features of level l, k_2 represents the difference-by-difference convolution kernel, z_n and z_m both represent positions in the current receptive field of k_2, n and m both represent position indexes of the current receptive field of k_2, and P is the set of position pairs in the current receptive field of k_2 over which the successive differences are taken.
The SDC operator can capture continuous pixel-level texture details, better optimizing the robustness of the texture feature extraction module and yielding potentially discriminative evidence of forgery.
As shown in FIG. 3, which is a schematic diagram illustrating the operation of the SDC operator, in the embodiment of the present invention, the texture features at the 8 positions surrounding the current position are selected, the difference between the texture features at any two positions separated by one position is calculated, and each difference is then multiplied by the parameter at the corresponding position of k_2, so as to obtain the SDC feature map after the operation of the SDC operator.
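An illustrative sketch of this operation follows, under the assumption that the 8 surrounding positions are traversed in ring order and that "separated by one position" means successive pairs along that ring — the exact pair set P is not fully recoverable from the text, so the pairing here is hypothetical:

```python
import numpy as np

# Assumed ring order of the 8 neighbors around the center (clockwise).
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def sdc_2d(feat, weights):
    """Successive difference convolution on one channel.

    feat: (H, W) feature map; weights: length-8 kernel k_2, one weight
    per neighbor pair. Sums weighted differences F(z0+zn) - F(z0+zm)
    between successive ring positions (an assumed pairing).
    """
    H, W = feat.shape
    padded = np.pad(feat, 1, mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for i in range(H):
        for j in range(W):
            for n, (dn_i, dn_j) in enumerate(RING):
                dm_i, dm_j = RING[(n + 1) % 8]
                a = padded[i + 1 + dn_i, j + 1 + dn_j]
                b = padded[i + 1 + dm_i, j + 1 + dm_j]
                out[i, j] += weights[n] * (a - b)
    return out
```

Note that with equal weights the successive differences telescope to zero around the ring, so a learned, non-uniform k_2 is what makes the operator informative.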
On the basis of the foregoing embodiment, the face counterfeiting detection method provided in the embodiment of the present invention, where the extracting the image features of the face image to be detected includes:
for the lowest level, determining image features of the lowest level based on initial features of the lowest level of the face image to be detected;
for any level other than the lowest level, determining image features for the any level based on image features of a level preceding the any level, feature cross-domain fusion results, and initial features of the any level.
Specifically, the content feature extraction module of each level may be a CAML module, which may include several residual blocks, a batch normalization (Batch Normalization, BN) layer, and a nonlinear activation (ReLU, Sigmoid, etc.) layer.
For level l, the inputs of the CAML module of level l include the initial features F_l of level l and the feature cross-domain fusion result from level l−1. The CAML module of each level extracts the content features of each level in an iterative manner, and the content features of level l may be expressed as:

C_l = σ(β(ResBlock(Concat(F_l, C_{l−1} + G^C_{l−1}))))

where β and σ represent the batch normalization layer and the nonlinear activation layer, respectively, C_l represents the content features of level l, and G^C_{l−1} is the feature cross-domain fusion result of the content domain of level l−1.
The initial features of each level may be extracted by a multi-level backbone network encoder.
In particular, since the lowest hierarchy has no previous hierarchy, for the lowest hierarchy, its content features are determined directly from the initial features of the lowest hierarchy of the face image to be detected.
That is:

C_1 = σ(β(ResBlock(F_1)))
likewise, the texture feature extraction modules of each level may be TAML modules, which integrate EDC operators, i.e. SDC operators and CDC operators.
For level l, the inputs of the TAML module of level l include the initial features F_l of level l and the feature cross-domain fusion result from level l−1. The TAML module of each level extracts the texture features of each level in an iterative manner, and the texture features of level l may be expressed as:

T_l = f_CDC(Concat(F_l, T_{l−1} + G^T_{l−1}), k_1) + f_SDC(Concat(F_l, T_{l−1} + G^T_{l−1}), k_2)

wherein T_l represents the texture features of level l, and G^T_{l−1} is the feature cross-domain fusion result of the texture domain of level l−1.
In particular, since the lowest hierarchy has no previous hierarchy, for the lowest hierarchy, its texture features are determined directly from the initial features of the lowest hierarchy of the face image to be detected.
That is:

T_1 = f_CDC(F_1, k_1) + f_SDC(F_1, k_2)
in the embodiment of the invention, the image features comprise a plurality of layers, and the feature extraction module of each layer can be utilized to deeply explore the content semantic information and the texture detail information in a mode of extracting the image features by layering, so that the fake evidence of each feature domain in the face image to be detected is grasped from coarse to fine.
On the basis of the foregoing embodiment, the face falsification detection method provided in the embodiment of the present invention determines, based on the feature cross-domain fusion result of the previous level of the any level and the initial feature of the any level, image features of the any level, including:
Overlapping the image features of the previous level and the feature cross-domain fusion result element by element to obtain a feature overlapping result of the previous level;
and combining the feature superposition result of the previous level with the initial feature of any level in the channel dimension to obtain the image feature of any level.
Specifically, in the embodiment of the present invention, the feature stacking result of each level includes a content feature stacking result and a texture feature stacking result, so when determining the image features of any level except the lowest level, for the content features of any level, the feature cross-domain fusion result of the content features of the previous level and the content domain may be stacked element by element to obtain the content feature stacking result of the previous level, and then the content feature stacking result of the previous level and the initial feature of any level are combined in the channel dimension to obtain the content features of any level.
Similarly, for the texture features of any level, the texture features of the previous level and the feature cross-domain fusion result of the texture domain can be overlapped element by element to obtain the texture feature overlapped result of the previous level, and then the texture feature overlapped result of the previous level and the initial feature of any level are combined (Concat) in the channel dimension to obtain the texture features of any level.
In the embodiment of the invention, when the image characteristics of any level are determined, the image characteristics of the previous level are introduced, so that the image characteristics of any level can be extracted by using more effective information, and the obtained image characteristics of any level are more accurate.
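The two steps above — element-wise superposition with the previous level's features, then channel-dimension concatenation with the current level's initial features — can be sketched as follows (a sketch assuming matching spatial sizes, with `axis=0` taken as the channel dimension):

```python
import numpy as np

def propagate_level(prev_feat, prev_fusion, init_feat):
    """Build the input feature of the current level.

    prev_feat, prev_fusion: (C1, H, W) previous-level image features and
    the matching feature cross-domain fusion result; init_feat: (C2, H, W)
    initial features of the current level.
    """
    superposed = prev_feat + prev_fusion   # element-wise superposition
    # channel-dimension combination (Concat) with the initial features
    return np.concatenate([superposed, init_feat], axis=0)
```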
Based on the above embodiment, in the face falsification detection method provided in the embodiment of the present invention, the fusing of the content features of each level is performed to obtain a content domain fusion result of each level, and the fusing of the texture features of each level is performed to obtain a texture domain fusion result of each level, including:
and respectively fusing the content features and the texture features of each level based on the feature interaction vectors corresponding to each level to obtain a content domain fusion result and a texture domain fusion result of each level.
Specifically, the feature interaction vector corresponding to each hierarchy may include a content feature interaction vector and a texture feature interaction vector.
The content feature interaction module of each level may be an IntraAFI of a content domain, and is configured to generate a content feature interaction vector corresponding to each level, and fuse content features of each level by using the content feature interaction vector corresponding to each level, so as to obtain a content domain fusion result of each level.
The texture feature interaction module of each level can be IntraAFI of a texture domain, and is used for generating texture feature interaction vectors corresponding to each level, and fusing the texture features of each level by utilizing the texture feature interaction vectors corresponding to each level to obtain a texture domain fusion result of each level.
The content feature interaction vector corresponding to each level comprises a one-dimensional content attention vector a^C_l ∈ R^{M_l} and a two-dimensional content gating feature vector g^C_l ∈ R^{H_l × W_l}; the texture feature interaction vector corresponding to each level comprises a one-dimensional texture attention vector a^T_l ∈ R^{N_l} and a two-dimensional texture gating feature vector g^T_l ∈ R^{H_l × W_l}. Wherein M_l refers to the number of channels of the content features of level l, N_l refers to the number of channels of the texture features of level l, H_l refers to the height of the content features and texture features of level l, and W_l refers to the width of the content features and texture features of level l.
The content domain fusion result Ĉ_l ∈ R^{M_l × H_l × W_l} of level l and the texture domain fusion result T̂_l ∈ R^{N_l × H_l × W_l} of level l can be determined by the following formulas:

Ĉ_l = C_l ⊙ (a^C_l ⊗ g^C_l), T̂_l = T_l ⊙ (a^T_l ⊗ g^T_l)

wherein ⊙ represents the Hadamard product (element-wise multiplication), ⊗ denotes broadcasting the channel attention vector and the spatial gating feature vector into a tensor matching the feature shape, C_l is the content feature of level l, and T_l is the texture feature of level l.
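Assuming the intra-domain fusion amounts to gating the level features with the one-dimensional channel attention vector and the two-dimensional gating feature vector via broadcast Hadamard products (a reconstruction; how the vectors themselves are generated is not shown here), the operation is:

```python
import numpy as np

def intra_afi(feat, attn_vec, gate_map):
    """IntraAFI-style gating (sketch): feat (C, H, W) multiplied
    element-wise by a channel attention vector (C,) and a spatial
    gating map (H, W), both via broadcasting."""
    return feat * attn_vec[:, None, None] * gate_map[None, :, :]
```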
Based on the above embodiment, the face falsification detection method provided in the embodiment of the present invention fuses the content domain fusion result and the texture domain fusion result of each level to obtain a feature cross-domain fusion result of each level, including:
Based on a bilinear pooling method, feature embedding is carried out on the content domain fusion result and the texture domain fusion result of each level, so as to obtain the content texture semantic relation of each level;
nonlinear normalization is carried out on the semantic relation of the content texture of each level, and similarity aggregation is carried out on the results obtained by normalization;
and determining a characteristic cross-domain fusion result of each level based on the results obtained by similarity aggregation.
Specifically, each level may correspond to a cross-domain feature interaction module, which may be, for example, a CrossGFI module. The CrossGFI of each level first uses a bilinear pooling method to perform feature embedding on the content domain fusion result and the texture domain fusion result of that level, so as to obtain the content texture semantic relation of that level. The content texture semantic relation of each level may be represented by a content texture semantic relation matrix.
Bilinear pooling can be expressed as:

r^k_l = Σ_{i=1}^{H_l} Σ_{j=1}^{W_l} T̂_l(k, i, j) · Ĉ_l(·, i, j)

wherein T̂_l(k, i, j) is the component of T̂_l in the kth channel, the ith height and the jth width, and r^k_l is the vector forming the kth row of the content texture semantic relation matrix of level l.

All the r^k_l are stacked together to form the content texture semantic relation matrix R_l ∈ R^{N_l × M_l} of level l. The content texture semantic relation matrix has good interpretability: R_l can be considered a projection of the content domain into the texture domain, where the kth row represents the kth (k = 1, 2, ..., N_l) texture-related quantization basis function, which holds M_l dimensions of content features to be fused. In order to fully explore the high-order semantic relationships between the different quantization basis functions, a graph convolutional network (Graph Convolutional Network, GCN) is used on the content texture semantic relation matrix to conduct forgery-evidence reasoning, with the content features to be fused contained in each texture-related quantization basis function considered a node in the graph.
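The bilinear pooling embedding — summing over spatial positions to correlate every texture channel with every content channel — reduces to a flatten-and-matrix-multiply, sketched as:

```python
import numpy as np

def content_texture_relation(C_hat, T_hat):
    """Bilinear pooling of content (M, H, W) and texture (N, H, W)
    fusion results into the (N, M) content texture semantic relation
    matrix: row k is the spatial sum of C_hat weighted by texture
    channel k."""
    M_flat = C_hat.reshape(C_hat.shape[0], -1)   # (M, H*W)
    T_flat = T_hat.reshape(T_hat.shape[0], -1)   # (N, H*W)
    return T_flat @ M_flat.T                     # (N, M)
```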
Thereafter, the content texture semantic relation of each level is nonlinearly normalized, as follows:

A_l = softmax(φ(R_l) · ψ(R_l)^T)

wherein A_l, the normalization result of level l, represents an affinity matrix, φ and ψ respectively represent 1×1 convolution layers for dimension conversion, and softmax is a nonlinear normalization function.
Thereafter, the normalization result of level l is subjected to similarity aggregation. Here, similarity aggregation may be performed by a densely fully-connected graph convolutional neural network operator, namely:

Y_l = Ã_l R_l W_l

wherein Y_l is the similarity aggregation result of level l, Ã_l represents the similarity matrix obtained after regularization of A_l, and W_l is a learnable graph convolution weight.
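A single graph-convolution step of this shape, with simple row normalization standing in for the unspecified regularization of A_l, might look like:

```python
import numpy as np

def graph_reasoning(A, R, W):
    """One GCN step (sketch): row-normalize the affinity matrix A (N, N)
    as a stand-in for its regularization, then aggregate the relation
    matrix R (N, M) and apply the learnable weight W (M, M')."""
    A_tilde = A / A.sum(axis=1, keepdims=True)
    return A_tilde @ R @ W   # (N, M')
```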
Finally, according to the correlation between the features of the content domain and the texture domain, Y_l is applied to the texture domain fusion result of level l, which is refined through a cross-attention mechanism to obtain the feature cross-domain fusion result G^C_l of the content domain of level l. Similarly, Y_l is applied to the content domain fusion result of level l, which is refined through the cross-attention mechanism to obtain the feature cross-domain fusion result G^T_l of the texture domain of level l.
The IntraAFI and CrossGFI modules are interleaved at different levels, progressively enhancing the content-texture representation with semantic relationships.
The principle of the CrossGFI of level l is shown in FIG. 4. Feature embedding is performed on the content domain fusion result Ĉ_l of level l and the texture domain fusion result T̂_l of level l to obtain the content texture semantic relation R_l of level l; R_l is nonlinearly normalized; the normalization result A_l of level l is subjected to similarity aggregation; and the similarity aggregation result Y_l of level l is used to determine the feature cross-domain fusion result G^C_l of the content domain of level l and the feature cross-domain fusion result G^T_l of the texture domain of level l.
In the embodiment of the invention, the feature cross-domain fusion result of each level can be more accurate and the reliability of the feature cross-domain fusion result can be improved through the introduction of a bilinear pooling method, a nonlinear normalization operation and a similarity aggregation method.
On the basis of the foregoing embodiment, the face falsification detection method provided in the embodiment of the present invention determines a falsification detection result of the face image to be detected based on the feature cross-domain fusion result, including:
respectively overlapping the highest-level characteristic cross-domain fusion result with the highest-level content characteristic and the highest-level texture characteristic element by element to obtain a highest-level content overlapping result and a highest-level texture overlapping result;
and based on the content superposition result and the texture superposition result of the highest level, carrying out true and false classification on the face image to be detected to obtain the fake detection result.
Specifically, in the embodiment of the invention, when the fake detection result of the face image to be detected is determined by utilizing the feature cross-domain fusion result, if the content features and the texture features each comprise only a single level, the fake detection result can be obtained by directly utilizing the feature cross-domain fusion result of that level to perform true-false binary classification on the face image to be detected.
If the content features and the texture features both comprise a plurality of levels, the highest level feature cross-domain fusion result is actually utilized to carry out true and false classification on the face image to be detected to obtain a fake detection result. At this time, the feature cross-domain fusion result of the highest-level content domain and the highest-level content feature are subjected to element-by-element superposition to obtain a highest-level content superposition result, and the feature cross-domain fusion result of the highest-level texture domain and the highest-level texture feature are subjected to element-by-element superposition to obtain a texture superposition result.
And then, carrying out true and false classification on the face image to be detected by utilizing the content superposition result and the texture superposition result of the highest level to obtain a fake detection result.
In the embodiment of the invention, the highest-level characteristic cross-domain fusion result is utilized to superimpose the content characteristics and the texture characteristics of the characteristic cross-domain fusion result, and the true and false two classification is carried out on the face image to be detected, so that the accuracy of the true and false two classification can be ensured, and the accuracy of the fake detection result is further improved.
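A simplified stand-in for this final step can be sketched as follows (global average pooling plus a linear scoring head is an assumption — the patent does not specify the classifier; the sign convention of the logit is likewise hypothetical):

```python
import numpy as np

def forgery_logit(C_top, T_top, G_c, G_t, w_c, w_t, bias=0.0):
    """Superpose the highest-level features with their cross-domain
    fusion results, globally average-pool each, and score with a
    linear head (an assumed, simplified classifier)."""
    c = (C_top + G_c).mean(axis=(1, 2))   # (M,) pooled content superposition
    t = (T_top + G_t).mean(axis=(1, 2))   # (N,) pooled texture superposition
    return float(w_c @ c + w_t @ t + bias)
```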
Fig. 5 is a schematic structural diagram of a face counterfeit detection model according to an embodiment of the present invention, where in fig. 5, content features and texture features each include three levels, i.e., a Low level (Low-level), a middle level (Mid-level), and a High level (High-level), corresponding to coarse-grained features, medium-grained features, and fine-grained features, respectively.
Low-level IntraAFI, mid-level IntraAFI, and High-level IntraAFI all include IntraAFI for the content domain and IntraAFI for the texture domain.
The cross-domain feature interaction module comprises Low-level cross GFI, mid-level cross GFI and High-level cross GFI.
The backbone encoders of each hierarchy in fig. 5 include a backbone encoder of a content domain and a backbone encoder of a texture domain, and each backbone encoder of the content domain and the backbone encoder of the texture domain of each hierarchy share weights. The element-wise superposition operation is represented by the operator in fig. 5, and the merge operation is represented by node C.
In summary, the embodiment of the invention provides a method for detecting universal face forgery by content-texture correlation mining. The method performs fine-grained face forgery detection through content-texture attention map learning. In addition, a network framework of hierarchical content-texture correlation learning (HCTER) is provided, so that the generalization and robustness of the face forgery detection scheme are greatly improved. The network framework of HCTER is an end-to-end face deep-forgery detection scheme, and has good generalization for multiple data sets, multiple forgery types and multiple tampering modes.
Meanwhile, the method has good robustness to visual disturbances such as high noise and strong compression in real scenes, and can be effectively deployed on terminal equipment to perform fake detection of media data or real-time face images on the Internet. The network model first deeply mines high-level semantic features of multi-level content texture in a coarse-to-fine hierarchical learning manner through the content-texture attention map learning (CTAML) scheme. Next, the network framework of HCTER explores the high-order semantic relevance of content-texture features through an efficient progressive multi-domain feature fusion module (PFMI), and performs intra-domain and cross-domain feature fusion in a reasonable way. Finally, the method was fully experimentally verified and algorithmically evaluated on 6 common public academic datasets. A large number of experiments prove that, compared with existing face fake detectors, the network framework of HCTER achieves a remarkable improvement in detection accuracy. Meanwhile, in cross-domain experimental evaluation and robustness analysis experiments, the network framework of HCTER exhibits excellent generalization performance and model robustness.
As shown in fig. 6, on the basis of the above embodiment, there is provided a face-forgery detection apparatus according to an embodiment of the present invention, including:
The feature extraction module 61 is configured to obtain a face image to be detected, and extract image features of the face image to be detected, where the image features include content features and texture features;
the feature fusion module 62 is configured to fuse the content feature and the texture feature to obtain a content domain fusion result and a texture domain fusion result, and fuse the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result;
the falsification detection module 63 is configured to determine a falsification detection result of the face image to be detected based on the feature cross-domain fusion result.
On the basis of the above embodiment, the face-forgery detection device provided in the embodiment of the present invention, the image features include a plurality of levels;
correspondingly, the feature fusion module is specifically configured to:
fusing the content features of each level to obtain a content domain fusion result of each level, and fusing the texture features of each level to obtain a texture domain fusion result of each level;
and fusing the content domain fusion result and the texture domain fusion result of each level to obtain the characteristic cross-domain fusion result of each level.
On the basis of the foregoing embodiments, the face counterfeit detection device provided in the embodiments of the present invention, the feature extraction module is specifically configured to:
for the lowest level, determining image features of the lowest level based on initial features of the lowest level of the face image to be detected;
for any level other than the lowest level, determining image features of the any level based on feature cross-domain fusion results of a level preceding the any level and initial features of the any level.
On the basis of the foregoing embodiments, the face counterfeit detection device provided in the embodiments of the present invention, the feature extraction module is further specifically configured to:
overlapping the image features of the previous level and the feature cross-domain fusion result element by element to obtain a feature overlapping result of the previous level;
and combining the feature superposition result of the previous level with the initial feature of any level in the channel dimension to obtain the image feature of any level.
On the basis of the foregoing embodiments, the face counterfeit detection device provided in the embodiments of the present invention, the feature fusion module is further specifically configured to:
And respectively fusing the content features and the texture features of each level based on the feature interaction vectors corresponding to each level to obtain a content domain fusion result and a texture domain fusion result of each level.
On the basis of the foregoing embodiments, in the face counterfeiting detection device provided in the embodiments of the present invention, the feature fusion module is further specifically configured to:
perform feature embedding on the content domain fusion result and the texture domain fusion result of each level based on a bilinear pooling method, to obtain the content-texture semantic relation of each level;
perform nonlinear normalization on the content-texture semantic relation of each level, and perform similarity aggregation on the normalized results;
and determine the feature cross-domain fusion result of each level based on the results of the similarity aggregation.
On the basis of the foregoing embodiments, in the face counterfeiting detection device provided in the embodiments of the present invention, the fake detection module is specifically configured to:
superpose the feature cross-domain fusion result of the highest level element by element with the content features and the texture features of the highest level, respectively, to obtain a content superposition result and a texture superposition result of the highest level;
and classify the face image to be detected as real or fake based on the content superposition result and the texture superposition result of the highest level, to obtain the fake detection result.
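A minimal sketch of this classification head follows; the global average pooling, the single linear logit, and the parameters `w` and `b` are all hypothetical choices for the unspecified classifier:

```python
import numpy as np

def classify(cross_domain, content, texture, w, b):
    """Hedged sketch of the highest-level detection head.

    cross_domain, content, texture: (C, H, W) highest-level features.
    w: (2 * C,) hypothetical classifier weights; b: scalar bias.
    Returns the probability that the face image is fake.
    """
    # Element-wise superposition with the content and texture features.
    content_sup = cross_domain + content
    texture_sup = cross_domain + texture
    # Global average pooling, then concatenation of the two descriptors.
    pooled = np.concatenate([
        content_sup.reshape(content_sup.shape[0], -1).mean(axis=1),
        texture_sup.reshape(texture_sup.shape[0], -1).mean(axis=1),
    ])
    # Linear logit followed by a sigmoid for binary real/fake classification.
    logit = pooled @ w + b
    return 1.0 / (1.0 + np.exp(-logit))
```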
On the basis of the foregoing embodiments, in the face counterfeiting detection device provided in the embodiments of the present invention, the texture features are extracted based on a difference-by-difference convolution operator, or based on a difference-by-difference convolution operator and a center difference convolution operator.
On the basis of the foregoing embodiments, in the face counterfeiting detection device provided in the embodiments of the present invention, the difference-by-difference convolution operator is expressed by the following formula:

f_SDC(F_l, k_2) = Σ_{(z_n, z_m) ∈ P} w(z_n) · (F_l(z_n) − F_l(z_m))

wherein f_SDC(F_l, k_2) represents the difference-by-difference convolution operator, F_l represents the texture features of level l, k_2 represents the difference-by-difference convolution kernel, z_n and z_m both represent positions in the current receptive field of k_2, n and m each represent a position index in the current receptive field of k_2, P represents the set of current receptive-field position pairs (z_n, z_m) of k_2 with m ≠ n, and w(z_n) represents the weight of k_2 at position z_n.
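The weighted pairwise-difference response at a single receptive-field position can be sketched as follows; because the operator's definition is only partially recoverable here, the pair set P is assumed to consist of consecutive receptive-field positions, which is one possible instantiation rather than the patented one:

```python
import numpy as np

def sdc_response(patch, kernel):
    """Hedged sketch of a difference-by-difference response at one position.

    patch: receptive-field values F_l(z) as a 2D array.
    kernel: difference-by-difference kernel k_2, same shape as patch.
    """
    x = patch.ravel()
    w = kernel.ravel()
    # Sum weighted differences over consecutive position pairs (z_n, z_m).
    return sum(w[n] * (x[n] - x[m])
               for n, m in zip(range(len(x) - 1), range(1, len(x))))
```

On a constant patch every pairwise difference vanishes, so the response is zero regardless of the kernel — the behavior that makes difference convolutions sensitive to fine texture rather than absolute intensity.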
Specifically, the functions of the modules in the face counterfeiting detection device provided in the embodiments of the present invention correspond one-to-one to the operation flow of the steps in the foregoing method embodiments, and the achieved effects are consistent.
Fig. 7 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 7, the electronic device may include: a processor (Processor) 710, a communication interface (Communications Interface) 720, a memory (Memory) 730, and a communication bus 740, wherein the processor 710, the communication interface 720, and the memory 730 communicate with one another via the communication bus 740. The processor 710 may invoke the logic instructions in the memory 730 to perform the face counterfeiting detection method provided in the foregoing embodiments, the method comprising: acquiring a face image to be detected, and extracting image features of the face image to be detected, wherein the image features comprise content features and texture features; respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result; and determining a fake detection result of the face image to be detected based on the feature cross-domain fusion result.
Further, the logic instructions in the memory 730 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a standalone product. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product, the computer program product including a computer program, the computer program being storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of executing the face falsification detection method provided in the above embodiments, the method comprising: acquiring a face image to be detected, and extracting image features of the face image to be detected, wherein the image features comprise content features and texture features; respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result; and determining a fake detection result of the face image to be detected based on the feature cross-domain fusion result.
In still another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the face falsification detection method provided in the above embodiments, the method comprising: acquiring a face image to be detected, and extracting image features of the face image to be detected, wherein the image features comprise content features and texture features; respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result; and determining a fake detection result of the face image to be detected based on the feature cross-domain fusion result.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A face falsification detection method, comprising:
acquiring a face image to be detected, and extracting image features of the face image to be detected, wherein the image features comprise content features and texture features;
respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result;
and determining a fake detection result of the face image to be detected based on the feature cross-domain fusion result.
2. The face falsification detection method of claim 1, wherein the image features comprise a plurality of levels;
correspondingly, the fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result, which comprises the following steps:
fusing the content features of each level to obtain a content domain fusion result of each level, and fusing the texture features of each level to obtain a texture domain fusion result of each level;
And fusing the content domain fusion result and the texture domain fusion result of each level to obtain the characteristic cross-domain fusion result of each level.
3. The face falsification detection method as claimed in claim 2, wherein the extracting image features of the face image to be detected comprises:
for the lowest level, determining image features of the lowest level based on initial features of the lowest level of the face image to be detected;
for any level other than the lowest level, determining the image features of that level based on the feature cross-domain fusion result of the preceding level and the initial features of that level.
4. The face falsification detection method of claim 3, wherein determining the image features of that level based on the feature cross-domain fusion result of the preceding level and the initial features of that level comprises:
superposing the image features of the previous level and the feature cross-domain fusion result of the previous level element by element, to obtain a feature superposition result of the previous level;
and combining the feature superposition result of the previous level with the initial features of the current level in the channel dimension, to obtain the image features of the current level.
5. The face counterfeit detection method of claim 2, wherein the fusing the content features of each level to obtain a content domain fusion result of each level, and fusing the texture features of each level to obtain a texture domain fusion result of each level, comprises:
and respectively fusing the content features and the texture features of each level based on the feature interaction vectors corresponding to each level to obtain a content domain fusion result and a texture domain fusion result of each level.
6. The face counterfeit detection method of claim 2, wherein fusing the content domain fusion result and the texture domain fusion result of each level to obtain the feature cross-domain fusion result of each level comprises:
performing feature embedding on the content domain fusion result and the texture domain fusion result of each level based on a bilinear pooling method, to obtain the content-texture semantic relation of each level;
performing nonlinear normalization on the content-texture semantic relation of each level, and performing similarity aggregation on the normalized results;
and determining the feature cross-domain fusion result of each level based on the results of the similarity aggregation.
7. The face falsification detection method of claim 2, wherein determining the fake detection result of the face image to be detected based on the feature cross-domain fusion result comprises:
superposing the feature cross-domain fusion result of the highest level element by element with the content features and the texture features of the highest level, respectively, to obtain a content superposition result and a texture superposition result of the highest level;
and classifying the face image to be detected as real or fake based on the content superposition result and the texture superposition result of the highest level, to obtain the fake detection result.
8. The face falsification detection method of any one of claims 1 to 7, wherein the texture features are extracted based on a difference-by-difference convolution operator, or based on a difference-by-difference convolution operator and a center difference convolution operator.
9. The face falsification detection method of claim 8, wherein the difference-by-difference convolution operator is expressed by the following formula:

f_SDC(F_l, k_2) = Σ_{(z_n, z_m) ∈ P} w(z_n) · (F_l(z_n) − F_l(z_m))

wherein f_SDC(F_l, k_2) represents the difference-by-difference convolution operator, F_l represents the texture features of level l, k_2 represents the difference-by-difference convolution kernel, z_n and z_m both represent positions in the current receptive field of k_2, n and m each represent a position index in the current receptive field of k_2, P represents the set of current receptive-field position pairs (z_n, z_m) of k_2 with m ≠ n, and w(z_n) represents the weight of k_2 at position z_n.
10. A face-forgery detection apparatus, characterized by comprising:
the feature extraction module is used for acquiring a face image to be detected and extracting image features of the face image to be detected, wherein the image features comprise content features and texture features;
the feature fusion module is used for respectively fusing the content features and the texture features to obtain a content domain fusion result and a texture domain fusion result, and fusing the content domain fusion result and the texture domain fusion result to obtain a feature cross-domain fusion result;
and the fake detection module is used for determining the fake detection result of the face image to be detected based on the characteristic cross-domain fusion result.
CN202310173749.5A 2023-02-23 2023-02-23 Face counterfeiting detection method and device, electronic equipment and storage medium Pending CN116311434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310173749.5A CN116311434A (en) 2023-02-23 2023-02-23 Face counterfeiting detection method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116311434A true CN116311434A (en) 2023-06-23

Family

ID=86788089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310173749.5A Pending CN116311434A (en) 2023-02-23 2023-02-23 Face counterfeiting detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116311434A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237777A (en) * 2023-11-13 2023-12-15 四川观想科技股份有限公司 Ship target identification method based on multi-mode fusion
CN117237777B (en) * 2023-11-13 2024-02-27 四川观想科技股份有限公司 Ship target identification method based on multi-mode fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination