CN113673465A - Image detection method, device, equipment and readable storage medium

Info

Publication number: CN113673465A
Application number: CN202110995202.4A
Authority: CN
Inventors: 周欣; 王娜; 李连磊; 白云波; 程岩; 王立松
Original assignee: China Information Technology Security Evaluation Center
Current assignee: China Information Technology Security Evaluation Center
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2021-11-19

Abstract

Embodiments of the present application provide an image detection method, device, equipment and readable storage medium. A face image sequence is acquired, the face image sequence comprising at least one face image arranged in time order. A one-dimensional frequency domain feature vector is acquired for each face image, and the one-dimensional frequency domain feature vectors of the face images are spliced in the time order to obtain a frequency domain feature matrix. An image detection result of the face image sequence is then obtained from the frequency domain feature matrix, the result indicating whether the face image sequence has been tampered with. Because the one-dimensional frequency domain feature vector of a target face image (any one of the face images) represents the frequency domain features of that image, the frequency domain feature matrix formed by splicing the vectors of several consecutive face images combines the frequency domain features and the time domain features of the face image sequence. Using the frequency domain feature matrix as the basis for image detection can therefore improve the accuracy of the detection result.

Description

Image detection method, device, equipment and readable storage medium

Technical Field

The present application relates to the field of image recognition technologies, and in particular, to an image detection method, an image detection device, an image detection apparatus, and a readable storage medium.

Background

Face forgery refers to an image manipulation technique that replaces the biometric features of a face in a video or image while preserving the original action or expression. Facial features are important biometric identifiers in traditional social interaction and have also become identifiers for people using virtual assets. Forged images or videos therefore seriously threaten public information security and disrupt social stability.

Existing image detection methods start from spatial and spatio-temporal features and examine a video or image one image at a time to judge the authenticity of the face.

Disclosure of Invention

The present application provides an image detection method, device, equipment and readable storage medium, with the aim of improving image detection accuracy, as follows:

an image detection method, comprising:

acquiring a face image sequence, wherein the face image sequence comprises at least one face image arranged according to a time sequence;

acquiring a one-dimensional frequency domain feature vector of each face image, wherein the one-dimensional frequency domain feature vector of a target face image represents the frequency domain feature of the target face image, and the target face image is any one of the face images;

splicing the one-dimensional frequency domain feature vectors of the face images according to the time sequence to obtain a frequency domain feature matrix;

and acquiring an image detection result of the face image sequence according to the frequency domain characteristic matrix, wherein the image detection result indicates whether the face image sequence is tampered.

Optionally, the obtaining a one-dimensional frequency domain feature vector of the target face image includes:

acquiring frequency domain parameters of each frequency domain signal of the target face image, wherein the frequency domain parameters comprise frequency and amplitude;

acquiring a plurality of frequency domain signal sets, wherein each frequency domain signal set comprises a plurality of same-frequency signals, and the frequencies of the plurality of same-frequency signals are the same;

carrying out preset value processing on the amplitudes of the multiple same-frequency signals in each frequency domain signal set to obtain an amplitude parameter of each frequency domain signal set, wherein the preset value processing comprises averaging and normalization;

arranging the amplitude parameters of the frequency domain signal sets according to a frequency domain order to obtain the one-dimensional frequency domain feature vector of the target face image, wherein the frequency domain order is obtained by sorting the frequencies of the frequency domain signal sets from large to small.

Optionally, the obtaining the frequency domain parameter of each frequency domain signal of the target face image includes:

performing preset space-frequency transformation on the target face image to obtain a frequency spectrum image of the target face image and a spatial domain corresponding relation, wherein the spatial domain corresponding relation comprises the corresponding relation between each first pixel point and each frequency domain signal, the first pixel points are pixel points in the frequency spectrum image, and the preset space-frequency transformation comprises Fourier transformation;

and acquiring a frequency domain parameter of each frequency domain signal according to the space domain parameter of each first pixel point in the frequency spectrum image and the space domain corresponding relation, wherein the space domain parameter comprises a coordinate value and a gray value.

Optionally, the pre-set space-frequency transformation further comprises frequency centering and/or high-pass filtering.

Optionally, obtaining an image detection result of the face image sequence according to the frequency domain feature matrix, including:

inputting the frequency domain characteristic matrix into a preset prediction model to obtain a prediction result of the prediction model as an image detection result;

the prediction model is obtained by taking a frequency domain characteristic matrix of a preset sample image sequence as input data and a label of the preset sample image sequence as a target output training through a neural network model, wherein the label indicates whether the preset sample image sequence is tampered.

Optionally, the neural network model includes a plurality of cyclic network modules connected end to end in sequence, each cyclic network module includes a cyclic network layer and a first adder, the cyclic network layer is configured to output a serialization feature of input data of the cyclic network layer, and the first adder is configured to perform addition operation on the input data and the output data of the cyclic network layer, and output an addition operation result as output data of the cyclic network module.

Optionally, the cyclic network layer includes a plurality of memory units connected end to end in sequence. Each memory unit includes a long short-term memory (LSTM) network, a ReLU layer, and a second adder. The ReLU layer is configured to apply the ReLU function to the output data of the LSTM network, and the second adder is configured to add the input data of the LSTM network to the output data of the ReLU layer and to output the sum as the output data of the memory unit.

An image detection device, comprising:

the image acquisition unit is used for acquiring a face image sequence, and the face image sequence comprises at least one face image arranged according to a time sequence;

the vector acquisition unit is used for acquiring a one-dimensional frequency domain feature vector of each face image, wherein the one-dimensional frequency domain feature vector of a target face image represents the frequency domain feature of the target face image, and the target face image is any one of the face images;

the feature matrix acquisition unit is used for splicing the one-dimensional frequency domain feature vectors of the face images according to the time sequence to obtain a frequency domain feature matrix;

and the detection result acquisition unit is used for acquiring an image detection result of the face image sequence according to the frequency domain characteristic matrix, and the image detection result indicates whether the face image sequence is tampered.

An image detection apparatus comprising: a memory and a processor;

the memory is used for storing programs;

the processor is used for executing the program and realizing each step of the image detection method.

A readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the image detection method.

According to the above technical solutions, the image detection method, device, equipment and readable storage medium provided by the embodiments of the present application acquire a face image sequence comprising at least one face image arranged in time order, acquire a one-dimensional frequency domain feature vector of each face image, splice the one-dimensional frequency domain feature vectors of the face images in the time order to obtain a frequency domain feature matrix, and obtain an image detection result of the face image sequence from the frequency domain feature matrix, the result indicating whether the face image sequence has been tampered with. Because the one-dimensional frequency domain feature vector of a target face image (any one of the face images) represents the frequency domain features of that image, the frequency domain feature matrix formed by splicing the vectors of several consecutive face images combines the frequency domain features and the time domain features of the face image sequence. Using the frequency domain feature matrix as the basis for image detection can therefore improve the accuracy of the detection result.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a schematic flowchart of a specific implementation of an image detection method according to an embodiment of the present application;

FIG. 2a illustrates a visualization flow of a method of acquiring a centered spectrum image;

FIG. 2b is a diagram of a pixel set according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a prediction model according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of an image detection method according to an embodiment of the present application;

FIG. 5 is a schematic flowchart of a method for acquiring a one-dimensional frequency domain feature vector according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

With the development and application of deep learning, the performance of face forgery technology has improved markedly. In particular, artificial-intelligence-based human image synthesis technology (commonly known as deepfake technology) poses a serious hidden danger to information security in society. The embodiments of the present application provide an image detection method based on frequency domain features, which detects whether an image, or an image sequence in a video (specifically, a face image sequence), has been tampered with, and thus whether the face in the image is forged, so as to improve the accuracy of image detection and reduce the computing resources required by the image detection algorithm.

Fig. 1 is a flowchart of a specific implementation method of an image detection method provided in an embodiment of the present application, and as shown in fig. 1, the method includes:

S101, extracting a face image sequence from a video to be detected.

In this embodiment, the face image sequence includes at least one face image sorted in time order, where the time order is the order in which the face images appear in the video to be detected.

It should be noted that the number of face images may be preset according to video attributes (definition, size, etc.) or the actual application scene. A suitable number of face images improves detection accuracy while saving computing resources; preferably, the face image sequence includes 50 face images sorted in time order. There are several methods for extracting the face image sequence from the video to be detected, one of which comprises A1 to A3.

A1, dividing the video to be detected into a plurality of consecutive video frames.

A2, using MTCNN (Multi-task Cascaded Convolutional Networks) to extract, from each video frame, a face image that satisfies a preset requirement. The preset requirement includes: the image has a preset shape, and/or the face region is located in a preset region of the face image. For example, each face image is a square image of a preset size, and the face region is located in the middle region of the face image.

A3, generating the face image sequence according to the time order of the face images.
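
Steps A1 to A3 can be sketched as follows. This is only an illustrative outline under stated assumptions: `detect_face` is a hypothetical callable standing in for an MTCNN wrapper and is assumed to return a bounding box `(x, y, w, h)` or `None`; the crop size of 64 pixels and the nearest-neighbour resizing are choices made for the sketch, not values given in the text.

```python
import numpy as np

def crop_square_face(frame, box, size=64):
    """Crop a square region centred on the face box and resize it to size x size.

    frame: H x W (x C) array; box: (x, y, w, h) from a face detector.
    Nearest-neighbour resizing keeps the sketch free of image libraries.
    """
    x, y, w, h = box
    side = min(max(w, h), frame.shape[0], frame.shape[1])
    cx, cy = x + w // 2, y + h // 2
    # Clamp the square so that it stays inside the frame.
    left = int(np.clip(cx - side // 2, 0, frame.shape[1] - side))
    top = int(np.clip(cy - side // 2, 0, frame.shape[0] - side))
    patch = frame[top:top + side, left:left + side]
    idx = np.arange(size) * side // size  # nearest-neighbour sample positions
    return patch[idx][:, idx]

def extract_face_sequence(frames, detect_face, size=64, max_faces=50):
    """frames: video frames already split and in temporal order (A1)."""
    faces = []
    for frame in frames:
        box = detect_face(frame)  # A2: hypothetical detector (assumption)
        if box is None:
            continue              # skip frames without a usable face
        faces.append(crop_square_face(frame, box, size))
        if len(faces) == max_faces:
            break
    return faces                  # A3: append order preserves the time order
```

Frames in which no face is found are simply skipped here; how such frames should be handled is not specified in the text.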

S102, acquiring a grayscale image of each face image to obtain a grayscale image sequence.

S103, performing a Fourier transform on each grayscale image to obtain a first spectrum image of each grayscale image.

In this embodiment, a fourier transform formula is used to obtain a corresponding relationship between a target frequency domain signal and a target pixel point, which is recorded as a first space-frequency corresponding relationship, and further, a first spectrum image is generated by using the first space-frequency corresponding relationship. The target frequency domain signal is a frequency domain signal extracted from the gray level image, and the target pixel points are pixel points in the frequency spectrum image.

In this embodiment, the first space-frequency correspondence is specifically the correspondence between the frequency domain parameters of the target frequency domain signal and the spatial domain parameters of the target pixel point. The frequency domain parameters of a frequency domain signal include frequency and amplitude, and the spatial domain parameters include position (specifically, coordinate values) and gray value. Specifically, the gray value of a first target pixel point (any one pixel point) is equal to the amplitude of the first target frequency domain signal (the frequency domain signal corresponding to the first target pixel point), and the frequency of the first target frequency domain signal is equal to the edge distance of the first target pixel point, which is obtained from the coordinate values of that pixel point. In this embodiment, the edge distance of a pixel point is the distance from the pixel point to a target point, the target point being the vertex of the spectrum image closest to the pixel point.

Take a grayscale image T_i (i ∈ [1, N], where N is the number of grayscale images) in the grayscale image sequence as an example. The size of T_i is X × Y (measured in pixels). The correspondence between the frequency domain parameters of a target frequency domain signal s(u, v) and the spatial domain parameters of its corresponding target pixel point d(u, v) is obtained using the Fourier transform formula (formula 1), and the spectrum image P_i is generated from it.

In formula (1), the value of F(u, v) is the amplitude of the target frequency domain signal s(u, v), that is, the gray value of the target pixel point d(u, v), where the coordinate value of d(u, v) is (u, v) and the frequency of s(u, v) is equal to the edge distance of d(u, v); f(x, y) is the gray value of the pixel point with coordinate value (x, y) in the grayscale image.
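
Formula (1) itself is not reproduced in this text. A standard two-dimensional discrete Fourier transform consistent with the symbols defined above would read as follows; the index ranges, the absence of a normalization factor, and taking the amplitude as the modulus of the complex sum are assumptions, not details confirmed by the source.

```latex
F(u, v) = \left| \sum_{x=0}^{X-1} \sum_{y=0}^{Y-1} f(x, y)\,
          e^{-j 2\pi \left( \frac{u x}{X} + \frac{v y}{Y} \right)} \right|,
\qquad u \in [0, X-1],\; v \in [0, Y-1]
```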

S104, performing frequency centering on each spectrum image to obtain a centered spectrum image of each grayscale image.

For the specific process of frequency centering, reference may be made to the prior art.

It should be further noted that FIG. 2a illustrates a visualization flow of a method for acquiring a centered spectrum image, where 201 is the grayscale image T_i, 202 is the spectrum image P_i of T_i, and 203 is the centered spectrum image CP_i of T_i.
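
Steps S102 to S104 can be sketched with NumPy as follows. The function names are hypothetical, and the luma weights used for the grayscale conversion are an assumption, since the text does not say how the grayscale image is computed.

```python
import numpy as np

def to_grayscale(face_rgb):
    """S102: convert an H x W x 3 face image to a grayscale image.

    The ITU-R BT.601 luma weights are an assumption made for this sketch.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return face_rgb[..., :3].astype(np.float64) @ weights

def centered_spectrum(gray):
    """S103 + S104: amplitude spectrum with the zero frequency at the centre."""
    spectrum = np.fft.fft2(gray)        # two-dimensional discrete Fourier transform
    amplitude = np.abs(spectrum)        # gray value of a spectrum pixel = amplitude
    return np.fft.fftshift(amplitude)   # frequency centering
```

After centering, low frequencies sit near the middle of the image, so the distance of a pixel from the center point plays the role that the edge distance plays in the uncentered spectrum image.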

S105, acquiring M pixel sets from the centered spectrum image.

In this embodiment, each pixel set includes a plurality of pixel points having the same center distance, where the center distance refers to a distance between a pixel point in the centered spectrum image and a center point.

For example, the j-th pixel set of the centered spectrum image CP_i is D_j = {d₁, d₂, ..., d_K} (j ∈ [1, M], where K is the number of pixel points in D_j), where d_k (k ∈ [1, K]) is a pixel point in the centered spectrum image whose center distance is r (a preset radius value). As shown in FIG. 2b, the pixel points d_k are all the pixel points of the centered spectrum image that lie on the circle O_r, whose center is the center point and whose radius is r.

S106, acquiring a gray average value of the pixel set, and normalizing the gray average value to a preset value interval to obtain a one-dimensional frequency domain characteristic parameter of the pixel set.

In this embodiment, the gray average value of a pixel set is the average of the gray values of all pixel points in the pixel set, and the preset value interval is [0, 1]. Taking any pixel set D_j = {d₁, d₂, ..., d_K} as an example, an optional method for obtaining the one-dimensional frequency domain characteristic parameter is given by formula 2.

In formula 2, α_j represents the one-dimensional frequency domain characteristic parameter of D_j, h(d_k) represents the gray value of the pixel point d_k, and h_max represents the maximum gray value of the pixel points in D_j.
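
Formula 2 is likewise missing from this text. From the definitions above (average the gray values of the K pixel points in D_j, then normalize to [0, 1] using the maximum gray value h_max), one consistent reconstruction is the following; it is offered as an assumption and not as the verbatim formula of the source.

```latex
\alpha_j = \frac{1}{h_{\max}} \cdot \frac{1}{K} \sum_{k=1}^{K} h(d_k),
\qquad h_{\max} = \max_{1 \le k \le K} h(d_k)
```

Because the mean of a set never exceeds its maximum, this form keeps α_j within the preset interval [0, 1] whenever the gray values are non-negative.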

S107, arranging the one-dimensional frequency domain characteristic parameters according to the frequency domain order to obtain a one-dimensional frequency domain feature vector.

In this embodiment, the rank obtained by sorting the center distances of all pixel sets from large to small is used as the frequency domain order of the one-dimensional frequency domain characteristic parameters of the pixel sets. The center distance of a pixel set is the center distance of any pixel point in the pixel set. Continuing the example, the one-dimensional frequency domain feature vector of the grayscale image T_i is A_i = [α₁, α₂, ..., α_j, ..., α_M].

It should be noted that, the center distance of each pixel point in the centralized spectrum image is equal to the frequency of the spectrum signal corresponding to the pixel point, and the gray value is equal to the amplitude of the spectrum signal corresponding to the pixel point, so in the one-dimensional feature vector of the gray image, each one-dimensional frequency domain feature parameter is the normalization result of the average value of the amplitudes of the spectrum signals with the same frequency. As can be seen, the one-dimensional frequency domain feature vector is a dimension reduction result of the two-dimensional frequency domain feature of the grayscale image.
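
Steps S105 to S107 can be sketched as follows. The sketch assumes that center distances are rounded to integer radii, that the pixel sets cover radii 1 to M, and that M defaults to the largest radius whose ring lies fully inside the image; none of these choices is fixed by the text, and `radial_feature_vector` is a hypothetical name.

```python
import numpy as np

def radial_feature_vector(centered, num_sets=None):
    """Reduce a centered amplitude spectrum to a one-dimensional feature vector.

    centered: 2-D array of non-negative amplitudes (centered spectrum image).
    Returns the parameters ordered from the largest to the smallest radius.
    """
    h, w = centered.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.indices((h, w))
    # Center distance of every pixel, rounded to an integer radius (assumption).
    radius = np.rint(np.hypot(yy - cy, xx - cx)).astype(int)
    if num_sets is None:
        num_sets = min(cy, cx, h - 1 - cy, w - 1 - cx)
    features = []
    for r in range(num_sets, 0, -1):      # S107: large-to-small center distance
        values = centered[radius == r]    # S105: pixel set with center distance r
        h_max = values.max()
        # S106: mean gray value normalized by the set maximum, giving [0, 1].
        features.append(values.mean() / h_max if h_max > 0 else 0.0)
    return np.array(features)
```

A 16 × 16 centered spectrum image, for example, yields a vector of 7 values under these assumptions, compared with 256 values in the two-dimensional spectrum.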

Obviously, compared with the two-dimensional frequency domain characteristic of the traditional face image, the one-dimensional frequency domain characteristic vector of the face image compresses the data volume, reduces the operation overhead and reduces the requirement on operation resources while keeping the frequency domain characteristic. Further, the method obtains the centralized frequency spectrum image of each face image through gray level transformation, Fourier transformation and frequency centralization, and obtains a one-dimensional frequency domain feature vector capable of representing the face image according to the corresponding relation between the frequency domain feature and the space domain parameter of each pixel point in the centralized frequency spectrum image.

S108, splicing the one-dimensional frequency domain feature vectors of the grayscale images according to the time order of the grayscale images to obtain a frequency domain feature matrix.

It can be understood that the one-dimensional frequency domain feature vectors in the frequency domain feature matrix represent frequency domain features of a grayscale image, and the ordering of each one-dimensional frequency domain feature vector indicates a time sequence of the grayscale image, so that the frequency domain feature matrix represents both the frequency domain features and the time domain features of the video to be detected.

Continuing the example, the vectors A₁ to A_N are spliced to obtain the frequency domain feature matrix Q, whose size is N×M. For specific splicing methods, reference may be made to the prior art.
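
One straightforward way to perform the splicing in S108 is to stack the N vectors as rows, so that the row index carries the time order and the column index carries the frequency order. This is a sketch of one possible splicing method, not necessarily the one intended by the source; `build_feature_matrix` is a hypothetical name.

```python
import numpy as np

def build_feature_matrix(vectors):
    """Stack N one-dimensional feature vectors of length M into an N x M matrix.

    vectors: sequence of equal-length arrays, already in temporal order.
    """
    return np.stack([np.asarray(v, dtype=np.float64) for v in vectors], axis=0)
```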

S109, inputting the frequency domain feature matrix into a preset prediction model to obtain a prediction result of the prediction model as the image detection result.

In this embodiment, the prediction result indicates whether the face image sequence has been tampered with. The prediction model is obtained by training a neural network model with the frequency domain feature matrix of a preset sample image sequence as input data and the label of the preset sample image sequence as the target output, where the label indicates whether the face in the preset sample image sequence is a forged face, that is, whether the preset sample image sequence has been tampered with.

It should be noted that the specific training method and the specific structure of the prediction model are referred to in the following embodiments.

It can be seen from the above technical solutions that, in the image detection method provided in this embodiment of the present application, a face image sequence is acquired, the face image sequence including at least one face image arranged in time order; a one-dimensional frequency domain feature vector of each face image is acquired; the one-dimensional frequency domain feature vectors of the face images are spliced in the time order to obtain a frequency domain feature matrix; and the frequency domain feature matrix is input to a preset prediction model, whose prediction result is taken as the image detection result. Because the one-dimensional frequency domain feature vector of a target face image (any one of the face images) represents the frequency domain features of that image, the frequency domain feature matrix obtained by splicing the vectors of several consecutive face images combines the frequency domain features and the time domain features of the face image sequence. Using the frequency domain feature matrix as the basis for image detection can therefore improve the accuracy of the detection result.

Furthermore, the method improves the efficiency and accuracy of image detection by means of strong data processing capability and optimizing capability of the neural network model.

Fig. 3 illustrates a specific structure of a prediction model provided in an embodiment of the present application.

As shown in fig. 3, the prediction model includes a plurality of cyclic network modules and two fully connected layers connected end to end in sequence. The input data of the first cyclic network module E1 is the input data Q of the prediction model, and the input data of each other cyclic network module (any cyclic network module except the first) is the output data of the previous cyclic network module; for example, the input data of the cyclic network module E2 is the output data Q1 of E1. The input data of the first fully connected layer is the output data of the last cyclic network module ER, the input data of the second fully connected layer is the output data of the first fully connected layer, and the output data of the second fully connected layer is the output data of the prediction model, namely the prediction result P. It should be noted that P is 0 or 1, where P = 0 indicates that the face in the video to be detected is forged, and P = 1 indicates that the face in the video to be detected is not forged.

In this embodiment, any one of the cyclic network modules includes a cyclic network layer and an adder, where input data of the cyclic network layer is input data of the cyclic network module, the cyclic network layer is configured to perform multiple preset matrix operations on the input data and output serialized features of the input data, and the adder is configured to perform addition operations on the input data and the output data of the cyclic network layer to obtain an addition operation result and output the addition operation result as output data of the cyclic network module. As shown in fig. 3, E1 includes a loop network layer L1 and an adder J1, where input data of L1 is input data Q of E1, and the adder J1 is used to add Q and output data of L1 to obtain an addition result Q1 and output it.

Each cyclic network layer includes a plurality of memory units connected end to end in sequence. As shown in fig. 3, the (1+5a)-th cyclic network layers (a = 0, 1, 2, 3, 4) each include 3 memory units, and the other cyclic network layers each include 5 memory units; for example, E1 includes 3 memory units, and E2 to E4 each include 5 memory units.

The input data of the first memory unit in each cyclic network layer is the input data of the cyclic network layer, and the input data of each other memory unit (any memory unit except the first) is the output data of the previous memory unit. For example, the input data of the memory unit M2 shown in fig. 3 is the output data of the memory unit M1. Specifically, each memory unit includes an LSTM (long short-term memory) network, a ReLU layer, and an adder, where the input data of the LSTM is the input data of the memory unit, the ReLU layer is configured to apply the ReLU function to the output data of the LSTM, and the adder is configured to add the output data of the ReLU layer to the input data of the LSTM and to output the sum as the output data of the memory unit. As shown in fig. 3, M1 includes an LSTM W1, a ReLU layer R1, and an adder (not shown in fig. 3), where R1 applies the ReLU function to the output data of W1, and the adder adds the input data of W1 to the output data of R1 to obtain the output data of M1, which is input to M2.

The ReLU function is given by formula 3:

F(L(q)) = max(0, L(q)) (3)

In formula 3, F(L(q)) is the output data of the ReLU layer, L(q) is the output data of the LSTM, and q is the input data of the LSTM, that is, the input data of the memory unit.

It should be noted that the specific structure and function of each LSTM, ReLU layer, adder can be seen in the prior art.
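
A single memory unit (LSTM, ReLU layer, adder) can be sketched in plain NumPy as follows. `TinyLSTM` is a hypothetical, untrained single-layer LSTM written only for illustration; setting its hidden size equal to the input size, so that the adder can sum input and output, is an assumption not stated in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyLSTM:
    """Forward pass of a single-layer LSTM with hidden size == input size."""

    def __init__(self, size, rng):
        # One weight matrix holding the input, forget, cell and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * size, 2 * size))
        self.b = np.zeros(4 * size)
        self.size = size

    def __call__(self, seq):
        h = np.zeros(self.size)  # hidden state
        c = np.zeros(self.size)  # cell state
        outputs = []
        for x in seq:            # one row of the matrix per time step
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)

def memory_unit(lstm, q):
    """q: N x M input of the memory unit (and of its LSTM)."""
    relu_out = np.maximum(0.0, lstm(q))  # formula 3: F(L(q)) = max(0, L(q))
    return q + relu_out                  # adder: LSTM input plus ReLU output
```

Since the ReLU output is never negative, the output of a memory unit built this way is element-wise greater than or equal to its input.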

It should be further noted that, during the model training process, gradient descent is used for parameter optimization, and the parameter setting of gradient descent includes: the learning rate is 0.00005, the momentum is 0.5, and the initial accuracy threshold is set to 0.5.

The training end conditions include: the accuracy of the prediction model is higher than a preset accuracy threshold value, and/or the iteration number reaches a preset number threshold value.

In this embodiment, a prediction model with the highest accuracy is selected from prediction models obtained through multiple training as a final prediction model.
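
The training settings above can be sketched as follows. The classical momentum form of the update, the function names, and the default iteration cap of 1000 are assumptions made for illustration; only the learning rate, the momentum, and the initial accuracy threshold come from the text.

```python
def momentum_step(params, grads, velocity, lr=0.00005, momentum=0.5):
    """One gradient-descent update with classical momentum (assumed form)."""
    velocity = momentum * velocity - lr * grads
    return params + velocity, velocity

def should_stop(accuracy, iteration, accuracy_threshold=0.5, max_iterations=1000):
    """Training ends when either end condition is met (the 'and/or' case)."""
    return accuracy > accuracy_threshold or iteration >= max_iterations

def select_best(models_with_accuracy):
    """Pick the model with the highest accuracy from (model, accuracy) pairs."""
    return max(models_with_accuracy, key=lambda pair: pair[1])[0]
```

The functions accept plain floats as well as NumPy arrays, since they use only arithmetic operators.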

It can be seen from the foregoing technical solutions that the neural network provided in the embodiment of the present application includes a plurality of cyclic network modules connected end to end in sequence, each cyclic network module includes a cyclic network layer and a first adder, the cyclic network layer includes a plurality of memory units connected end to end in sequence, and each memory unit includes a long short-term memory network, a ReLU layer, and a second adder.

The first adder adds the input data and the output data of the cyclic network layer and outputs the sum as the output data of the cyclic network module. The ReLU layer applies the ReLU function to the output data of the long short-term memory network, and the second adder adds the input data of the long short-term memory network to the output data of the ReLU layer and outputs the sum as the output data of the memory unit. By combining the sequence-processing characteristics of the LSTM with those of a residual structure, the prediction model can improve the detection performance of the neural network model and thus the accuracy of the prediction result it outputs.
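
The nesting of the two adders can be made concrete with a small wiring sketch. Here `core` is a hypothetical stand-in for an LSTM (any callable that maps an array to an array of the same shape), and the two fully connected layers at the end of the model are omitted; the unit counts follow the description of fig. 3.

```python
import numpy as np

def units_in_layer(index):
    """Memory units in the index-th cyclic network layer (1-based).

    Layers 1, 6, 11, 16 and 21 have 3 units; the others have 5.
    """
    return 3 if index in {1, 6, 11, 16, 21} else 5

def memory_unit(core, q):
    return q + np.maximum(0.0, core(q))  # second adder around core + ReLU

def cyclic_layer(cores, q):
    for core in cores:                   # memory units connected end to end
        q = memory_unit(core, q)
    return q

def cyclic_module(cores, q):
    return q + cyclic_layer(cores, q)    # first adder around the whole layer

def forward(modules, q):
    for cores in modules:                # cyclic modules connected end to end
        q = cyclic_module(cores, q)
    return q
```

Each level of the structure therefore has its own skip connection: one around every memory unit and one around every cyclic network layer.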

It should be noted that fig. 3 is only a specific optional structure and a training method of the prediction model provided in the embodiment of the present application, and the prediction model further includes other specific optional structures and training methods, which are not described in detail in this embodiment.

It should be noted that the flow shown in fig. 1 only illustrates an optional specific implementation of the image detection method provided in the embodiment of the present application, and the present application may also be implemented in other optional specific implementations.

For example, after S104, the method further includes performing a filtering operation on the centered spectrum image: the frequency signals of a preset region in the centered spectrum image are filtered out to obtain a filtered spectrum image, and S105 to S109 are performed with the filtered spectrum image in place of the centered spectrum image. Specifically, the frequency domain feature matrix of the centered spectrum image is multiplied element by element with the frequency domain feature matrix of a preset mask, in which the values in the preset region are 0, so that the frequency signals of the preset region are filtered out. High-pass filtering of this kind removes non-key regions of the frequency domain space and thereby improves detection accuracy.
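
The mask-based filtering can be sketched as follows. The sketch assumes that the preset region is a disc of low frequencies around the center point with a hypothetical `cutoff_radius`; the text does not specify the shape or size of the preset region.

```python
import numpy as np

def high_pass_filter(centered, cutoff_radius):
    """Zero out a low-frequency disc in a centered spectrum image.

    centered: 2-D centered amplitude spectrum.
    cutoff_radius: radius of the disc to suppress (assumed preset region).
    """
    h, w = centered.shape
    yy, xx = np.indices((h, w))
    distance = np.hypot(yy - h // 2, xx - w // 2)
    # Preset mask: 0 inside the preset region, 1 everywhere else.
    mask = (distance > cutoff_radius).astype(centered.dtype)
    return centered * mask  # element-wise (point) multiplication
```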

For another example, S103 to S106 are only one optional method for obtaining the one-dimensional frequency domain feature vector, and the present application also covers other methods for obtaining it. Likewise, S109 is only one optional implementation for obtaining an image detection result from the frequency domain feature matrix, and fig. 3 shows only one optional structure of the prediction model; the present application also covers other implementations and other model structures.

In summary, the image detection method provided in the embodiment of the present application can be summarized as the flowchart shown in fig. 4. As shown in fig. 4, the method includes:

S401, obtaining a face image sequence.

In this embodiment, the face image sequence includes at least one face image arranged according to a time sequence, where the face image is extracted from an image frame of the video to be detected, and the time sequence of the face image is determined according to a time sequence of the image frame in the video to be detected.

It should be noted that an optional method for acquiring the face image sequence is described in S101 and is not repeated in this embodiment.

S402, acquiring a one-dimensional frequency domain feature vector of each face image.

In this embodiment, the one-dimensional frequency domain feature vector of the target face image represents the frequency domain feature of the target face image, and the target face image is any one of the face images. It should be noted that the one-dimensional frequency domain feature vector is obtained by reducing the dimension of the two-dimensional frequency domain feature of the face image, where the two-dimensional frequency domain feature represents the frequency distribution and the amplitude distribution of the frequency domain signals of the face image.

In this embodiment, the one-dimensional frequency domain feature vector of each face image can be obtained in multiple ways. One optional method is described in the above embodiments, and fig. 5 illustrates another optional method, which includes:

S501, obtaining frequency domain parameters of each frequency domain signal of the target face image, wherein the frequency domain parameters comprise frequency and amplitude.

Specifically, a frequency spectrum image of the target face image and a spatial domain corresponding relation are obtained by performing preset space-frequency transformation on the target face image, and a frequency domain parameter of each frequency domain signal is obtained according to a spatial domain parameter of each first pixel point in the frequency spectrum image and the spatial domain corresponding relation.

The spatial domain corresponding relation comprises the correspondence between each first pixel point and each frequency domain signal, the first pixel points are the pixel points in the frequency spectrum image, and the spatial domain parameters comprise coordinate values and gray values. It should be noted that the preset space-frequency transformation includes at least a Fourier transform and may further include frequency centering and/or a high-pass filtering operation. An optional method for acquiring the frequency domain parameters of each frequency domain signal of the target face image is described in the above embodiments.
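One way to picture the spectrum image and the spatial domain corresponding relation is the sketch below. It assumes a 2-D discrete Fourier transform with frequency centering, takes the gray value of a pixel to be the log-scaled amplitude, and takes the frequency of a pixel to be its radial distance from the zero-frequency point; these choices and the function name are illustrative assumptions rather than details given in the text.

```python
import numpy as np

def spectrum_with_correspondence(face):
    """Return (gray values of the spectrum image, frequency of each pixel)."""
    # Fourier transform followed by frequency centering.
    centered = np.fft.fftshift(np.fft.fft2(face))
    # Gray value of each first pixel point: log-scaled amplitude (assumption).
    gray = np.log1p(np.abs(centered))
    h, w = face.shape
    # Frequency coordinate of each row and column after centering.
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    # Correspondence: pixel coordinate -> radial frequency (cycles per pixel).
    frequency = np.hypot(fy, fx)
    return gray, frequency

rng = np.random.default_rng(0)
face = rng.random((32, 32))             # stand-in for a grayscale face image
gray, frequency = spectrum_with_correspondence(face)
```

With this layout, the coordinate of a pixel determines the frequency of its signal and the gray value determines its amplitude, which mirrors the two frequency domain parameters named in S501.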

S502, obtaining a plurality of frequency domain signal sets, wherein each frequency domain signal set comprises a plurality of same-frequency signals.

The signals in each frequency domain signal set have the same frequency. It should be noted that, through the spatial domain corresponding relation, each frequency domain signal set corresponds to a pixel set that includes the first pixel points corresponding to the frequency domain signals in that set, that is, the pixel set of the centered spectral image in the above embodiment.

S503, performing preset numerical processing on the amplitudes of the multiple same-frequency signals in each frequency domain signal set to obtain an amplitude parameter of each frequency domain signal set.

The preset numerical processing includes averaging and normalization. It should be noted that the averaging may take the arithmetic mean or the median.

S504, arranging the amplitude parameters of the frequency domain signal sets according to the frequency domain sequence to obtain a one-dimensional frequency domain feature vector of the target face image.

The frequency domain sequence is obtained by sorting the frequencies of the frequency domain signal sets in descending order.

Compared with the conventional two-dimensional frequency domain feature of a face image, the one-dimensional frequency domain feature vector retains the frequency domain feature while compressing the data volume, which reduces the computational overhead and the demand on computing resources.
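Steps S501 to S504 can be sketched as a radial reduction of the centered spectrum. The sketch makes several assumptions that the text leaves open: the rounded distance from the spectrum center stands in for frequency, pixels with the same rounded distance form one frequency domain signal set, and normalization is min-max scaling. The function and parameter names are hypothetical.

```python
import numpy as np

def one_dim_feature_vector(face, use_median=False):
    """Reduce the 2-D spectrum of a grayscale image to a 1-D feature vector."""
    # S501: amplitude of every frequency domain signal (centered spectrum).
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(face)))
    h, w = face.shape
    yy, xx = np.indices((h, w))
    # Rounded distance from the center stands in for frequency (assumption).
    radius = np.rint(np.hypot(yy - h // 2, xx - w // 2)).astype(int)
    # S502: one signal set per distinct frequency (np.unique sorts ascending).
    radii = np.unique(radius)
    # S503: average (arithmetic mean or median) the amplitudes of each set...
    reduce = np.median if use_median else np.mean
    params = np.array([reduce(amplitude[radius == r]) for r in radii])
    # ...then normalize; min-max scaling is an assumption.
    span = params.max() - params.min()
    params = (params - params.min()) / span if span > 0 else np.zeros_like(params)
    # S504: arrange from the highest frequency to the lowest.
    return params[::-1]

rng = np.random.default_rng(0)
face = rng.random((64, 64))             # stand-in for a grayscale face image
vector = one_dim_feature_vector(face)
```

For an H x W image the vector has on the order of the half-diagonal length in entries instead of H x W values, which is the data compression the paragraph above refers to.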

S403, splicing the one-dimensional frequency domain feature vectors of the face images according to the time sequence to obtain a frequency domain feature matrix.

S404, acquiring an image detection result of the face image sequence according to the frequency domain feature matrix.

In this embodiment, the image detection result indicates whether the face image sequence is tampered.
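The splicing of S403, which produces the matrix consumed in S404, can be sketched as stacking the per-image vectors in time order. Placing time along the rows and frequency along the columns is an assumption, as are the function name and the use of explicit timestamps.

```python
import numpy as np

def splice_feature_vectors(timed_vectors):
    """Stack (timestamp, 1-D vector) pairs into a (time, frequency) matrix."""
    # Sort by timestamp so that row order follows the time sequence.
    ordered = sorted(timed_vectors, key=lambda item: item[0])
    return np.stack([np.asarray(vec, dtype=float) for _, vec in ordered])

# Two hypothetical per-image feature vectors given out of order.
matrix = splice_feature_vectors([(2, [0.3, 0.4]), (1, [0.1, 0.2])])
print(matrix.shape)                     # (2, 2)
```

Each row is one face image and each column one frequency domain signal set, so the matrix carries the change of the spectrum over time.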

Optionally, a method for obtaining the image detection result of the face image sequence according to the frequency domain feature matrix includes: inputting the frequency domain feature matrix into a preset prediction model to obtain a prediction result of the preset prediction model, wherein the prediction result is used as the image detection result and indicates whether the face image sequence is tampered. The prediction model is obtained by training a neural network model with the frequency domain feature matrix of a preset sample image sequence as input data and the label of the preset sample image sequence as the target, wherein the label indicates whether the preset sample image sequence is tampered.

It should be noted that the specific method can be found in the above embodiments.

According to the above technical scheme, the image detection method provided by the embodiment of the present application obtains a face image sequence that includes at least one face image arranged according to a time sequence, obtains the one-dimensional frequency domain feature vector of each face image, splices these vectors according to the time sequence to obtain a frequency domain feature matrix, and obtains from the frequency domain feature matrix an image detection result indicating whether the face image sequence is tampered. Because the one-dimensional frequency domain feature vector of any face image represents the frequency domain feature of that image, the frequency domain feature matrix formed by splicing the vectors of multiple consecutive face images integrates both the frequency domain feature and the time domain feature of the face image sequence. Using the frequency domain feature matrix as the basis for image detection can therefore improve the accuracy of the detection result.

Fig. 5 is a schematic structural diagram of an image detection apparatus provided in an embodiment of the present application, and as shown in fig. 5, the apparatus may include:

an image obtaining unit 501, configured to obtain a face image sequence, where the face image sequence includes at least one face image arranged according to a time sequence;

a vector obtaining unit 502, configured to obtain a one-dimensional frequency domain feature vector of each face image, where the one-dimensional frequency domain feature vector of a target face image represents a frequency domain feature of the target face image, and the target face image is any one of the face images;

a feature matrix obtaining unit 503, configured to splice the one-dimensional frequency domain feature vectors of the face images according to the time sequence to obtain a frequency domain feature matrix;

a detection result obtaining unit 504, configured to obtain an image detection result of the face image sequence according to the frequency domain feature matrix, where the image detection result indicates whether the face image sequence is tampered.
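The division into four units can be pictured as a small pipeline object. This is only a structural sketch: the class name, the use of plain callables for the units, and the placeholder logic passed in below are assumptions, and the threshold rule is a stand-in rather than a real detector.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

import numpy as np

@dataclass
class ImageDetectionApparatus:
    obtain_images: Callable[[], Sequence[np.ndarray]]   # image obtaining unit
    obtain_vector: Callable[[np.ndarray], np.ndarray]   # vector obtaining unit
    obtain_result: Callable[[np.ndarray], bool]         # detection result obtaining unit

    def obtain_feature_matrix(self, faces):
        # Feature matrix obtaining unit: splice per-image vectors in time order.
        return np.stack([self.obtain_vector(face) for face in faces])

    def detect(self):
        faces = self.obtain_images()
        return self.obtain_result(self.obtain_feature_matrix(faces))

rng = np.random.default_rng(0)
apparatus = ImageDetectionApparatus(
    obtain_images=lambda: [rng.random((4, 4)) for _ in range(3)],
    obtain_vector=lambda face: face.mean(axis=0),        # placeholder feature
    obtain_result=lambda m: bool(m.mean() > 0.5),        # placeholder decision
)
result = apparatus.detect()
```

Keeping each unit behind its own callable mirrors the unit boundaries in the text, so any one of them can be swapped without touching the others.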

Optionally, the vector obtaining unit is configured to obtain a one-dimensional frequency domain feature vector of the target face image, and includes: the vector acquisition unit is specifically configured to:

Optionally, the vector obtaining unit obtains a frequency domain parameter of each frequency domain signal of the target face image, including: the vector acquisition unit is specifically configured to:

acquiring a frequency domain parameter of each frequency domain signal according to a space domain parameter of each first pixel point in the frequency spectrum image and the space domain corresponding relation, wherein the space domain parameter comprises a coordinate value and a gray value;

Optionally, the detection result obtaining unit is configured to obtain an image detection result of the face image sequence according to the frequency domain feature matrix, and includes: the detection result acquisition unit is specifically configured to:

Fig. 6 shows a schematic structural diagram of the image detection apparatus, which may include: at least one processor 601, at least one communication interface 602, at least one memory 603, and at least one communication bus 604;

in the embodiment of the present application, the number of the processor 601, the communication interface 602, the memory 603, and the communication bus 604 is at least one, and the processor 601, the communication interface 602, and the memory 603 complete communication with each other through the communication bus 604;

the processor 601 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, or the like;

the memory 603 may include a high-speed RAM memory, and may further include a non-volatile memory, such as at least one disk memory;

the memory stores a program, and the processor can execute the program stored in the memory to realize the steps of the image detection method provided by the embodiment of the application, as follows:

an image detection method, comprising:

An embodiment of the present application further provides a readable storage medium, where the readable storage medium may store a computer program adapted to be executed by a processor, and when the computer program is executed by the processor, the computer program implements the steps of an image detection method provided in the embodiment of the present application, as follows:

an image detection method, comprising:

Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The embodiments in the present description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An image detection method, comprising:

2. The method of claim 1, wherein obtaining a one-dimensional frequency domain feature vector of the target face image comprises:

3. The method according to claim 2, wherein the obtaining frequency domain parameters of each of the frequency domain signals of the target face image comprises:

4. The method according to claim 3, wherein the pre-defined space-frequency transformation further comprises frequency centering and/or high-pass filtering.

5. The method according to claim 1, wherein the obtaining the image detection result of the face image sequence according to the frequency domain feature matrix comprises:

6. The method of claim 5, wherein the neural network model comprises a plurality of cyclic network modules connected end to end in sequence, each cyclic network module comprises a cyclic network layer and a first adder, the cyclic network layer is used for outputting the serialized characteristics of the input data of the cyclic network layer, and the first adder is used for performing addition operation on the input data and the output data of the cyclic network layer and outputting the addition operation result as the output data of the cyclic network module.

7. The method of claim 6, wherein the cyclic network layer comprises a plurality of memory units connected end to end in sequence;

each memory unit comprises a long short-term memory network, a ReLU layer and a second adder, wherein the ReLU layer is used for performing a ReLU function operation on output data of the long short-term memory network, and the second adder is used for performing an addition operation on input data of the long short-term memory network and output data of the ReLU layer and outputting the addition operation result as output data of the memory unit.

8. An image detection apparatus, characterized by comprising:

9. An image detection apparatus characterized by comprising: a memory and a processor;

the memory is used for storing programs;

the processor is configured to execute the program to implement the steps of the image detection method according to any one of claims 1 to 7.

10. A readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, performs the steps of the image detection method according to any one of claims 1 to 7.