CN111415323B - Image detection method and device and neural network training method and device


Info

Publication number
CN111415323B
Authority
CN
China
Prior art keywords: image, block, images, training sample, inter
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN201910007257.2A
Other languages
Chinese (zh)
Other versions
CN111415323A (en)
Inventor
李斌
张浩鑫
罗瑚
刘永亮
黄继武
Current Assignee: Alibaba Group Holding Ltd (the listed assignee may be inaccurate)
Original Assignee: Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910007257.2A priority Critical patent/CN111415323B/en
Publication of CN111415323A publication Critical patent/CN111415323A/en
Application granted granted Critical
Publication of CN111415323B publication Critical patent/CN111415323B/en


Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06T 2207/20021: Dividing image into blocks, subimages or windows
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
(All under G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL)

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image detection method and apparatus, a neural network model training method and apparatus, a computer storage medium, and an electronic device. The image detection method comprises: acquiring an image to be detected; obtaining at least two types of feature images by processing the image to be detected; inputting the at least two types of feature images into a neural network having at least two paths for recognition, to obtain a recognition and classification result; and determining, according to the recognition and classification result, whether the image to be detected is a double-compressed image. The convolutional neural network model thereby achieves higher accuracy in image detection and stronger resistance to anti-forensics attacks.

Description

Image detection method and device and neural network training method and device
Technical Field
The application relates to the field of machine learning, in particular to a method and a device for detecting an image. The application also relates to a training method and device of the neural network, a computer storage medium and an electronic device.
Background
Image compression techniques are widely used because they reduce a large amount of redundant information while maintaining a good visual effect of the image.
However, with the continuous development of the internet, images, as an effective medium for storing and transmitting information, bring convenience but also pose significant security risks: a tampered image can be re-compressed into a new image, making the tampered content hard to recognize. Since a tampered image has generally undergone double compression, double-compression detection is of great significance in image forensics: it can reveal whether a JPEG image in its stored format has been tampered with, and may help locate the tampered region.
Disclosure of Invention
The application provides an image detection method and apparatus to address the inaccuracy of detection results in the prior art. The application additionally provides a neural network training method and apparatus, as well as a computer storage medium and an electronic device.
The application provides an image detection method, which comprises the following steps:
acquiring an image to be detected;
obtaining at least two types of feature images by processing the image to be detected;
inputting the at least two types of feature images into a neural network having at least two paths for recognition, to obtain a recognition and classification result;
and determining, according to the recognition and classification result, whether the image to be detected is a double-compressed image.
In some embodiments, the obtaining at least two types of feature images based on the processing of the image to be detected includes:
and segmenting the image to be detected to obtain at least two types of characteristic images.
In some embodiments, the segmenting the image to be detected to obtain at least two types of feature images includes:
dividing the obtained pixel matrix of the image to be detected to obtain image blocks;
selecting pixels of a region adjacent to the central position of the image block and pixels of a region adjacent to the segmentation intersection position of the image block;
arranging and combining pixels in an area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
partitioning the image into blocks, dividing pixels in the adjacent area of the intersection position, and arranging and combining according to the partitioning sequence of the image to be detected to obtain an inter-block characteristic image;
and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
In some embodiments, the dividing the acquired image to be detected to obtain image blocks includes:
and dividing the image to be detected from left to right and from top to bottom to obtain image blocks.
In some embodiments, inputting information of the at least two types of feature images into a neural network having at least two paths for recognition to obtain recognition classification results, including:
inputting the intra-block feature image into a convolution group with an intra-block path in the neural network, determining an image feature of the intra-block feature image;
inputting the inter-block feature images into convolution groups with inter-block paths in the neural network, and determining image features of the inter-block feature images;
inputting the image features of the intra-block feature image and the image features of the inter-block feature image into a dimension reduction layer at the end of the intra-block path and a dimension reduction layer at the end of the inter-block path, respectively, and determining the main image features of the intra-block feature image and the main image features of the inter-block feature image;
combining the main image features of the intra-block feature images and the main image features of the inter-block feature images to obtain feature vectors of the images to be detected;
and transmitting the characteristic vectors to a full connection layer of the neural network, and determining the identification and classification result of the image to be detected.
In some embodiments, further comprising:
before determining the image features of the intra-block feature images and the image features of the inter-block feature images, respectively performing filtering processing on the intra-block feature images and the inter-block feature images to obtain filtered intra-block feature images and inter-block feature images.
In some embodiments, said inputting said intra-block feature image into a convolution group with intra-block paths in said neural network, determining image features of said intra-block feature image, comprises:
inputting the intra-block feature image into a convolution group of the intra-block path in the neural network for convolution processing to obtain a processed intra-block feature image;
inputting the processed intra-block characteristic image into a pooling layer for processing to obtain the image characteristics of the intra-block characteristic image;
the inputting the inter-block feature image into a convolution group with an inter-block path in the neural network, and extracting the image feature of the inter-block feature image includes:
inputting the inter-block feature images into a convolution group of an inter-block path in the neural network for convolution processing to obtain processed inter-block feature images;
inputting the processed inter-block feature images into a pooling layer for processing to obtain image features of the inter-block feature images;
wherein the number of convolution groups of the intra-block path is at least two, and the pooling layer of the intra-block path is located between the convolution groups of the intra-block path; the number of the convolution groups of the inter-block paths is at least two, and the pooling layer of the inter-block paths is positioned between the convolution groups of the inter-block paths.
In some embodiments, inputting the image features of the intra-block feature image and the image features of the inter-block feature image to a dimension reduction layer at an end of an intra-block path and a dimension reduction layer at an end of an inter-block path, respectively, and determining the main image features of the intra-block feature image and the main image features of the inter-block feature image, comprises:
pooling the image features of the intra-block feature images in the intra-block path to obtain pooled intra-block feature images;
performing convolution processing on the pooled intra-block feature images to obtain main image features of the intra-block feature images;
performing convolution processing on the image features of the inter-block feature images in the inter-block path to obtain convolved inter-block feature images;
and performing pooling processing on the convolved inter-block feature images to obtain main image features of the inter-block feature images.
In some embodiments, merging the main image features of the intra-block feature image and the main image features of the inter-block feature image to obtain a feature vector of the image to be detected includes:
respectively converting the main image features of the intra-block feature images and the main image features of the inter-block feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and combining the intra-block one-dimensional characteristic vector and the inter-block one-dimensional characteristic vector according to the structural sequence of the corresponding intra-block path and inter-block path respectively to obtain the characteristic vector of the image to be detected.
In some embodiments, the determining whether the acquired image to be detected belongs to a dual compression image according to the recognition and classification result includes:
and comparing the single-compression classification probability value with the double-compression classification probability value in the recognition and classification result of the image to be detected; if the double-compression classification probability value is greater than the single-compression classification probability value, the acquired image to be detected is determined to be a double-compressed image.
In some embodiments, further comprising:
performing normalization preprocessing on the acquired image to be detected to obtain a normalized image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including:
and segmenting the image to be detected after the normalization processing to obtain at least two types of characteristic images.
In some embodiments, the neural network having at least two paths is a convolutional neural network.
The present application also provides an image detection apparatus, including:
the acquisition unit is used for acquiring an image to be detected;
the processing unit is used for obtaining at least two types of characteristic images based on the processing of the image to be detected;
the classification and identification unit is used for inputting the information of the at least two types of characteristic images into a neural network with at least two paths for identification to obtain an identification and classification result;
and the determining unit is used for determining whether the image to be detected is a double-compressed image according to the recognition and classification result.
The application also provides a training method of the neural network model, which comprises the following steps:
acquiring a training sample image;
obtaining at least two types of training sample characteristic images based on the processing of the training sample images;
inputting the at least two types of training sample feature images into a neural network having at least two paths for recognition to obtain a recognition and classification result;
determining the weight of the neural network according to the recognition and classification result;
and updating the weight of the neural network according to the determined weight of the neural network to obtain a trained neural network model.
In some embodiments, the obtaining at least two types of training sample feature image information based on the processing of the training sample images includes:
and segmenting the training sample image to obtain at least two types of characteristic images.
In some embodiments, the segmenting the training sample image to obtain at least two types of feature images includes:
dividing the pixel matrix of the obtained training sample image to obtain training sample image blocks;
selecting pixels of a region adjacent to the center position of the training sample image blocks and selecting pixels of a region adjacent to the segmentation intersection position of the training sample image blocks;
pixels of an area adjacent to the center of each block of the training sample image are arranged and combined according to the block dividing sequence of the training sample image to obtain a characteristic image of the training sample in the block;
segmenting pixels of adjacent areas at the intersection positions of the training sample images in a blocking manner, and arranging and combining the pixels according to the segmentation sequence of the training sample images in the blocking manner to obtain inter-block training sample characteristic images;
the intra-block training sample feature images and the inter-block training sample feature images are at least two types of training sample feature images obtained.
In some embodiments, the dividing the acquired training sample image to obtain training sample image blocks includes:
and dividing the training sample image from left to right and from top to bottom to obtain training sample image blocks.
In some embodiments, inputting information of the at least two types of training sample feature images into a neural network with at least two paths for recognition, and obtaining a recognition classification result, including:
inputting the intra-block training sample feature images into convolution groups with intra-block paths in the neural network, and determining image features of the intra-block training sample feature images;
inputting the inter-block training sample feature images into a convolution group with inter-block paths in the neural network, and determining image features of the inter-block training sample feature images;
respectively inputting the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images into a dimension reduction layer at the end of the intra-block path and a dimension reduction layer at the end of the inter-block path, and determining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images;
combining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images to obtain feature vectors of the training sample images;
and transmitting the feature vectors to a full connection layer of the neural network, and determining the identification classification result of the training sample image.
In some embodiments, further comprising:
before determining the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images, respectively performing filtering processing on the intra-block training sample feature images and the inter-block training sample feature images to obtain filtered intra-block training sample feature images and inter-block training sample feature images.
In some embodiments, the inputting the intra-block training sample feature image into a convolution group with an intra-block path in the neural network, determining the image features of the intra-block training sample feature image, comprises:
inputting the intra-block training sample feature image into a convolution group of the intra-block path in the neural network for convolution processing to obtain a processed intra-block training sample feature image;
inputting the processed intra-block training sample characteristic image into a pooling layer for processing to obtain the image characteristics of the intra-block training sample characteristic image;
the inputting the inter-block training sample feature images into a convolution group with inter-block paths in the neural network, and extracting the image features of the inter-block training sample feature images includes:
inputting the inter-block training sample feature image into a convolution group of the inter-block path in the neural network for convolution processing to obtain a processed inter-block training sample feature image;
inputting the processed inter-block training sample characteristic images into a pooling layer for processing to obtain image characteristics of the inter-block training sample characteristic images;
wherein the number of convolution groups of the intra-block path is at least two, and the pooling layer of the intra-block path is located between the convolution groups of the intra-block path; the number of the convolution groups of the inter-block paths is at least two, and the pooling layer of the inter-block paths is positioned between the convolution groups of the inter-block paths.
In some embodiments, the inputting the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images into a dimension reduction layer at an end of an intra-block path and a dimension reduction layer at an end of an inter-block path, respectively, and the determining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images include:
pooling the image features of the intra-block training sample feature images in the intra-block path to obtain pooled intra-block training sample feature images;
performing convolution processing on the pooled intra-block training sample characteristic images to obtain main image characteristics of the intra-block training sample characteristic images;
performing convolution processing on the image features of the inter-block training sample feature images in the inter-block path to obtain convolved inter-block training sample feature images;
and performing pooling treatment on the convolved inter-block training sample characteristic images to obtain main image characteristics of the inter-block training sample characteristic images.
In some embodiments, the merging the main image features of the intra-block training sample feature image and the main image features of the inter-block training sample feature image to obtain the feature vector of the training sample image includes:
respectively converting the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and combining the intra-block one-dimensional characteristic vector and the inter-block one-dimensional characteristic vector according to the structural sequence of the corresponding intra-block path and inter-block path respectively to obtain the characteristic vector of the training sample image.
In some embodiments, further comprising:
performing normalization preprocessing on the obtained training sample image to obtain a normalized training sample image;
the obtaining at least two types of training sample feature images based on the processing of the training sample images comprises:
and segmenting the training sample image after the normalization processing to obtain at least two types of training sample characteristic images.
In some embodiments, the determining the weight of the neural network according to the classification result includes:
calculating a loss value from the classification result, taken as the predicted classification label value, and the true label value of the training sample image;
and determining the loss value as the weight of the neural network.
In some embodiments, the updating the weights of the neural network according to the determined weights of the neural network to obtain the trained neural network model includes:
and updating the old weights of the neural network with the determined weights as the new weights by means of back propagation, to obtain the trained neural network model having two paths.
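As a hedged illustration only (the patent does not prescribe a framework, optimizer, or loss function), a single training iteration of the kind described above might look as follows in PyTorch, where model is assumed to be the two-path network taking both feature images and cross-entropy is an assumed choice of loss:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, intra, inter, labels):
    """One iteration: forward pass through the two-path network, loss
    between predicted and true labels, back-propagated weight update."""
    optimizer.zero_grad()
    logits = model(intra, inter)            # recognition/classification result
    loss = F.cross_entropy(logits, labels)  # predicted vs. true label values
    loss.backward()                         # back propagation
    optimizer.step()                        # replace old weights with new ones
    return loss.item()
```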
The present application further provides a training apparatus for a neural network model, including:
an acquisition unit for acquiring a training sample image;
the processing unit is used for obtaining at least two types of training sample characteristic images based on the processing of the training sample images;
the recognition unit is used for inputting the information of the characteristic images of the at least two types of training samples into a neural network with at least two paths for recognition to obtain recognition and classification results;
the determining unit is used for determining the weight of the neural network according to the recognition and classification result;
and the updating unit is used for updating the weight of the neural network according to the determined weight of the neural network to obtain the trained neural network model.
The application also provides a computer storage medium for storing the data generated by the network platform and a program for processing the data generated by the network platform;
when read and executed by the processor, the program performs the following operations:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the at least two types of feature images into a neural network having at least two paths for recognition to obtain a recognition and classification result;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a program for processing network platform generated data, the program when read and executed by the processor performing the following operations:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the at least two types of feature images into a neural network having at least two paths for recognition to obtain a recognition and classification result;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
Compared with the prior art, the method has the following advantages:
the method comprises the steps of obtaining at least two types of characteristic images based on processing of an image to be detected, inputting the two types of characteristic images into a neural network with at least two channels respectively for recognition, obtaining recognition and classification results, and determining whether the image to be detected belongs to a double-compression image or not according to the recognition and classification results, so that the pixel characteristics of the two types of characteristic images can be mined, the recognition of the features of JPEG double compression is facilitated, and attack on the convolutional neural network due to hiding or removing of compression traces of the double-compression image by anti-forensics is better resisted.
In addition, the at least two types of feature images obtained after segmentation are filtered, which further improves the recognition accuracy and the anti-forensics resistance for double-compressed images.
As described above, the present application provides a training method for a neural network model: at least two types of training sample feature images are obtained by processing an acquired training sample image; the at least two types of training sample feature images are input into a neural network having at least two paths for recognition to obtain a recognition and classification result; the weights of the convolutional neural network are determined according to the recognition and classification result; and the weights are updated accordingly (i.e., an iterative training process) to obtain a trained convolutional neural network model. The resulting model has high accuracy and strong anti-forensics resistance when detecting images.
Drawings
FIG. 1 is a schematic diagram of the neural network of the third prior-art method;
FIG. 2 is a schematic diagram of the structure of the spatial-domain neural network in the fourth prior-art method;
FIG. 3 is a schematic diagram of the convolutional neural network structure obtained by combining the frequency-domain and spatial-domain neural networks in the fourth prior-art method;
FIG. 4 is a flowchart of an embodiment of an image detection method provided by the present application;
fig. 5 is a schematic structural diagram of an embodiment of segmenting an image to be detected in an image detection method provided by the present application;
fig. 6 is a schematic view of a visual effect obtained after an image to be detected is segmented in an image detection method provided by the present application;
FIG. 7 is a schematic structural diagram of a convolutional neural network for detecting an image in an image detection method provided in the present application;
FIG. 8 is a schematic structural diagram of a convolution block in a convolution neural network for detecting an image in an image detection method provided by the present application;
fig. 9 is a schematic diagram illustrating a convolution operation performed on image feature data input into a convolution block in an image detection method provided in the present application;
FIG. 10 is a schematic structural diagram of an embodiment of an image detection apparatus provided in the present application;
FIG. 11 is a flow chart of an embodiment of a method for training a convolutional neural network model provided herein;
FIG. 12 is a schematic diagram of an embodiment of a convolutional neural network model training apparatus provided in the present application;
FIG. 13 is a diagram of a common scenario for embedding anti-forensic techniques in a dual JPEG compression process;
fig. 14 is a schematic view of the detection effect of the network of FIG. 1 applied to an anti-forensically processed image.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many other ways than those described herein, and those skilled in the art can make similar adaptations without departing from the spirit of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to limit the application. Terms such as "a," "an," "first," and "second" used in this application and the appended claims do not denote a particular quantity or order; rather, they are used to distinguish one type of information from another.
The detection of the double JPEG compressed image can be carried out by the following methods:
1. double JPEG image compression detection method based on initial character
In the prior art, Benford's Law describes a statistical regularity of naturally generated numbers: the probability that a digit (1-9) appears as the first digit of a number decreases as the value of the digit itself increases. This regularity can be used to detect double JPEG image compression, as shown in the following formula.
$$P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9$$
The DCT coefficients of a singly compressed JPEG image obey Benford's Law, while the DCT coefficients of a doubly compressed image do not. Based on this finding, a simple and efficient first-digit-based feature extraction method (MBFDF, mode-based first digit features) was proposed to detect double JPEG compression.
The method comprises the following three steps:
1) First, the first 20 AC coefficient sub-bands are selected in zig-zag order. A sub-band is the one-dimensional vector formed by concatenating, across all blocks of the image, the coefficient values at the same block position;
2) Then, for each sub-band, first-digit features are extracted, i.e., the frequency with which each of the digits 1-9 appears as the first digit in the sub-band is computed, so each sub-band yields a 9-dimensional sub-feature;
3) Finally, the 20 sub-features are combined into a 180-dimensional feature vector, which is sent to an FLD (Fisher Linear Discriminant) classifier for training and testing to obtain the detection result for double JPEG compression.
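For concreteness, here is a minimal sketch of the MBFDF extraction described above. It is an illustrative reconstruction under stated assumptions: the quantized DCT coefficients are assumed to be already available as a (num_blocks, 8, 8) array (e.g., from a JPEG-parsing library), and skipping zero-valued coefficients is an assumed detail the text above does not specify.

```python
import numpy as np

def zigzag_indices(n=8):
    """(row, col) positions of an n x n block in JPEG zig-zag order."""
    pos = [(r, c) for r in range(n) for c in range(n)]
    return sorted(pos, key=lambda rc: (rc[0] + rc[1],
                                       rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def first_digit(v):
    """Most significant decimal digit of |v| (callers skip v == 0)."""
    v = abs(int(v))
    while v >= 10:
        v //= 10
    return v

def mbfdf_features(dct_blocks):
    """180-dim MBFDF vector: for each of the first 20 AC sub-bands in
    zig-zag order, the frequencies of the digits 1-9 as first digit."""
    feats = []
    for (r, c) in zigzag_indices()[1:21]:         # skip the DC position
        band = dct_blocks[:, r, c]                # one sub-band across blocks
        digits = np.array([first_digit(v) for v in band if v != 0], dtype=int)
        hist = np.bincount(digits, minlength=10)[1:10].astype(float)
        feats.append(hist / max(digits.size, 1))  # 9-dim sub-feature
    return np.concatenate(feats)                  # shape (180,)
```

The resulting vector would then be fed to an FLD classifier for training and testing, as described above.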
2. Dual JPEG image compression detection method based on statistical histogram
The method uses a histogram statistical method to detect double JPEG compression detection, is simple and effective, and mainly comprises the following three steps:
1) First, the first 9 sub-bands are extracted in zig-zag order, and the number of occurrences of the values 0-15 in each sub-band is counted, i.e. $x = \{h_{i,j}(0), h_{i,j}(1), \ldots, h_{i,j}(15)\}$, where (i, j) is the spatial coordinate index;
2) Then, the feature x is normalized so that each entry is the frequency with which the corresponding value 0-15 appears among all counted values, i.e. $x' = \{h_{i,j}(0), h_{i,j}(1), \ldots, h_{i,j}(15)\}/C_{i,j}$, where $C_{i,j}$ is the total count over the 16 values;
3) Finally, the x' of all sub-bands are combined into a 144-dimensional feature vector, which is sent to a support vector machine (SVM) for training and testing to obtain the detection result for double JPEG compression.
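A corresponding sketch for the histogram feature, assuming coefficient magnitudes are counted and that values above 15 are clipped into the last bin (the text above does not specify how larger values are handled):

```python
import numpy as np

# First 9 AC positions of an 8x8 block in zig-zag order
AC9 = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1), (3, 0)]

def histogram_features(dct_blocks):
    """144-dim feature: for each of the 9 sub-bands, the normalized
    frequencies of the coefficient magnitudes 0..15."""
    feats = []
    for (r, c) in AC9:
        band = np.abs(dct_blocks[:, r, c]).astype(int)
        counts = np.bincount(np.clip(band, 0, 15), minlength=16).astype(float)
        feats.append(counts / counts.sum())   # x' = x / C_{i,j}
    return np.concatenate(feats)              # shape (144,)
```

The 144-dimensional output would then be passed to an SVM classifier (e.g., sklearn.svm.SVC) for training and testing.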
3. Double JPEG image compression detection method based on frequency domain convolutional neural network
The method takes the DCT coefficient histogram of a JPEG image as input and establishes a frequency-domain one-dimensional convolutional neural network, whose structure is shown in FIG. 1. The network is mainly divided into three parts: input, convolution modules, and fully connected layers. The specific detection process is as follows:
(1) Input: for each DCT AC coefficient sub-band, a histogram over the value range [-5, 5] is extracted, so each sub-band yields an 11-dimensional sub-feature. The sub-features of the first 9 AC coefficient sub-bands in zig-zag order are taken as the input of the whole network, i.e., a one-dimensional feature vector of dimension 99 × 1;
(2) Convolution modules: the network has two convolution modules, each consisting of a one-dimensional convolution layer and a one-dimensional max-pooling layer, generating 100 feature maps. The convolution kernel size is 3 × 1 with stride 1; the pooling kernel size is 3 × 1 with stride 2. Furthermore, the network uses the ReLU activation function in every connection layer and does not use any regularization;
(3) Fully connected layers: the network has three fully connected layers with 1000, 1000, and 2 neurons, respectively. At the last fully connected layer, a Softmax function (normalized exponential function, see formula (1.2) below) is used to compute the probability of each class. Assuming the total number of output classes is C and the output of the m-th neuron is $a_m$, the probability of each class is:
$$p_m = \frac{e^{a_m}}{\sum_{c=1}^{C} e^{a_c}} \tag{1.2}$$
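Formula (1.2) can be rendered directly in numpy; the max-subtraction below is a standard numerical-stability detail, not part of the formula:

```python
import numpy as np

def softmax(a):
    """Class probabilities from the final-layer outputs a_m (formula (1.2))."""
    e = np.exp(a - np.max(a))   # subtract max for numerical stability
    return e / e.sum()
```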
4. double JPEG image compression detection method based on convolution neural network of combined domain
The method proposes a combined-domain convolutional neural network built from a frequency-domain network and a spatial-domain network. The frequency-domain network (frequency-domain CNN) is a one-dimensional convolutional neural network differing from the one described above in two ways: first, the input histogram value range is expanded from [-5, 5] to [-50, 50], so the input dimension of the frequency-domain network becomes 909 × 1; second, a ReLU activation function is added at the fully connected layers and dropout is used. As shown in FIG. 2, the spatial-domain CNN is likewise divided into three parts: input, convolution modules, and fully connected layers:
(1) Input: the spatial-domain network processes three-channel color images of size 64 × 64. The JPEG image is first decompressed to the spatial domain, and the full-image pixel value matrix (normalized to the range 0-1) is taken as input, so the input data dimension is 64 × 64 × 3;
(2) Convolution modules: there are 4 convolution modules, each consisting of a two-dimensional convolution layer and a ReLU activation function, with kernel size 3 × 3; the numbers of output feature maps are 32, 64 and 64, respectively. In the second and fourth convolution modules, a max-pooling layer with pooling kernel size 2 × 2 and a ReLU activation function are also used.
(3) Fully connected layers: the network has two fully connected layers with 256 and 9 neurons, respectively. The first fully connected layer uses a ReLU activation function and dropout to make the learned features more robust; the second fully connected layer likewise uses a Softmax function for classification.
Finally, the outputs of the first fully connected layers of the frequency-domain and spatial-domain networks are taken as merged features to form the combined-domain convolutional neural network (multi-domain CNN), as shown in FIG. 3.
Detection tests show that in the double JPEG image compression detection method based on the convolution neural network of the combined domain, the performance of the three neural networks is ordered as follows: spatial domain network (SD-CNN) < frequency domain network (FD-CNN) < combined domain network (MD-CNN).
All four detection methods above perform double JPEG compression detection based on frequency-domain DCT coefficient histogram features, and all detect successfully. However, when anti-forensics techniques are introduced to attack the frequency-domain DCT coefficient histogram features, the detection accuracy of all four methods degrades.
In view of the above, the image detection method provided by the present application uses a neural network as the detection carrier and examines the acquired image to be detected to determine whether it is a double-compressed image.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of an image detection method provided in the present application, where the method includes:
step S401: and acquiring an image to be detected.
In step S401, the image to be detected may be any image whose double-compression status needs to be identified: a separately stored static JPEG image, a JPEG-format frame stored within a dynamic image, or a JPEG image captured from dynamic video or animated image data.
JPEG is the abbreviation of Joint Photographic Experts Group; its files use the extension ".jpg" or ".jpeg". Established by that group, it is the most commonly used image file format, and is a lossy compression format capable of storing images in a small amount of space.
In the present embodiment, considering the detection goal, namely detecting whether the image to be detected is a double-compressed or singly compressed JPEG image, the image to be detected acquired in step S401 may be an image that has undergone compression. Of course, since the purpose of acquiring the image is to detect whether it is double-compressed, the range of acquired images is not in fact limited to double-compressed or singly compressed JPEG images.
Step S402: and acquiring at least two types of characteristic images based on the processing of the image to be detected.
In this embodiment, the specific implementation process of obtaining at least two types of feature images based on the processing of the image to be detected may be to segment the image to be detected to obtain at least two types of feature images. In other words, when at least two types of feature images are obtained, the processing on the image to be detected may include segmentation processing on the image to be detected, and in other embodiments, the processing may further include: the image content may include image elements such as image color and image contour according to the extraction of the image content in the image to be detected.
In this embodiment, the division manner is described as the manner of image processing, so a specific implementation process of the step S402 may include the following steps:
step S402-a: and dividing the acquired pixel matrix of the image to be detected to obtain image blocks.
Referring to fig. 5, fig. 5 is a schematic structural diagram illustrating an embodiment of segmenting an image to be detected in an image detection method provided by the present application.
In general, the JPEG image compression and decompression processes are each performed by 8 × 8 blocking, and therefore, in the step S402-a, the image to be detected may be divided in an 8 × 8 form, but other forms of division are not excluded, and this 8 × 8 division is exemplified. In this embodiment, the dividing manner may be that the image to be detected is divided from left to right and from top to bottom.
If the pixel matrix of the image to be detected is 32 × 32, the 8 × 8 blocks are divided as shown by the thick black solid lines in fig. 5, i.e., into 16 blocks in total; these 16 blocks are the image blocks. It should be noted that an image with a 32 × 32 pixel matrix is used here for ease of understanding; the technical solution of the present application does not limit the size of the image to be detected, and the pixel matrix may also be 256 × 256 or another size.
Step S402-b: and selecting pixels of the area adjacent to the central position of the image block and pixels of the area adjacent to the segmentation intersection position of the image block.
As shown in fig. 5, for the 8 × 8 block example adopted in step S402-a, selecting pixels in the area near the center of the image block is to select 4 × 4 pixels with the center of the 8 × 8 block as the center point, as shown by the diagonal square in fig. 5; and selecting pixels in the area adjacent to the image block segmentation intersection position, wherein the pixels adjacent to 4 × 4 are selected by taking the block segmentation intersection position as a central point, as shown by a dotted square in fig. 5.
Step S402-c: and arranging and combining pixels in a region adjacent to the central position of the image Block according to the Block dividing sequence of the image to be detected to obtain an Intra-Block (Intra-Block) characteristic image.
Step S402-c arranges and combines the diagonally hatched squares shown in fig. 5 in the dividing order of the blocks to form a new two-dimensional pixel matrix, which is determined as the intra-block feature image. Following the blocking example of step S402-a, this two-dimensional pixel matrix, i.e. the intra-block feature image, has size 16 × 16.
Step S402-d: and partitioning the image into blocks, dividing pixels in the adjacent area of the intersection position, and arranging and combining according to the partitioning sequence of the image to be detected to obtain an Inter-Block characteristic image.
Step S402-d arranges and combines the dotted squares shown in fig. 5 in the dividing order of the blocks to form a new two-dimensional pixel matrix, which is determined as the inter-block feature image. Following the blocking example of step S402-a, this two-dimensional pixel matrix, i.e. the inter-block feature image, has size 12 × 12.
As can be seen from the steps S402-c and S402-d, when the image to be detected is divided, the number of pixels in the area adjacent to the dividing intersection position of the image blocks is less than the number of pixels in the area adjacent to the central position of the image blocks, and therefore, the size of the intra-block feature image obtained after arrangement and combination is larger than the size of the inter-block feature image.
For a more intuitive understanding of how the pixels in the regions adjacent to the block centers (step S402-c) and the pixels in the regions adjacent to the block intersections (step S402-d) are arranged and combined in the block-dividing order of the image to be detected, refer to fig. 6, which shows the visual effect after segmenting an image to be detected: in the middle is the original image to be detected, of size 256 × 256; on the left is the intra-block feature image, of size 128 × 128; on the right is the inter-block feature image, of size 124 × 124.
Step S402-e: and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
Based on the above-mentioned division result of the image to be detected, the intra-block feature image and the inter-block feature image obtained after the division need to be input to the convolutional neural network for identification, and therefore, the process proceeds to step S403.
Step S403: inputting the at least two types of feature images into a neural network having at least two paths for recognition, to obtain a recognition and classification result.
First, it should be noted that a neural network is a computational model formed by connecting a large number of nodes (or neurons) to each other. Each node represents a particular output function called an activation function.
Neural networks include: the BP (back propagation) neural network, the radial basis function (RBF) neural network, the perceptron neural network, the linear neural network, the self-organizing neural network, the feedback neural network, the convolutional neural network (CNN), and the like. In this embodiment, the image detection method is described mainly using a convolutional neural network as an example.
The specific implementation process of step S403 may include five steps, which are described below in sequence:
step S403-a: inputting the intra-block feature image into a convolution group with an intra-block path in the convolutional neural network, determining an image feature of the intra-block feature image;
step S403-b: inputting the inter-block feature image into a convolution group with an inter-block path in the convolutional neural network, and determining an image feature of the inter-block feature image.
The above-mentioned steps S403-a and S403-b are respectively explained for the convolution of intra-block paths and the convolution of inter-block paths. In the present embodiment, as shown in fig. 7, fig. 7 is a schematic structural diagram of a convolutional neural network for detecting an image in an image detection method provided by the present application; a convolutional neural network for detecting images includes intra-block paths and inter-block paths, both of which include: a convolution layer, a dimensionality reduction layer, a merging layer and a full connection layer. Based on the implementation procedure of the above steps, first, the structure of the convolutional layer in the convolutional neural network for detecting an image will be described.
The convolutional neural network mentioned in this embodiment detects images in the spatial domain and has at least two paths. The spatial domain is the space composed of image pixels, that is, the data of the at least two types of feature images obtained in steps S401 and S402 are the pixels of the feature images. The two paths are convolution (detection) operations corresponding to the intra-block and inter-block feature images, respectively; hence they may be called the intra-block path, which convolves the intra-block feature image, and the inter-block path, which convolves the inter-block feature image.
Based on the example of step S402-c (fig. 6), the intra-block feature image input to the intra-block path has size 128 × 128 × 1, and the inter-block feature image input to the inter-block path has size 124 × 124 × 1. The convolutional layer may comprise several convolution groups; in this embodiment, the intra-block path and the inter-block path each contain four convolution groups, and each convolution group contains at least two convolution blocks with kernel size 3 × 3. The number of output feature images is 32, because 32 convolution kernels are used in this embodiment and each kernel outputs one feature image.
It should be noted that the number of convolution groups is not limited to the number provided in the present embodiment, and the number may be determined according to the size of the actual image to be detected. Similarly, the number of convolution blocks in each convolution group can also be determined by the actual size of the image to be detected.
Referring to fig. 8 in conjunction with fig. 7: fig. 8 is a schematic structural diagram of the convolution blocks in the convolutional neural network for detecting an image, where each convolution block comprises a sub-convolution layer, a batch normalization layer, and an activation function layer. To reduce computational complexity and strike the best balance between memory efficiency and memory capacity, the convolutional neural network divides the intra-block feature image data input to the intra-block path and the inter-block feature image data input to the inter-block path into mini-batches, determining how many feature images are detected at a time; that is, when the input intra-block and inter-block feature image data cannot be processed by the neural network in one pass (i.e., the input data is large), it is divided into several batches for detection.
It is understood that when the data of the input feature image is small, the whole batch may be used.
The data of the mini-batched feature images are convolved in the convolution blocks, and the feature images obtained after convolution are then batch-normalized. The purpose of batch normalization is to keep the distribution of the mini-batched intra-block feature image data consistent with that of the originally input intra-block feature image data, and likewise to keep the distribution of the mini-batched inter-block feature image data consistent with that of the originally input inter-block feature image data. Batch normalization learns a scale parameter γ and an offset parameter β in formula set (5.1):

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}, \qquad y_i = \gamma\,\hat{x}_i + \beta \tag{5.1}$$

In the above formulas, $x_i$ and $\hat{x}_i$ are the original value and the batch-normalized value of the mini-batch input data, μ and σ² are the mean and variance of the mini-batch, respectively, and $y_i$ is the output of the batch normalization layer for the mini-batch. In addition, ε is a small preset constant that prevents the denominator from being zero. In the detection process, assuming the input data is divided into R mini-batches, the mean and variance are updated for the r-th mini-batch according to formula set (5.2):

$$\mu \leftarrow \rho\,\mu + (1 - \rho)\,\mu_r, \qquad \sigma^2 \leftarrow \rho\,\sigma^2 + (1 - \rho)\,\sigma_r^2 \tag{5.2}$$

where ρ is a momentum that keeps the updates of adjacent batches balanced.
In this embodiment, the sub-convolution layer of the convolution block in each convolution group uses "zero-padding convolution", i.e., the output feature image keeps the same size as the input data. The batch normalization operation uses formulas (5.1) and (5.2) for the update, namely: the original activation x of a given neuron is transformed by subtracting the mean of the m activations obtained from the m instances in the mini-batch and dividing by the corresponding standard deviation, which ensures that the feature image keeps the same pixel distribution characteristics as the original image to be detected.
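A compact numpy sketch of formulas (5.1) and (5.2); the exact momentum form of the running update is an assumption consistent with the description above:

```python
import numpy as np

def batch_norm(x, gamma, beta, running, rho=0.9, eps=1e-5, training=True):
    """Mini-batch normalization per formulas (5.1)/(5.2).
    x: (batch, ...) activations; running: dict with 'mean' and 'var'."""
    if training:
        mu, var = x.mean(axis=0), x.var(axis=0)
        # (5.2): momentum update of the running statistics
        running['mean'] = rho * running['mean'] + (1 - rho) * mu
        running['var'] = rho * running['var'] + (1 - rho) * var
    else:
        mu, var = running['mean'], running['var']
    x_hat = (x - mu) / np.sqrt(var + eps)   # (5.1), first formula
    return gamma * x_hat + beta             # (5.1), second formula
```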
The batch-normalized feature images in the convolution blocks (or, alternatively, the feature images directly after convolution) are passed through the activation function layer to obtain their effective features. In some embodiments, given input data I and a convolution kernel K, a feature map F is generated using a two-dimensional convolution operation, as shown in formula (5.3):

$$F(m, n) = (I * K)(m, n) = \sum_{u=1}^{W} \sum_{v=1}^{H} I(m - u,\, n - v)\,K(u, v) + b_c \tag{5.3}$$

where $*$ denotes the convolution operation as used in information science, W and H are the width and height of the convolution kernel K, I is the input image feature data, m and n index the currently computed value F(m, n), u and v are the indices of the summation, and $b_c$ is the bias of the convolution operation.
Referring to fig. 9, fig. 9 is a schematic diagram illustrating a convolution operation performed on image feature data input into a convolution block.
Here the convolution kernel K is of size 3 × 3 (the specific 3 × 3 kernel K and the input image feature data I are shown in fig. 9), so W = 3 and H = 3.
The process is as follows: the convolution kernel K is flipped by 180° to obtain K'; the center of the flipped K' is aligned with I(2, 2), and corresponding elements are multiplied and summed, with W = 3, H = 3, u = 2, v = 2, and m and n ranging from 0 to 2. This gives F(2, 2) = 1×1 + 0×1 + 1×1 + 0×0 + 1×1 + 0×1 + 1×0 + 0×0 + 1×1 = 4. The kernel is then slid by one position: W and H are unchanged, the value ranges of m and n are unchanged, u = 2, v = 3, and the F(2, 2) computation is repeated over the elements of I with the kernel K' to obtain F(2, 3); repeating this sliding and convolution yields the final values of F.
In order to learn deep-level image features more efficiently in intricate data, therefore, a nonlinear mapping function, i.e., an activation function, is employed. The use of the activation function adds a nonlinear factor, improves the expression capability of the neural network on the image detection method, and solves the problem which cannot be solved by a linear model. In the neural network, commonly used activation functions are Sigmoid, TanH, ReLU, leak ReLU, ELU, and the like. According to different input numbersAccording to the characteristics, TanH and ReLU are respectively adopted as activation functions to better extract sparse data xspTo learn valid or otherwise critical image features, the TanH and ReLU activation functions may use the following equations (5.4) and (5.5), respectively:
TanH(x) = (e^x - e^{-x}) / (e^x + e^{-x})        (5.4)

ReLU(x_sp) = max(0, x_sp)        (5.5)
In the above equation (5.5), when x_sp > 0, the parameters describing the image features participate more in the gradient update; when x_sp ≤ 0, the outputs of the corresponding neurons are zero, so that sparsity of the network is obtained, the interdependence among the parameters of the image features is reduced, and overfitting is avoided.
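For illustration, minimal NumPy versions of equations (5.4) and (5.5) might read as follows (a sketch, not the patent's implementation):

```python
import numpy as np

def tanh(x):
    """TanH activation, equation (5.4)."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x_sp):
    """ReLU activation, equation (5.5): zero for x_sp <= 0, identity otherwise."""
    return np.maximum(0.0, x_sp)
```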
As shown in fig. 8, in this embodiment, the intra-block path and the inter-block path may each include four convolution groups; since the convolution groups do not have dimension reduction capability, the convolution groups are connected by pooling layers so as to reduce the size (number of pixels) of the output feature images.
Structure of the intra-block path:
after the convolution processing of the first convolution group in the intra-block path, the obtained intra-block feature image with the size of 124 × 124 × 32 is input into the first pooling layer; after the first pooling layer processes the 124 × 124 × 32 intra-block feature image, the obtained intra-block feature image with the size of 62 × 62 × 32 is input into the second convolution group; the second convolution group performs convolution processing on the input 62 × 62 × 32 intra-block feature image and inputs the obtained 58 × 58 × 32 intra-block feature image to the second pooling layer; the second pooling layer performs pooling processing on the input 58 × 58 × 32 intra-block feature image and inputs the obtained intra-block feature image with the size of 29 × 29 × 32 to the third convolution group; the third convolution group performs convolution processing on the input 29 × 29 × 32 intra-block feature image and inputs the obtained intra-block feature image with the size of 25 × 25 × 32 to the third pooling layer; after the third pooling layer performs pooling processing on the input 25 × 25 × 32 intra-block feature image, the obtained intra-block feature image with the size of 13 × 13 × 32 is input to the fourth convolution group; after the fourth convolution group performs convolution processing on the input 13 × 13 × 32 intra-block feature image, the obtained 9 × 9 × 32 intra-block feature image is input into the fourth pooling layer, which pools it to obtain a 5 × 5 × 32 intra-block feature image; for further reduction, the obtained 5 × 5 × 32 intra-block feature image may be input to the end convolution layer after the fourth pooling layer and subjected to convolution processing to obtain a 4 × 4 × 32 intra-block feature image; an illustrative shape sketch of this path is given below.
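As an illustrative shape check only (fig. 8 defines the actual layer composition; standing in for each convolution group with a single 5 × 5 valid convolution, and assuming a 128 × 128 × 1 intra-block input, are assumptions), a TensorFlow/Keras sketch that reproduces the intra-block sizes above might be:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(128, 128, 1))    # assumed intra-block feature image
x = inputs
for _ in range(4):
    # stand-in for one convolution group: reduces height/width by 4
    x = layers.Conv2D(32, 5, padding="valid", activation="relu")(x)  # 124, 58, 25, 9
    x = layers.MaxPool2D(3, strides=2, padding="same")(x)            # 62, 29, 13, 5
x = layers.Conv2D(32, 2, padding="valid")(x)    # end convolution layer: 5 -> 4
model = tf.keras.Model(inputs, x)
model.summary()                                 # final intra-block features: 4 x 4 x 32
```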
Structure of inter-block path:
after the convolution processing of the first convolution group in the inter-block path, the obtained inter-block feature image with the size of 122 × 122 × 32 is input into the first pooling layer; after the first pooling layer processes the 122 × 122 × 32 inter-block feature image, the obtained inter-block feature image with the size of 60 × 60 × 32 is input into the second convolution group; the second convolution group performs convolution processing on the input 60 × 60 × 32 inter-block feature image and inputs the obtained 56 × 56 × 32 inter-block feature image to the second pooling layer; after the second pooling layer performs pooling processing on the input 56 × 56 × 32 inter-block feature image, the obtained inter-block feature image with the size of 28 × 28 × 32 is input into the third convolution group; after the third convolution group performs convolution processing on the input 28 × 28 × 32 inter-block feature image, the obtained inter-block feature image with the size of 24 × 24 × 32 is input to the third pooling layer; after the third pooling layer performs pooling processing on the input 24 × 24 × 32 inter-block feature image, the obtained inter-block feature image with the size of 12 × 12 × 32 is input into the fourth convolution group; the fourth convolution group performs convolution processing on the input 12 × 12 × 32 inter-block feature image and then inputs the obtained 8 × 8 × 32 inter-block feature image to the end convolution layer, which convolves it to obtain a 7 × 7 × 32 inter-block feature image; in order to make the feature image sizes output by the inter-block path and the intra-block path the same, a dimension reduction operation may further be employed, that is, the 7 × 7 × 32 inter-block feature image may be input to a downsampling layer located after the end convolution layer and processed to obtain a 4 × 4 × 32 inter-block feature image.
As can be understood from the above description of the convolutional layers of the convolutional neural network structure having at least two paths, the intra-block path and the inter-block path are each subjected to convolution processing by their corresponding convolution groups, which output the image features of the intra-block feature image and the image features of the inter-block feature image. After obtaining these image features, it is necessary to perform a dimensionality reduction operation on each of them, so as to extract the main image features of the intra-block feature image and the main image features of the inter-block feature image and to avoid over-fitting; therefore, the method proceeds to step S403-c.
Step S403-c: and inputting the image features of the intra-block feature images and the image features of the inter-block feature images into a dimension reduction layer at the tail end of an intra-block access and a dimension reduction layer at the tail end of an inter-block access respectively, and determining the main image features of the intra-block feature images and the main image features of the inter-block feature images.
The specific implementation of this step is also described with reference to the dimension reduction layer structure in the convolutional neural network in fig. 7.
The structure of the dimensionality reduction layer at the end of the intra-block path is explained first:
the dimensionality reduction layer at the end of the intra-block path comprises: a dimension reduction pooling layer and a dimension reduction convolution layer arranged at the tail end of the convolution layers, namely at the tail of the fourth convolution group; the specific dimension reduction process comprises the following steps:
pooling image features of intra-block feature images output by the convolution layers in the intra-block passage to obtain pooled intra-block feature images;
performing convolution processing on the pooled intra-block feature images to obtain main image features of the intra-block feature images;
In this embodiment, maximum pooling is selected for the operation of the pooling layer: since the spatial pixel values are normalized to [0, 1] and the differences between pixel values at adjacent positions are small, maximum pooling is the more reasonable choice. The pooling process may include:
respectively determining the sizes of the pooling windows of the corresponding intra-block pooling layers;
determining the pooling window feature value of the intra-block pooling layer as the maximum of the image feature values covered by the pooling window;
and combining all the pooling window feature values of the intra-block pooling layer to obtain the image features of the pooled intra-block feature image.
In this embodiment, the pooling window size of the pooling layer is 3 × 3, and the step size is 2.
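A minimal NumPy sketch of this maximum pooling (window 3 × 3, stride 2) is given below for illustration; the valid-position edge handling is an assumption:

```python
import numpy as np

def max_pool(x, window=3, stride=2):
    """Each output value is the maximum of the image feature values the window covers."""
    h, w = x.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + window, j * stride:j * stride + window]
            out[i, j] = patch.max()   # pooling window feature value
    return out
```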
After the image features of the pooled intra-block feature image are obtained, convolution processing is further performed: the image features of the pooled intra-block feature image are input into the end convolution sublayer in the dimensionality reduction layer, which convolves them with a 2 × 2 convolution kernel to obtain the main image features of the intra-block feature image.
Following the example in step S403, the intra-block feature image size after the maximum pooling layer processing is 5 × 5 × 32, and the main image feature size of the intra-block feature image after convolution is 4 × 4 × 32.
The structure of the dimensionality reduction layer at the end of the inter-block path is explained:
the dimensionality reduction layer at the end of the inter-block path comprises: a dimension reduction convolution layer and a dimension reduction pooling layer arranged at the tail end of the convolution layers, namely at the tail of the fourth convolution group; it differs from the dimensionality reduction layer at the end of the intra-block path in the order of the dimension reduction convolution layer and the dimension reduction pooling layer: at the end of the inter-block path the dimension reduction convolution layer precedes the dimension reduction pooling layer, while at the end of the intra-block path the order is the opposite. Specifically, the dimension reduction process of the dimensionality reduction layer at the end of the inter-block path comprises the following steps:
performing convolution processing on the image characteristics of the inter-block characteristic images in the inter-block passage to obtain convolved inter-block characteristic images;
and performing pooling processing on the convolved inter-block feature images to obtain main image features of the inter-block feature images.
The size of the convolution kernel adopted in this convolution processing is the same as that of the convolution kernel of the dimensionality reduction layer at the end of the intra-block path.
Following the example in step S403 above, the size of the inter-block feature image after convolution is 7 × 7 × 32, and the main image feature size of the inter-block feature image after the maximum pooling layer processing is 4 × 4 × 32. It should be noted that both the convolution sublayer in the dimensionality reduction layer at the end of the intra-block path and the convolution sublayer in the dimensionality reduction layer at the end of the inter-block path use the effective convolution mode, that is, zero-padding convolution is not adopted (no zero-padding operation is performed at the edges of the feature map); the two dimensionality reduction layers ensure that the main image features of the intra-block feature image and the main image features of the inter-block feature image output by the intra-block path and the inter-block path keep the same number and the same size of feature maps.
In this embodiment, effective convolution is used for both the dimensionality reduction convolution layers in the inter-block path and the intra-block path, that is: zero padding operations are not employed (zero padding operations are not performed at feature image edges).
Based on the above, the main image features of the two paths are obtained; they then need to be merged into a one-dimensional feature vector, so the method proceeds to step S403-d, which is embodied in the merge layer of the convolutional neural network structure of fig. 7.
Step S403-d: combining the main image features of the intra-block feature images and the main image features of the inter-block feature images to obtain the feature vectors of the images to be detected;
In the specific implementation of step S403-d, main image features with the same number of feature maps (32) are obtained through step S403-c, and the main image features obtained in the intra-block path and those obtained in the inter-block path have the same size (4 × 4).
Firstly, respectively converting the main image features of the intra-block feature images and the main image features of the inter-block feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and then, combining the intra-block one-dimensional characteristic vector and the inter-block one-dimensional characteristic vector according to the structure sequence of the corresponding intra-block passage and inter-block passage respectively to obtain the characteristic vector of the image to be detected.
Following the example above, the two-dimensional feature maps of the main image features in the two paths are 4 × 4 × 32 each; they are converted into 512 × 1 one-dimensional feature vectors and then combined in the order of the path structure to obtain a feature vector with a dimensionality of 1024 × 1. The path order may be intra-block first and inter-block second, in which case the first 512 × 1 of the combined 1024 × 1 feature vector is the one-dimensional feature vector of the main image features of the intra-block path and the second 512 × 1 is that of the inter-block path; of course, the one-dimensional feature vector of the main image features of the inter-block path may equally be taken first and that of the intra-block path second, in which case the first 512 × 1 of the combined 1024 × 1 feature vector belongs to the inter-block path and the second 512 × 1 to the intra-block path.
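For illustration (a sketch under assumed array contents), the merge can be expressed in NumPy as:

```python
import numpy as np

intra = np.random.rand(4, 4, 32)   # main image features of the intra-block path
inter = np.random.rand(4, 4, 32)   # main image features of the inter-block path

intra_vec = intra.reshape(-1)      # 4 x 4 x 32 -> 512 x 1 one-dimensional vector
inter_vec = inter.reshape(-1)
# combine in the order of the path structure; intra-block first is one of the two choices
feature_vector = np.concatenate([intra_vec, inter_vec])
assert feature_vector.shape == (1024,)
```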
Step S403-e: and transmitting the characteristic vectors to a full connection layer of the convolutional neural network, and determining the identification and classification result of the image to be detected.
The specific implementation process of steps S403-e can be described with reference to fig. 7 regarding the structure of the fully-connected layer in the convolutional neural network.
The fully connected layer in step S403-e aims to perform recognition and classification according to the one-dimensional feature vector of the input image features: the combined 1024 × 1 one-dimensional feature vector is fed to the fully connected layer as the outputs of the first-layer neurons, and is densely connected to the 30 neurons of the second layer to obtain the outputs of the second-layer neurons. The second-layer neurons are then densely connected with the 2 neurons of the third layer (i.e., the number of classification categories), and the probability of each category is calculated with a Softmax function to obtain the classification result.
The output of the second layer of neurons may be obtained as follows. Let y_s^{(l)} represent the output of the s-th neuron in the l-th layer of the network; the output y_t^{(l+1)} of the t-th neuron in the (l+1)-th layer is then formulated as:

y_t^{(l+1)} = f( \sum_s w_{st}^{(l)} \cdot y_s^{(l)} + b_t^{(l)} )        (5.6)

where w_{st}^{(l)} and b_t^{(l)} represent the weights and biases of the neurons, which are updated with each training iteration, and f(·) denotes an activation function. In neural networks, common activation functions are Sigmoid, TanH, ReLU, and so on.
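A minimal NumPy sketch of the 1024 → 30 → 2 fully connected head with Softmax follows for illustration; the random weights and the choice of ReLU for f(·) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(1024, 30)), np.zeros(30)  # first -> second layer
W2, b2 = rng.normal(size=(30, 2)), np.zeros(2)      # second -> third layer (2 categories)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(feature_vector):
    """feature_vector: the combined 1024-dimensional vector of the two paths."""
    h = np.maximum(0.0, feature_vector @ W1 + b1)   # dense layer with assumed ReLU
    return softmax(h @ W2 + b2)                     # probabilities of the 2 categories

print(classify(rng.random(1024)))                   # e.g. (x, y) with x + y = 1
```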
The neurons can be understood as the matrix pixels (which may also be referred to as matrix elements) in the input image features; the number of neurons in each layer of the fully connected layer can be determined according to the matrix pixels in the compressed training sample data used for training.
In order to counteract the decrease in double-compression recognition accuracy caused by the introduction of anti-forensics methods, and to enhance the representation of pixels in the spatial domain, the image detection method provided by the present application further includes:
before determining the image features of the intra-block feature images and the image features of the inter-block feature images, respectively performing filtering processing on the intra-block feature images and the inter-block feature images to obtain filtered intra-block feature images and inter-block feature images.
Since the filtering process can be implemented by a differential filter, as shown in fig. 7, the intra-block path and the inter-block path of the convolutional neural network provided in this embodiment further include a differential filter layer, which may also be referred to as an intra-block differential filter layer and an inter-block differential filter layer, respectively. The intra-block differential filter layer is located at an initial end of an intra-block path, and the inter-block differential filter layer is located at an initial end of an inter-block path.
The convolution operation of the differential filter layer uses the "zero-padding convolution" method, i.e., 2 convolution kernels with a size of 2 × 2 are respectively used to convolve each feature map; the parameters of these 2 convolution kernels are shown in formula (5.7):
[the parameter matrices of the two 2 × 2 differential convolution kernels of formula (5.7) are given in the original filing and are not reproduced in this text]
the differential filter layer can enhance the signal-to-noise ratio of the segmented intra-block characteristic images and inter-block characteristic images, further enhance information left in a space domain after JPEG compression and prevent traces left by evidence obtaining, and is beneficial to improving the detection accuracy of the double-compression image.
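Since the formula (5.7) kernel values are not reproduced above, the following NumPy sketch uses generic 2 × 2 horizontal and vertical difference kernels as placeholders; these values are an assumption, not the patent's parameters:

```python
import numpy as np

K_h = np.array([[1.0, -1.0], [0.0, 0.0]])  # placeholder horizontal difference kernel
K_v = np.array([[1.0, 0.0], [-1.0, 0.0]])  # placeholder vertical difference kernel

def differential_filter(fmap):
    """Slide each 2x2 kernel over fmap with zero padding (output keeps the input size)."""
    padded = np.pad(fmap, ((0, 1), (0, 1)))   # zero padding at the right/bottom edges
    outs = []
    for k in (K_h, K_v):
        out = np.zeros_like(fmap, dtype=float)
        for i in range(fmap.shape[0]):
            for j in range(fmap.shape[1]):
                out[i, j] = np.sum(k * padded[i:i + 2, j:j + 2])
        outs.append(out)
    return outs
```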
Step S404: and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
The specific implementation process of step S404 is as follows: the Softmax classifier in step S403-e calculates a classification result from the combined one-dimensional feature vector and its connections to the neurons of each network layer in the fully connected layer; the classification result may be a two-dimensional vector (x, y) with x + y = 1. In this embodiment, x represents the probability that the input image to be detected is a double-compressed image, and y represents the probability that it is a single-compressed image; equivalently, the difference between the combined one-dimensional feature vector and the one-dimensional vector features of the fully connected network layers, which may be a two-dimensional vector, is used to determine whether the image to be detected is double-compressed. Therefore, the determining process in step S404 may include:
comparing the single-compression classification probability value with the double-compression classification probability value in the recognition and classification result of the image to be detected: if the double-compression classification probability value is greater than the single-compression classification probability value, the acquired image to be detected is determined to be double-compressed image information; otherwise, it is a single-compressed image.
Based on the above, before the processing based on the obtained image to be detected, the method may further include:
carrying out normalization pretreatment on the obtained image to be detected to obtain a normalized image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including:
and segmenting the image to be detected after the normalization processing to obtain at least two types of characteristic images.
Given that the input image to be detected is a grayscale image I_S with pixel values in the range [0, 255], the image to be detected can be normalized with the following formula:

I_S = I_S / 255
the above is a description of an embodiment of an image detection method provided in the present application. In correspondence with the embodiment of the image detection method provided in the foregoing, the present application further discloses an embodiment of an image detection apparatus, please refer to fig. 10, since the apparatus embodiment is substantially similar to the method embodiment, the description is relatively simple, and related points can be referred to the partial description of the method embodiment. The device embodiments described below are merely illustrative.
As shown in fig. 10, fig. 10 is a schematic structural diagram of an embodiment of an image detection apparatus provided by the present application, and the apparatus includes:
an acquiring unit 901, configured to acquire an image to be detected.
A processing unit 902, configured to obtain at least two types of feature images based on the processing on the image to be detected.
The processing unit 902 is specifically configured to segment the image to be detected to obtain at least two types of feature images, and may include:
the dividing subunit is used for dividing the acquired pixel matrix of the image to be detected to obtain image blocks;
the selecting subunit is used for selecting pixels of an area adjacent to the central position of the image block in the dividing subunit and selecting pixels of an area adjacent to the dividing intersection position of the image block;
the intra-block obtaining subunit is used for arranging and combining the pixels of the area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
and the inter-block obtaining subunit is used for arranging and combining the pixels of the areas adjacent to the block-division intersection positions according to the block-division sequence of the image to be detected to obtain the inter-block feature image.
And the type determining subunit is used for determining the intra-block characteristic image and the inter-block characteristic image as the obtained at least two types of characteristic images.
A classification recognition unit 903, configured to input information of the at least two types of feature images into a neural network with at least two paths for recognition, so as to obtain a recognition classification result;
the classification identifying unit 903 may include:
an intra-block image feature determination subunit configured to input the intra-block feature image into a convolution group with an intra-block path in the convolution neural network, and determine an image feature of the intra-block feature image;
an inter-block image feature determination subunit, configured to input the inter-block feature image into a convolution group with an inter-block path in the convolution neural network, and determine an image feature of the inter-block feature image;
a main image feature determination subunit, configured to input the image features of the intra-block feature image and the image features of the inter-block feature image to a dimension reduction layer at an end of an intra-block path and a dimension reduction layer at an end of an inter-block path, respectively, and determine main image features of the intra-block feature image and main image features of the inter-block feature image;
a feature vector obtaining subunit, configured to combine the main image features of the intra-block feature image and the main image features of the inter-block feature image to obtain a feature vector of the image to be detected;
and the classification result determining subunit is used for transmitting the feature vectors to a full connection layer of the convolutional neural network and determining the identification and classification result of the image to be detected.
The above is only a general description, and the specific process refers to the description of the image detection method, which is not repeated herein.
It is understood that, in order to counteract the decrease in double-compression recognition accuracy caused by the introduction of anti-forensics methods and to enhance the representation of pixels in the spatial domain, the image detection apparatus provided by the present application further comprises:
and the filtering unit is used for respectively carrying out filtering processing on the intra-block characteristic image and the inter-block characteristic image before determining the image characteristics of the intra-block characteristic image and the image characteristics of the inter-block characteristic image to obtain the filtered intra-block characteristic image and inter-block characteristic image.
Since the filtering process can be implemented by a differential filter, as shown in fig. 7, the intra-block path and the inter-block path of the convolutional neural network provided in this embodiment further include a differential filter layer, which may also be referred to as an intra-block differential filter layer and an inter-block differential filter layer, respectively. The intra-block differential filter layer is located at an initial end of an intra-block path, and the inter-block differential filter layer is located at an initial end of an inter-block path.
The differential filter layer can enhance the signal-to-noise ratio of the segmented intra-block characteristic images and inter-block characteristic images, further enhance information left in a space domain after JPEG compression and prevent traces left by evidence obtaining, and is beneficial to improving the detection accuracy of the double-compression image.
A determining unit 904, configured to determine whether the image to be detected belongs to the dual compressed image information according to the recognition and classification result.
The determining unit 904 may include:
and the comparison determining subunit is used for comparing the single-compression classification probability value of the identification classification result of the image to be detected with the double-compression classification probability value, and if the double-compression classification probability value is greater than the single-compression classification probability value, determining the acquired image to be detected as double-compression image information.
Further, the image detection apparatus provided by the present application may further include:
and the preprocessing unit is used for carrying out normalization preprocessing on the acquired image to be detected to obtain a preprocessed image to be detected.
The obtaining unit 901 is specifically configured to perform segmentation according to the obtained preprocessed image to be detected, so as to obtain at least two types of feature images.
In view of the above, the present application also provides a training method for a neural network model; please refer to fig. 11, which shows a flowchart of an embodiment of the training method of the neural network model provided in the present application. The training method of the neural network model includes:
step S1001: training sample images are acquired.
The acquiring of the training sample image in step S1001 may be acquiring images determined to be double-compressed and/or single-compressed, where an image may be a separately stored static JPEG image, a JPEG image saved in JPEG format from a dynamic image, or a JPEG image captured from data such as dynamic video or moving images.
JPEG is an abbreviation of Joint Photographic Experts Group; its file extension is ".jpg" or ".jpeg". It is the most commonly used image file format, established by a joint software development organization, and is a lossy compression format capable of compressing images into a small storage space.
Step S1002: and obtaining at least two types of training sample characteristic images based on the processing of the training sample images.
The specific implementation process of step S1002 may be to segment the training sample image to obtain at least two types of feature images. For a specific segmentation process, reference may be made to the description of step S402 in the image detection method, which is not described herein again.
Step S1003: and inputting the information of the at least two types of training sample characteristic images into a neural network with at least two paths for recognition to obtain recognition and classification results.
Similarly, for the specific implementation process of step S1003, reference may be made to the description of step S403 in the above image detection method, and details are not repeated here.
Step S1004: determining the weight of the convolutional neural network according to the recognition and classification result;
the specific implementation process of step S1004 may include:
step S1004-a: calculating a loss value from the classification result, taken as the classification label value, and the real label value of the training sample image;
step S1004-b: determining the weight of the convolutional neural network according to the loss value.
Wherein the step S1004-a may be calculated by the following formula:
L(w) = - \sum_n q_n \cdot \log(P_n)

where P_n is the actual output probability calculated by the network, q_n is the expected output result (i.e., the true label), and w represents the weights of the network.
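For illustration, a minimal NumPy sketch of this loss (the summation over categories and the epsilon guard are assumptions):

```python
import numpy as np

def cross_entropy_loss(P, q, eps=1e-12):
    """L(w) = -sum_n q_n * log(P_n): P are output probabilities, q the true labels."""
    return -np.sum(q * np.log(P + eps))

# toy example: network says "double-compressed" with probability 0.9, label agrees
print(cross_entropy_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0])))
```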
Step S1005: and updating the weight of the convolutional neural network according to the determined weight of the convolutional neural network to obtain a trained convolutional neural network model.
The specific implementation process of step S1005 may be to update the network weight in a back propagation manner, for example: given a learning rate α, the network weights in the tth training iteration are:
w_t = w_{t-1} - \alpha \cdot \frac{\partial L(w)}{\partial w}

where \partial L(w) / \partial w denotes the derivative of the loss value L(w) with respect to the network weights w.
The above is a description of an embodiment of a training method for a convolutional neural network model provided in the present application. Corresponding to the embodiment of the training method of the convolutional neural network model provided above, the present application also discloses an embodiment of a training apparatus of the convolutional neural network model, please refer to fig. 12, since the apparatus embodiment is basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the partial description of the method embodiment. The device embodiments described below are merely illustrative.
An acquisition unit 1101 for acquiring a training sample image;
a processing unit 1102, configured to obtain at least two types of training sample feature images based on processing of the training sample images;
the identification unit 1103 is configured to input information of the at least two types of training sample feature images into a neural network with at least two paths for identification, so as to obtain an identification classification result;
a determining unit 1104, configured to determine a weight of the convolutional neural network according to the recognition and classification result;
an updating unit 1105, configured to update the weights of the convolutional neural network according to the determined weights of the convolutional neural network, so as to obtain a trained convolutional neural network model.
In the image detection method and the training method described above, the JPEG image to be detected is divided into 8 × 8 blocks by a division layer in the convolutional neural network; the pixels at the center of each 8 × 8 block are then selected as the 4 × 4 Intra-Block feature image, and the 4 × 4 regions at the crossing positions within the 8 × 8 grid (i.e., where every four adjacent JPEG image blocks meet) are selected as the Inter-Block feature image. The two feature images are respectively used as the inputs of the two convolution paths in the convolutional neural network to detect and learn deep-level image features; a sketch of this segmentation is given below. The two-path convolutional neural network involved in the present application can mine the pixel characteristics of the inter-block and intra-block feature images in combination with the JPEG compression characteristics, which facilitates the recognition of JPEG double-compression characteristics and better resists attacks on the convolutional neural network in which anti-forensics hides or removes the compression traces of double-compressed images.
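Purely as an illustrative sketch of this segmentation (the exact pixel offsets within each block are assumptions), the intra-block and inter-block feature images can be extracted as follows:

```python
import numpy as np

def split_intra_inter(img, B=8, S=4):
    """Split a grayscale image into intra-block and inter-block feature images.

    Intra-block: the central S x S pixels of every B x B block, tiled in block order.
    Inter-block: the S x S pixels straddling every crossing of four adjacent blocks.
    """
    h, w = img.shape
    nb_h, nb_w = h // B, w // B
    intra = np.zeros((nb_h * S, nb_w * S), dtype=img.dtype)
    for bi in range(nb_h):
        for bj in range(nb_w):
            ci, cj = bi * B + (B - S) // 2, bj * B + (B - S) // 2
            intra[bi * S:(bi + 1) * S, bj * S:(bj + 1) * S] = img[ci:ci + S, cj:cj + S]
    inter = np.zeros(((nb_h - 1) * S, (nb_w - 1) * S), dtype=img.dtype)
    for bi in range(nb_h - 1):
        for bj in range(nb_w - 1):
            ci, cj = (bi + 1) * B - S // 2, (bj + 1) * B - S // 2
            inter[bi * S:(bi + 1) * S, bj * S:(bj + 1) * S] = img[ci:ci + S, cj:cj + S]
    return intra, inter

intra, inter = split_intra_inter(np.zeros((256, 256)))
print(intra.shape, inter.shape)   # (128, 128) (124, 124)
```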
In addition, the signal-to-noise ratio of the image is enhanced through the intra-block difference filter and the inter-block difference filter, and information left in a space domain by JPEG compression and traces left by anti-evidence obtaining operation are enhanced, so that double JPEG compression detection and anti-evidence obtaining operation detection are facilitated.
Based on the above, the present application further provides a computer storage medium for storing a program for generating data by a network platform and processing the data generated by the network platform;
when read and executed by the processor, the program performs the following operations:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
The present application further provides an electronic device, comprising:
a processor;
a memory for storing a program for processing network platform generated data, the program when read and executed by the processor performing the following operations:
acquiring an image to be detected;
obtaining at least two types of characteristic images based on the processing of the image to be detected;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
and determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result.
In order to verify the effectiveness of the two-channel convolutional neural network (SB-CNN) in detecting double JPEG compression and anti-forensics operations, it is compared below with related detection schemes, so that its technical effects can be understood more deeply.
The experiment mainly comprises two aspects: detection of double JPEG compression, and detection of double JPEG compression with embedded anti-forensics techniques. The TensorFlow deep learning open-source tool is used as the experimental platform, and a GPU computing card of the model NVIDIA Tesla P100 is used to complete the experiments; the widely used image library BOSSBase v1.01, containing 10000 grayscale images stored in the uncompressed PGM format, is used as the source of experimental images.
1. Dual JPEG compression detection
This section of the experiment again selects 4 representative compression quality factor combinations as experimental objects to measure the performance of the network, i.e., (QF1, QF2) = (70, 75), (80, 75), (85, 70), (85, 90). The selection principle is: the compression quality is moderately high, covering both QF1 > QF2 and QF1 < QF2, as well as combinations pairing the same QF1 or QF2 with different QF2 and QF1.
Table 1 shows the performance of the two-pass convolutional neural network (SB-CNN) provided herein and of the comparative methods on double JPEG compression detection under the above compression quality factor combinations. DFSD is a first-digit-based method, DAH is a statistical-histogram-based method, and DFD-CNN and DMD-CNN are neural network methods based on the frequency domain and on dual domains, respectively. The highest detection accuracy value in each compression quality factor combination in Table 1 is shown in bold in the original filing.
Table 1: double JPEG compression detection accuracy (%) comparison table
(QF1, QF2)   (70, 75)   (80, 75)   (85, 70)   (85, 90)
DFSD          99.22      99.36      97.70      99.98
DAH           98.73      99.02      98.99      99.91
DFD-CNN       98.74      99.00      97.36      99.96
DMD-CNN       95.78      99.04      98.54      99.99
DSB-CNN       98.70      97.58      99.49      99.82
From the above table, the following summary can be made:
the performance of the DSB-CNN is at a comparable level; most of the detection accuracies in Table 1 are above 97% and even close to 100%, so the methods have strong and similar capabilities, and the room for further improving the detection accuracy is limited.
2. Dual JPEG compression detection with embedded anti-forensics
The detection performance of the two-path convolutional neural network (SB-CNN) provided by the present application on double JPEG compressed images embedded with different types of anti-forensics techniques is tested separately and compared with traditional hand-crafted-feature methods. The selected comparison methods are the first-digit-based DFSD and the statistical-histogram-based DAH.
(1) Application of anti-forensics techniques to scene classification
To assess the capability of the SB-CNN to detect double JPEG compressed images attacked by anti-forensics techniques, the specific anti-forensics scenarios are first categorized so as to cover the conventional image anti-forensics techniques.
One anti-forensic technique studies the distribution of DCT coefficients in JPEG compressed images and deters forensics by fitting that distribution to the DCT coefficient distribution of uncompressed images. A comprehensive anti-forensics method was subsequently proposed based on the blocking artifacts of JPEG images. Based on an optimal balance between undetectability and anti-forensic performance, a four-step method for eliminating JPEG compression traces was further proposed. The common characteristic of these methods is that the DCT coefficients are first re-fitted, and the spatial domain is then repaired to erase the compression traces of the JPEG image. As shown in fig. 13, fig. 13 is a schematic diagram of common scenarios of embedding anti-forensics techniques in a double JPEG compression process.
The images generated according to the scene mapping provided in fig. 13 are represented by JPEG (1) A, JPEG (2) B0, and the like, specifically:
(1) JPEG (1) A: JPEG is used for compressing the image once, and the quantization quality factor is QF;
(2) JPEG (2) B0: natural dual JPEG compressed image with first and second quantization quality factors of QF1 and QF 2;
(3) JPEG (2) B1: an anti-forensic double JPEG compressed image generated by adding attacks in the spatial domain when JPEG decompression writes the data back to the spatial domain, the attacks including contrast enhancement, image size scaling, median filtering, and the like;
(4) JPEG (2) B2: the anti-forensic operation is performed on the DCT coefficients of the JPEG image, which is then decompressed back to the spatial domain; such anti-forensic methods are representative;
(5) JPEG (2) B3: and (3) performing anti-evidence operation on the double JPEG compressed image in a DCT domain of the JPEG compressed image and a spatial domain of the JPEG decompressed image respectively.
(2) Spatial-domain anti-forensic images:
First, double JPEG compressed images in which the anti-forensic operation is performed only in the spatial domain of the JPEG-decompressed image, i.e., the third type of image JPGB1 in fig. 13, are detected. Taking the compression quality factor combination (QF1, QF2) = (70, 75) as an example, the attacked images are generated by the following common methods:
Contrast Enhancement: luminance parameters of 0.5, 0.6, ..., 1.4 (1.0 excluded);
Median Filtering: filtering window sizes of s = 3 × 3 and 2 × 2, with Gaussian white noise variances of v = 2 and 3;
image size scaling (Resize) followed by a return to the original size: scaling factors of 0.7, 0.8, ..., 1.3 (1.0 excluded);
image size scaling without returning to the original size: scaling factors of 1.1, 1.2, 1.3, 1.4. All the images are generated with MATLAB software; the commands used are 'imadjust', 'medfilt2' and 'imresize', and the points at which these commands are applied correspond to the "spatial-domain anti-forensics" step in the generation process of the third type JPGB1 in fig. 13.
Referring to fig. 14, fig. 14 is a schematic diagram illustrating the detection effect on images processed by the anti-forensic methods. For convenience of representation, an image type is denoted by an English abbreviation plus its parameters, as exemplified below:
"CE — 0.5" represents contrast enhancement operation using a luminance parameter of 0.5;
"MFG _ s3v 2" indicates that the window size of median filtering is 3 × 3 and the noise method is 2;
thirdly, the RSR _0.8 represents that the image scaling is 0.8 and returns to the original size;
'RSC _ 1.2' indicates that the image magnification ratio is 1.2 and the original size is not returned;
'S/D' represents a natural double JPEG compressed picture.
In addition, the detection methods are DFSD, DAH and DSB-CNN in the present application, respectively.
As can be observed from fig. 14, the detection accuracy of the three detection methods on natural double JPEG compression (S/D) is close to 100%; after the spatial-domain anti-forensic techniques are applied, the three methods all show a broadly similar downward trend, but the magnitudes of the drops differ greatly. The first-digit feature method (blue round-marker curve) and the statistical histogram method (black squares) drop sharply, whereas the network of the present application (red triangle-marker curve) shows higher detection accuracy and stability, remaining at a high accuracy level; for example, on the image types "CE_1.4" and "RSC_1.1", the accuracy of the two-path convolutional neural network is 11%-33% higher, demonstrating a strong ability to resist spatial-domain anti-forensic techniques.
(3) Comprehensive anti-forensic images:
The spatial-domain anti-forensic images undergo the anti-forensic operation in the spatial domain before the second JPEG compression, which can only mask some spatial-domain blocking artifacts and does not remove compression traces according to the characteristics of the JPEG image. The comprehensive anti-forensics techniques, building on established mathematical models, compensate more accurately for the coefficient distribution differences between compressed and uncompressed images, either in the DCT domain of the JPEG image alone or in the DCT domain and the spatial domain simultaneously.
Table 2 shows the performance of the same three methods in detecting double JPEG compression with embedded comprehensive anti-forensics techniques, where AStamm, AFan, and ASingh denote the respective anti-forensics methods. As can be seen from Table 2, in most cases the methods based on traditional hand-crafted features perform poorly against the anti-forensics techniques, because they rely on features of the DCT coefficients of the JPEG image, and those features are erased to some extent by the anti-forensics methods. The two-channel convolutional neural network SB-CNN stably maintains a high detection rate (all results above 90%), and in some cases its detection accuracy is even 45% higher than that of the hand-crafted-feature methods; the SB-CNN, designed on the basis of the spatial-domain characteristics of the JPEG image, thus shows good detection performance on double JPEG compression and resists attacks by anti-forensics techniques well.
Table 2: double JPEG compression detection accuracy (%) comparison table embedded with anti-evidence obtaining technology
[the entries of Table 2 are given in the original filing and are not reproduced in this text]
Compared with the prior art, the image detection method and the related dual-channel convolutional neural network have higher detection accuracy and better performance in detection of normal images or images subjected to anti-forensics operation.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
2. As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Although the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application, and any person skilled in the art can make variations and modifications without departing from the spirit and scope of the present application, therefore, the scope of the present application should be limited by the scope of the claims.

Claims (24)

1. An image detection method, comprising:
acquiring an image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including: segmenting the image to be detected to obtain at least two types of characteristic images;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result;
wherein segmenting the image to be detected to obtain at least two types of characteristic images comprises:
dividing the obtained pixel matrix of the image to be detected to obtain image blocks;
selecting pixels of a region adjacent to the central position of the image block and pixels of a region adjacent to the segmentation intersection position of the image block;
arranging and combining pixels in an area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
arranging and combining the pixels of the areas adjacent to the block-division intersection positions according to the block-division sequence of the image to be detected to obtain an inter-block characteristic image;
and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
2. The image detection method according to claim 1, wherein the dividing the obtained pixel matrix of the image to be detected to obtain image blocks comprises:
and dividing the image to be detected from left to right and from top to bottom to obtain image blocks.
3. The image detection method according to claim 1, wherein inputting the information of the at least two types of feature images into a neural network having at least two paths for recognition to obtain recognition classification results comprises:
inputting the intra-block feature image into a convolution group of intra-block paths in the neural network, and determining an image feature of the intra-block feature image;
inputting the inter-block feature images into convolution groups of inter-block paths in the neural network, and determining image features of the inter-block feature images;
inputting the image features of the intra-block feature images and the image features of the inter-block feature images into a dimension reduction layer at the tail end of an intra-block access and a dimension reduction layer at the tail end of an inter-block access respectively, and determining the main image features of the intra-block feature images and the main image features of the inter-block feature images;
combining the main image features of the intra-block feature images and the main image features of the inter-block feature images to obtain the feature vectors of the images to be detected;
and transmitting the characteristic vectors to a full connection layer of the neural network, and determining the identification and classification result of the image to be detected.
4. The image detection method according to claim 3, further comprising:
before determining the image features of the intra-block feature images and the image features of the inter-block feature images, respectively performing filtering processing on the intra-block feature images and the inter-block feature images to obtain filtered intra-block feature images and inter-block feature images.
5. The image detection method according to claim 3, wherein the inputting the intra-block feature image to a convolution group of intra-block paths in the neural network, determining the image feature of the intra-block feature image, includes:
inputting the intra-block feature image into a convolution group of an intra-block passage in the neural network for convolution processing to obtain a processed intra-block feature image;
inputting the processed intra-block characteristic image into a pooling layer for processing to obtain the image characteristics of the intra-block characteristic image;
the inputting the inter-block feature image into a convolution group of an inter-block path in the neural network, extracting an image feature of the inter-block feature image, including:
inputting the inter-block feature images into a convolution group of an inter-block path in the neural network for convolution processing to obtain processed inter-block feature images;
inputting the processed inter-block feature images into a pooling layer for processing to obtain image features of the inter-block feature images;
wherein the number of convolution groups of the intra-block path is at least two, and the pooling layer of the intra-block path is located between the convolution groups of the intra-block path; the number of the convolution groups of the inter-block paths is at least two, and the pooling layer of the inter-block paths is positioned between the convolution groups of the inter-block paths.
6. The image detection method according to claim 3, wherein the determining the main image feature of the intra-block feature image and the main image feature of the inter-block feature image by inputting the image feature of the intra-block feature image and the image feature of the inter-block feature image to a dimension reduction layer at an end of an intra-block path and a dimension reduction layer at an end of an inter-block path, respectively, comprises:
pooling image features of the intra-block feature images in the intra-block passage to obtain pooled intra-block feature images;
performing convolution processing on the pooled intra-block characteristic images to obtain main image characteristics of the intra-block characteristic images;
performing convolution processing on the image characteristics of the inter-block characteristic images in the inter-block passage to obtain convolved inter-block characteristic images;
and performing pooling processing on the convolved inter-block feature images to obtain main image features of the inter-block feature images.
7. The image detection method according to claim 3, wherein merging the main image features of the intra-block feature image and the main image features of the inter-block feature image to obtain the feature vector of the image to be detected comprises:
respectively converting the main image features of the intra-block feature images and the main image features of the inter-block feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and combining the intra-block one-dimensional characteristic vector and the inter-block one-dimensional characteristic vector according to the structure sequence of the corresponding intra-block passage and inter-block passage respectively to obtain the characteristic vector of the image to be detected.
8. The image detection method according to claim 1, wherein said determining whether the acquired image to be detected belongs to a double-compressed image according to the recognition and classification result comprises:
and comparing the single-compression classification probability value of the identification classification result of the image to be detected with the double-compression classification probability value, and if the double-compression classification probability value is greater than the single-compression classification probability value, determining the acquired image to be detected as a double-compression image.
9. The image detection method according to claim 1, further comprising:
carrying out normalization pretreatment on the obtained image to be detected to obtain a normalized image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including:
and segmenting the image to be detected after the normalization processing to obtain at least two types of characteristic images.
10. The image detection method of claim 1, wherein the neural network having at least two paths is a convolutional neural network.
11. An image detection apparatus, characterized by comprising:
the acquisition unit is used for acquiring an image to be detected;
the processing unit is used for obtaining at least two types of characteristic images based on the processing of the image to be detected, and comprises: segmenting the image to be detected to obtain at least two types of characteristic images;
the classification and identification unit is used for inputting the information of the at least two types of characteristic images into a neural network with at least two channels for identification to obtain an identification and classification result;
the determining unit is used for determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result;
wherein the segmenting the image to be detected to obtain at least two types of characteristic images comprises:
dividing the obtained pixel matrix of the image to be detected to obtain image blocks;
selecting pixels of a region adjacent to the central position of the image block and pixels of a region adjacent to the segmentation intersection position of the image block;
arranging and combining pixels in an area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
arranging and combining pixels of the region adjacent to the segmentation intersection positions of the image blocks according to the block dividing sequence of the image to be detected to obtain an inter-block characteristic image;
and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
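A sketch of the segmentation recited in this claim, assuming 8×8 blocks (the JPEG grid) and 4×4 regions around block centres and segmentation intersections; both sizes, and the wrap-around at the image border, are implementation assumptions not fixed by the claims:

```python
import numpy as np

def split_feature_images(pixels, block=8, region=4):
    # Assumes the image height and width are multiples of the block size.
    h, w = pixels.shape
    half = region // 2
    c0 = block // 2 - half  # offset of the region around a block centre

    intra_rows, inter_rows = [], []
    for by in range(h // block):        # top to bottom
        intra_row, inter_row = [], []
        for bx in range(w // block):    # left to right
            y, x = by * block, bx * block
            # pixels of the region adjacent to the block centre position
            intra_row.append(pixels[y + c0:y + c0 + region,
                                    x + c0:x + c0 + region])
            # pixels of the region adjacent to the segmentation intersection
            # at the block's lower-right corner (wrapped at the border;
            # an implementation choice, not taken from the claims)
            ys = [(y + block - half + k) % h for k in range(region)]
            xs = [(x + block - half + k) % w for k in range(region)]
            inter_row.append(pixels[np.ix_(ys, xs)])
        intra_rows.append(np.hstack(intra_row))
        inter_rows.append(np.hstack(inter_row))

    # arrange and combine in the block-division order of the image
    intra_image = np.vstack(intra_rows)  # intra-block characteristic image
    inter_image = np.vstack(inter_rows)  # inter-block characteristic image
    return intra_image, inter_image

intra, inter = split_feature_images(np.arange(64 * 64).reshape(64, 64))
```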
12. A training method of a neural network model is characterized by comprising the following steps:
acquiring a training sample image;
obtaining at least two types of training sample feature images based on the processing of the training sample images, including: segmenting the training sample image to obtain at least two types of training sample characteristic images;
inputting the at least two types of training sample characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
determining the weight of the neural network according to the recognition and classification result;
updating the weight of the neural network according to the determined weight of the neural network to obtain a trained neural network model;
the segmenting the training sample image to obtain at least two types of training sample characteristic images includes:
dividing the pixel matrix of the obtained training sample image to obtain training sample image blocks;
selecting pixels of a region adjacent to the center position of the training sample image blocks and selecting pixels of a region adjacent to the segmentation intersection position of the training sample image blocks;
arranging and combining pixels of the region adjacent to the center position of each training sample image block according to the block dividing sequence of the training sample image to obtain an intra-block training sample characteristic image;
arranging and combining pixels of the region adjacent to the segmentation intersection positions of the training sample image blocks according to the block dividing sequence of the training sample image to obtain an inter-block training sample characteristic image;
and determining the intra-block training sample characteristic image and the inter-block training sample characteristic image as the obtained at least two types of training sample characteristic images.
13. The method for training a neural network model according to claim 12, wherein the dividing the pixel matrix of the obtained training sample image to obtain the training sample image blocks comprises:
and dividing the training sample image from left to right and from top to bottom to obtain training sample image blocks.
14. The method for training the neural network model according to claim 13, wherein the inputting the at least two types of training sample feature images into a neural network having at least two paths for recognition and obtaining a recognition classification result comprises:
inputting the intra-block training sample feature image into a convolution group of an intra-block path in the neural network, and determining an image feature of the intra-block training sample feature image;
inputting the inter-block training sample feature images into a convolution group of an inter-block path in the neural network, and determining image features of the inter-block training sample feature images;
respectively inputting the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images into a dimensionality reduction layer at the tail end of an intra-block access and a dimensionality reduction layer at the tail end of an inter-block access, and determining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images;
combining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images to obtain feature vectors of the training sample images;
and transmitting the feature vectors to a full connection layer of the neural network, and determining the recognition classification result of the training sample image.
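Put together, the forward pass recited in claim 14 can be sketched as a two-path network; the layer widths and the 1×32×32 input size per feature image are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TwoPathNet(nn.Module):
    # Sketch of the two-path forward pass; widths and sizes are assumptions.
    def __init__(self, n_classes=2):
        super().__init__()
        def group(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.ReLU(inplace=True))
        # intra-block path: convolution groups with a pooling layer between
        self.intra = nn.Sequential(group(1, 16), nn.AvgPool2d(2), group(16, 32))
        self.intra_reduce = nn.Sequential(nn.AvgPool2d(2), nn.Conv2d(32, 32, 1))
        # inter-block path and its dimension reduction (convolve, then pool)
        self.inter = nn.Sequential(group(1, 16), nn.AvgPool2d(2), group(16, 32))
        self.inter_reduce = nn.Sequential(nn.Conv2d(32, 32, 1), nn.AvgPool2d(2))
        self.fc = nn.Linear(2 * 32 * 8 * 8, n_classes)  # fully connected layer

    def forward(self, intra_img, inter_img):
        a = self.intra_reduce(self.intra(intra_img))  # main features, intra-block
        b = self.inter_reduce(self.inter(inter_img))  # main features, inter-block
        v = torch.cat([torch.flatten(a, 1), torch.flatten(b, 1)], dim=1)
        return self.fc(v)  # recognition and classification result (logits)

logits = TwoPathNet()(torch.randn(4, 1, 32, 32), torch.randn(4, 1, 32, 32))
```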
15. The method of training a neural network model of claim 14, further comprising:
before determining the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images, respectively performing filtering processing on the intra-block training sample feature images and the inter-block training sample feature images to obtain filtered intra-block training sample feature images and inter-block training sample feature images.
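The claim leaves the filter unspecified; a fixed high-pass residual filter of the kind common in compression-forensics networks is one plausible choice, sketched here purely as an assumption:

```python
import torch
import torch.nn.functional as F

# Fixed 3x3 high-pass residual kernel (shape: out_ch, in_ch, kH, kW).
HIGH_PASS = torch.tensor([[[[-1.,  2., -1.],
                            [ 2., -4.,  2.],
                            [-1.,  2., -1.]]]]) / 4.0

def filter_feature_image(x):
    # x: (N, 1, H, W) intra-block or inter-block training sample feature image
    return F.conv2d(x, HIGH_PASS, padding=1)
```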
16. The method for training a neural network model according to claim 14, wherein the inputting the intra-block training sample feature images into the convolution groups of intra-block paths in the neural network, and determining the image features of the intra-block training sample feature images, comprises:
inputting the intra-block training sample characteristic image into a convolution group of an intra-block passage in the neural network for convolution processing to obtain a processed intra-block training sample characteristic image;
inputting the processed intra-block training sample characteristic image into a pooling layer for processing to obtain the image characteristics of the intra-block training sample characteristic image;
the inputting the inter-block training sample feature images into a convolution group of an inter-block path in the neural network, and determining the image features of the inter-block training sample feature images, comprises:
inputting the inter-block training sample characteristic image into a convolution group of an inter-block passage in the neural network for convolution processing to obtain a processed inter-block training sample characteristic image;
inputting the processed inter-block training sample characteristic images into a pooling layer for processing to obtain image characteristics of the inter-block training sample characteristic images;
wherein the number of convolution groups of the intra-block path is at least two, and the pooling layer of the intra-block path is located between the convolution groups of the intra-block path; the number of convolution groups of the inter-block path is at least two, and the pooling layer of the inter-block path is located between the convolution groups of the inter-block path.
17. The method for training a neural network model according to claim 14, wherein the determining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images by inputting the image features of the intra-block training sample feature images and the image features of the inter-block training sample feature images into a dimension reduction layer at the end of the intra-block path and a dimension reduction layer at the end of the inter-block path, respectively, comprises:
performing pooling processing on the image features of the intra-block training sample feature images in the intra-block path to obtain pooled intra-block training sample feature images;
performing convolution processing on the pooled intra-block training sample feature images to obtain main image features of the intra-block training sample feature images;
performing convolution processing on the image features of the inter-block training sample feature images in the inter-block path to obtain convolved inter-block training sample feature images;
and performing pooling processing on the convolved inter-block training sample feature images to obtain main image features of the inter-block training sample feature images.
18. The method for training a neural network model according to claim 14, wherein the combining the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images to obtain the feature vectors of the training sample images comprises:
respectively converting the main image features of the intra-block training sample feature images and the main image features of the inter-block training sample feature images into one-dimensional feature vectors to obtain intra-block one-dimensional feature vectors and inter-block one-dimensional feature vectors;
and combining the intra-block one-dimensional feature vectors and the inter-block one-dimensional feature vectors according to the structural order of the corresponding intra-block path and inter-block path, respectively, to obtain the feature vector of the training sample image.
19. The method of training a neural network model of claim 12, further comprising:
performing normalization preprocessing on the acquired training sample image to obtain a normalized training sample image;
the obtaining at least two types of training sample feature images based on the processing of the training sample images comprises:
and segmenting the training sample image after the normalization processing to obtain at least two types of training sample characteristic images.
20. The method for training a neural network model according to claim 12, wherein the determining the weight of the neural network according to the recognition and classification result comprises:
calculating a loss value according to the classification label value given by the recognition and classification result and the real label value of the training sample image;
and determining the loss value as the weight of the neural network.
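Cross-entropy between the classification label values and the real label values is one natural reading of the loss computation; the claim does not name the loss function, so the choice below is an assumption:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 2)           # classification label values (hypothetical)
labels = torch.tensor([0, 1, 1, 0])  # real label values: 0 single-, 1 double-compressed
loss = criterion(logits, labels)     # the loss value used to update the network
```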
21. The method for training the neural network model according to claim 12 or 20, wherein the updating the weights of the neural network according to the determined weights of the neural network to obtain the trained neural network model comprises:
and updating the old weights of the neural network by taking the determined weights of the neural network as new weights in a back propagation manner to obtain a trained neural network model having two paths.
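The back-propagation update can be illustrated as follows; the linear stand-in model, the SGD optimizer and the learning rate are assumptions made only to keep the sketch self-contained:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # stand-in for the two-path network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

logits = model(torch.randn(4, 8))
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 1, 0]))

optimizer.zero_grad()
loss.backward()   # back propagation of the loss value
optimizer.step()  # old weights replaced by the updated weights
```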
22. An apparatus for training a neural network model, comprising:
an acquisition unit for acquiring a training sample image;
the processing unit is used for obtaining at least two types of training sample characteristic images based on the processing of the training sample images, and comprises: segmenting the training sample image to obtain at least two types of training sample characteristic images;
the recognition unit is used for inputting the information of the characteristic images of the at least two types of training samples into a neural network with at least two channels for recognition to obtain recognition and classification results;
the determining unit is used for determining the weight of the neural network according to the recognition and classification result;
the updating unit is used for updating the weight of the neural network according to the determined weight of the neural network to obtain a trained neural network model;
the segmenting the training sample image to obtain at least two types of training sample characteristic images includes:
dividing the pixel matrix of the obtained training sample image to obtain training sample image blocks;
selecting pixels of a region adjacent to the center position of the training sample image blocks and selecting pixels of a region adjacent to the segmentation intersection position of the training sample image blocks;
arranging and combining pixels of the region adjacent to the center position of each training sample image block according to the block dividing sequence of the training sample image to obtain an intra-block training sample characteristic image;
arranging and combining pixels of the region adjacent to the segmentation intersection positions of the training sample image blocks according to the block dividing sequence of the training sample image to obtain an inter-block training sample characteristic image;
and determining the intra-block training sample characteristic image and the inter-block training sample characteristic image as the obtained at least two types of training sample characteristic images.
23. A computer storage medium for storing network platform generated data and a program for processing the network platform generated data;
when read and executed by a processor, the program performs the following operations:
acquiring an image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including: segmenting the image to be detected to obtain at least two types of characteristic images;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result;
wherein the segmenting the image to be detected to obtain at least two types of characteristic images comprises:
dividing the obtained pixel matrix of the image to be detected to obtain image blocks;
selecting pixels of a region adjacent to the central position of the image block and pixels of a region adjacent to the segmentation intersection position of the image block;
arranging and combining pixels in an area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
arranging and combining pixels of the region adjacent to the segmentation intersection positions of the image blocks according to the block dividing sequence of the image to be detected to obtain an inter-block characteristic image;
and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
24. An electronic device, comprising:
a processor;
a memory for storing a program for processing network platform generated data, the program when read and executed by the processor performing the following operations:
acquiring an image to be detected;
based on the processing of the image to be detected, at least two types of characteristic images are obtained, including: segmenting the image to be detected to obtain at least two types of characteristic images;
inputting the information of the at least two types of characteristic images into a neural network with at least two channels for recognition to obtain recognition and classification results;
determining whether the image to be detected belongs to a double-compression image or not according to the identification and classification result;
wherein the segmenting the image to be detected to obtain at least two types of characteristic images comprises:
dividing the obtained pixel matrix of the image to be detected to obtain image blocks;
selecting pixels of a region adjacent to the central position of the image block and pixels of a region adjacent to the segmentation intersection position of the image block;
arranging and combining pixels in an area adjacent to the central position of the image block according to the block dividing sequence of the image to be detected to obtain an intra-block characteristic image;
arranging and combining pixels of the region adjacent to the segmentation intersection positions of the image blocks according to the block dividing sequence of the image to be detected to obtain an inter-block characteristic image;
and determining the intra-block characteristic image and the inter-block characteristic image as at least two types of acquired characteristic images.
CN201910007257.2A 2019-01-04 2019-01-04 Image detection method and device and neural network training method and device Active CN111415323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910007257.2A CN111415323B (en) 2019-01-04 2019-01-04 Image detection method and device and neural network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910007257.2A CN111415323B (en) 2019-01-04 2019-01-04 Image detection method and device and neural network training method and device

Publications (2)

Publication Number Publication Date
CN111415323A CN111415323A (en) 2020-07-14
CN111415323B true CN111415323B (en) 2022-05-27

Family

ID=71493952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910007257.2A Active CN111415323B (en) 2019-01-04 2019-01-04 Image detection method and device and neural network training method and device

Country Status (1)

Country Link
CN (1) CN111415323B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034628B (en) * 2021-04-29 2023-09-26 南京信息工程大学 Color image JPEG2000 recompression detection method
CN113538387B (en) * 2021-07-23 2024-04-05 广东电网有限责任公司 Multi-scale inspection image identification method and device based on deep convolutional neural network
CN116347080B (en) * 2023-03-27 2023-10-31 苏州利博特信息科技有限公司 Intelligent algorithm application system and method based on downsampling processing
CN116777375B (en) * 2023-06-20 2024-02-23 苏州智本信息科技有限公司 Industrial Internet system based on machine vision

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102413328A (en) * 2011-11-11 2012-04-11 中国科学院深圳先进技术研究院 Double compression detection method and system of joint photographic experts group (JPEG) image
CN103413336A (en) * 2013-07-31 2013-11-27 中国科学院深圳先进技术研究院 Grid non-aligned double-JPEG-compression detecting method and device
CN107679572A (en) * 2017-09-29 2018-02-09 深圳大学 A kind of image discriminating method, storage device and mobile terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8260067B2 (en) * 2008-04-18 2012-09-04 New Jersey Institute Of Technology Detection technique for digitally altered images

Also Published As

Publication number Publication date
CN111415323A (en) 2020-07-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant