CN110659679B - Image source identification method based on adaptive filtering and coupling coding - Google Patents

Image source identification method based on adaptive filtering and coupling coding

Info

Publication number
CN110659679B
Authority
CN
China
Prior art keywords
label
cost
classifier
conv
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910871685.XA
Other languages
Chinese (zh)
Other versions
CN110659679A (en)
Inventor
赵梦楠 (Zhao Mengnan)
王波 (Wang Bo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201910871685.XA priority Critical patent/CN110659679B/en
Publication of CN110659679A publication Critical patent/CN110659679A/en
Application granted granted Critical
Publication of CN110659679B publication Critical patent/CN110659679B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image source identification method based on adaptive filtering and coupling coding belongs to the technical field of computer image processing. The technical scheme is as follows: original camera features are extracted; the content features of the image are extracted through multi-layer convolution and adaptively removed from the original features to obtain attribute information that is convenient to classify. The categories are regressed progressively in a multi-task training mode, so that a single model classifies the brand, model and device of the camera at the same time. A coupled coding method enables the child classifier to act back on the parent classifier, which improves the accuracy of the parent classifier. By appropriately increasing the redundancy of the coding, the trained model can be used as a pre-training model for classifying new camera classes, which greatly reduces training time. The method adopts a novel coding scheme that introduces coupling among the three camera categories, so that the three classifications promote one another and the accuracy of camera source identification is improved.

Description

Image source identification method based on adaptive filtering and coupling coding
Technical Field
The invention belongs to the technical field of computer image processing, and particularly relates to an image source identification method based on adaptive filtering and coupling coding.
Background
With the rapid development of multimedia technology, digital images have become an important way for people to express ideas. Digital images have important applications in many situations, for example as essential evidence in criminal investigations. To ensure the reliability of a picture, its source needs to be identified. With the popularization of powerful image editing software, it has become easy to modify digital images, which can cause many problems, for example by affecting judicial justice. Digital image forensics therefore requires two stages: a picture must first be checked for tampering and forgery, and the reliable source of the camera picture is then determined through further analysis. When a picture has been tampered with through a series of operations, it is not suitable for use as reliable evidence. To this end, many blind forensics algorithms have been proposed to identify the source of a picture. The essence of camera forensics is to detect differences in camera attributes; as shown in FIG. 1, camera imaging goes through a series of operations, and for each imaging step existing methods have proposed corresponding solutions for classifying pictures.
In the past years, camera forensic performance has improved dramatically. The powerful learning capability of convolutional neural networks can automatically learn the differences among the classified pictures, but the performance of the classifier depends strongly on the number of pictures in the training set. Increasing the number of training pictures improves accuracy at the cost of training time, and once the amount of data is sufficient, adding more data improves classification accuracy only slightly. At the same time, the performance of a convolutional neural network increases with the depth of the network, but too many layers can result in overfitting. Limited by the storage capacity of hardware devices, the original picture taken by a camera is difficult to use directly as the input of a convolutional neural network, since it would generate an excessive number of parameters. Therefore, conventional deep learning methods first crop the image to a fixed size; the most common crop sizes are 64×64, 128×128 and 256×256.
Camera source forensics includes forensics of the brand, model and device of the camera. Despite the great improvements made by existing camera forensic methods, some problems remain. First, when classifying images, most existing methods classify only a single category, for example the camera model alone. Ding et al. proposed a multi-task training approach based on convolutional neural networks to classify the three camera categories simultaneously; their experiments use the Dresden database as the training set and classify brand, model and device in sequence. Although their classifier achieves high accuracy on brands, its accuracy on models and devices still needs to be improved. For camera source classification, none of the existing methods considers the coupling between the three categories. Second, the extracted classification attributes of the camera are strongly influenced by the picture content. For example, when a camera shoots in different environments, a convolutional neural network may classify a picture by its texture differences rather than by attribute differences. Therefore, preprocessing of pictures is necessary before classification. Tuama et al. remove the picture content using a high-pass filter and a wavelet-based denoising filter; however, a fixed-form high-pass filter may also remove the original attribute information of the camera.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an image source identification method based on adaptive filtering and coupling coding, which adopts residual learning and uses multi-layer convolution kernels to extract the content of an image, and improves the accuracy of camera source identification through a residual-based, category-coupled multi-task training method.
The technical scheme is as follows:
An image source identification method based on adaptive filtering and coupling coding comprises the following steps:
S1, extracting image features through a layer of convolution, and then extracting the image content through multiple layers of convolution kernels;
S2, removing the extracted image content from the image features to obtain camera attribute features;
S3, extracting correlation features between image neighbourhoods by using a multi-layer ResNet structure;
and S4, classifying the brand, model and device of the camera by adopting a multi-task classifier according to the extracted correlation features and the camera attribute features.
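To make the data flow of S1-S4 concrete, the following is a minimal PyTorch sketch. The overall structure (one feature convolution, a multi-layer content branch whose outputs are concatenated and collapsed by a 1×1 convolution, a residual stack for neighbourhood correlation, and three task heads over a shared code of length N) follows the description, but every layer width and depth, the 64×64 input size, and all names are illustrative assumptions rather than the patented configuration.

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain residual block for the neighbourhood-correlation stage (S3)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.body(x))

class AdaptiveFilterNet(nn.Module):
    def __init__(self, code_len=24):                 # N = 24 in the Dresden example
        super().__init__()
        self.feat = nn.Conv2d(3, 32, 3, padding=1)   # S1: raw image features
        self.c1 = nn.Conv2d(32, 64, 3, padding=1)    # content branch ...
        self.c2 = nn.Conv2d(64, 96, 3, padding=1)    # ... wider late layer
        self.fuse = nn.Conv2d(64 + 96, 32, 1)        # 1x1 conv over concatenated outputs
        self.res = nn.Sequential(ResBlock(32), ResBlock(32), ResBlock(32))  # S3
        self.pool = nn.AdaptiveAvgPool2d(1)
        # S4: brand / model / device heads over the shared code of length N
        self.fc_b, self.fc_m, self.fc_d = (nn.Linear(32, code_len) for _ in range(3))

    def forward(self, x):
        f = self.feat(x)
        h1 = torch.relu(self.c1(f))
        h2 = torch.relu(self.c2(h1))
        content = self.fuse(torch.cat([h1, h2], dim=1))   # estimated image content
        att = f - content                       # S2: residual = camera attributes
        z = self.pool(self.res(att)).flatten(1)           # S3: correlation features
        return self.fc_b(z), self.fc_m(z), self.fc_d(z)   # S4: three task outputs

out = AdaptiveFilterNet()(torch.randn(2, 3, 64, 64))
print([t.shape for t in out])                   # three (2, 24) logit vectors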
Further, the multi-task classifier adopts coupled coding as follows:
T1, inputting the picture size and labels, wherein each classification category label is extracted and recorded as
B_label=[1,…,1,0,…,0,0,…,0],
M_label=[1,…,1,1,…,1,0,…,0],
D_label=[1,…,1,1,…,1,1,…,1],
wherein: B_label represents the brand target label, M_label the model target label, and D_label the individual target label;
T2, generating category labels from the extracted features using the fully connected layers:
Conv_1, Conv_2, Conv_3 = Conv(Att),
Label_1 = Softmax[FC(Conv_1)],
Label_2 = 2 × Softmax[FC(Conv_2)] - Label_1,
Label_3 = 3 × Softmax[FC(Conv_3)] - Label_2 - Label_1,
wherein: Conv_1 represents the feature spectrum used by the neural network to classify brands, Conv_2 the feature spectrum for classifying brands and models, and Conv_3 the feature spectrum for classifying brands, models and devices; Att represents the correlation features and camera attribute information; Label_1 represents the brand label output by the actual classifier, Label_2 the model label, and Label_3 the individual label;
T3, constructing a cost function for the obtained category labels:
Classify2(logits, label) = -Sum(label × log(logits)),
Cost_b = Classify2[(Label_1), Label & B_label],
Cost_m = Classify2[(Label_2), Label & M_label],
Cost_d = Classify2[(Label_3), Label & D_label],
wherein: Cost_b, Cost_m and Cost_d respectively represent the cost functions that optimize brand, model and individual under the supervision of the actual labels; Label represents the actual label, and logits denotes the logit values.
Further, the method also comprises the following steps:
T4, constructing a coupling cost function:
Cost_1 = L1(Label_2[:b], Label_1[:b]);
Cost_2 = L1(Label_3[:b+m], Label_2[:b+m]);
Cost = α × Cost_b + β × Cost_m + χ × Cost_d + δ × Cost_1 + ε × Cost_2,
wherein α + β + χ = 1, δ = 1 and ε = 1; Cost_1 improves the coupling between the brand classifier and the model classifier, and Cost_2 improves the coupling between the model classifier and the device classifier; b represents the code length used to encode the brand, and m the code length used to encode the model; L1 denotes the L1 norm; Cost represents the total cost function obtained by setting different hyper-parameters;
T5, constructing a coding length L, wherein L > N, and N is the total coding length required by the database;
and T6, constructing a position classifier to extract the local position information of the picture, and classifying the picture by using the local position information as a feature.
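Continuing the variables of the previous sketch, the T4 coupling terms compare each child head with its parent on the parent's code bits; the weights below are assumed values that merely satisfy α + β + χ = 1.

def l1(p, q):
    """L1 distance between two label vectors, averaged over the batch."""
    return (p - q).abs().sum(dim=1).mean()

# T4: the model head must reproduce the brand head on the brand bits, and
# the device head must reproduce the model head on the brand+model bits.
cost_1 = l1(label2[:, :b], label1[:, :b])
cost_2 = l1(label3[:, :b + m], label2[:, :b + m])

alpha, beta, chi = 0.4, 0.3, 0.3      # assumed split with alpha+beta+chi = 1
delta = eps = 1.0                     # fixed per T4
cost = alpha * cost_b + beta * cost_m + chi * cost_d + delta * cost_1 + eps * cost_2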
The invention has the beneficial effects that:
the image source identification method based on the adaptive filtering and the coupling coding adopts a residual error learning mode, and multilayer convolution kernels are used for removing the content of an image so as to obtain the attribute information of a camera; the accuracy rate of camera source identification is improved by adopting a category coupling type multi-task training method based on residual errors; low or high frequency content that is not relevant to classification can be selectively removed by combining the output of each layer of convolution kernel and then using 1x1 convolution kernel; the attribute information of the image is accurately extracted by using self-adaptive filtering, and the coupling of the network to the three camera categories is increased by adopting coupled coding, so that the classification accuracy of the cameras is improved.
Drawings
FIG. 1 is a block diagram of the overall architecture including the construction of an adaptive filter and a multitask training architecture;
FIG. 2 is a schematic diagram of an auxiliary classifier;
FIG. 3 is a diagram illustrating the correlation of label cost functions for each class.
Detailed Description
The image source identification method based on adaptive filtering and coupling coding is further described below with reference to FIGS. 1-3.
Example 1
An image source identification method based on adaptive filtering and coupling coding comprises the following steps:
S1, extracting image features through a layer of convolution, and then extracting the image content through multiple layers of convolution kernels;
S2, removing the extracted image content from the image features to obtain the attribute information of the camera;
S3, extracting correlation features between image neighbourhoods by using a multi-layer ResNet structure;
and S4, classifying the brand, model and device of the camera by adopting a multi-task classifier according to the extracted correlation features and the camera attribute information.
The coupled coding adopted by the multi-task classifier is as follows:
T1, inputting the picture size and labels, wherein each classification category label is extracted and recorded as
B_label=[1,…,1,0,…,0,0,…,0],
M_label=[0,…,0,1,…,1,0,…,0],
D_label=[0,…,0,0,…,0,1,…,1];
wherein: B_label represents the brand target label, M_label the model target label, and D_label the individual target label;
T2, generating category labels from the extracted features using the fully connected layers:
Conv_1, Conv_2, Conv_3 = Conv(Att),
Label_1 = FC(Conv_1),
Label_2 = FC(Conv_2) - Label_1,
Label_3 = FC(Conv_3) - Label_2 - Label_1,
wherein: Conv_1 represents the feature spectrum used by the neural network to classify brands, Conv_2 the feature spectrum for classifying models, and Conv_3 the feature spectrum for classifying devices; Att represents the correlation features and camera attribute information; Label_1 represents the brand label output by the actual classifier, Label_2 the model label, and Label_3 the individual label;
T3, constructing a cost function for the obtained category labels:
Cost_b = Classify[(Label_1), Label & B_label],
Cost_m = Classify[(Label_2), Label & M_label],
Cost_d = Classify[(Label_3), Label & D_label],
Cost = α × Cost_b + β × Cost_m + δ × Cost_d,
wherein: Cost_b, Cost_m and Cost_d respectively represent the cost functions that optimize brand, model and individual under the supervision of the actual labels; Cost represents the overall cost function obtained by setting different hyper-parameters, and Label represents the actual label.
According to the scheme, the original features of the camera are extracted, the content features of the image (including low-frequency and high-frequency content information) are extracted through multi-layer convolution, and the content features are adaptively removed from the original features to obtain attribute information that is convenient to classify. The categories are regressed progressively in a multi-task training mode, and a single model classifies the brand, model and device of the camera. The coupled coding method enables the child classifier to act back on the parent classifier, which improves the accuracy of the parent classifier. By appropriately increasing the redundancy of the coding method, the trained model can be used as a pre-training model for classifying new camera classes, which greatly reduces training time. However, experiments found that Example 1 can give rise to an ill-conditioned problem that counteracts the coupling of the code.
Example 2
As shown in FIG. 1, the image content is extracted with multi-layer convolution kernels in a residual learning manner. Deeper convolution kernels can learn more high-frequency details, and a deeper network can remove the image content better, so the residual network classifies the camera categories more easily. When only one convolution kernel extracts the image features, only the low-frequency content of the image can be extracted.
Image features are extracted by one convolution layer, and the content of the image is then extracted by multi-layer convolution. The outputs of the multiple convolution layers are concatenated, and content having the same dimension as the image features is extracted by a 1×1 convolution kernel. To eliminate more high-frequency information, the number of feature-map channels of the last few convolutional layers is increased. The CNN can thus decide which content needs to be removed so as to maximally retain the information related to the classification attributes.
The residual learning network destroys the correlation between neighbourhoods of the original camera image. Therefore, additional convolutional layers are used to extract the correlation information of the image neighbourhoods, and the two kinds of information are then concatenated as the final output features.
The invention adopts a multi-task training mode to classify the brand, model and device of the camera respectively. Unlike previous classification methods, the invention proposes a new classification scheme in order to improve the classification accuracy and increase the coupling between the three camera classification tasks. Existing methods classify brand, model and device with 14, 27 and 74 output categories respectively, i.e., every class is fitted as an independent category. This approach does not consider the correlation between the three categories of a single camera. Multi-class classification reduces the fitting capability of the network to a certain extent; the more output classes there are, the more obvious this effect, and binary classification performs better than multi-class classification. In order to improve the classification performance for models and devices, the invention classifies models and devices respectively under the same brand and model, so that unconstrained multi-class classification is converted into constrained two- or three-class classification. The code length of the coding method is
N = b + max(m[i]) + max(d[j]),
where i = 0, …, b-1 and j = 0, …, m-1; N is the code length, and b, m[i] and d[j] are the number of brands, the number of models of brand i, and the number of devices of model j, respectively. For the Dresden database, b = 14, max(m[i]) = 5 and max(d[j]) = 5, so the total code length can be taken as N = 14 + 5 + 5 = 24.
As an example, the invention selects some of the devices of the Sony_DSC cameras. Because all of these devices originate from the same brand, no brand classification is involved. The classifier outputs its class through a fully connected layer, and the output class is encoded, for example, with a six-bit binary code. For the Sony_DSC-H50_0 camera, the brand classifier (b = 1) ideally outputs 100000, the model classifier (m = 2) ideally outputs 110000, and the device classifier (d = 3) ideally outputs 110100; that is, classification proceeds in a progressive manner. When the number of devices or models of a class is smaller than the number of binary bits allocated to it (for example, the number of devices of Sony_DSC-W170_0, two, is smaller than d = 3), coding redundancy arises: a code such as 101001 does not correspond to any device. Of course, coding redundancy can reduce the performance of the network. For the coupled coding form to be effective, the coding redundancy must have a negligible effect on performance, and at the same time the coding bits of different classifiers must not affect each other (i.e., when classifying devices, the brand and model bit positions must not produce spurious set bits).
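This progressive encoding can be reproduced with a few lines of Python; the helper name and the bit-region layout are our assumptions, but the printed codes match the six-bit Sony_DSC example above.

def progressive_codes(brand, model, device, b, m, d):
    """Cumulative target codes: bits [0,b) = brand, [b,b+m) = model,
    [b+m,b+m+d) = device; each classifier's ideal output keeps the
    parent's set bits and adds its own."""
    code = [0] * (b + m + d)
    code[brand] = 1
    brand_code = code.copy()
    code[b + model] = 1
    model_code = code.copy()
    code[b + m + device] = 1
    return brand_code, model_code, code

# Sony_DSC-H50_0 with b=1, m=2, d=3 bits:
for c in progressive_codes(0, 0, 0, b=1, m=2, d=3):
    print("".join(map(str, c)))     # 100000 / 110000 / 110100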
Existing methods can classify the brand of a camera accurately, but identifying the individual camera device remains very difficult. Convolutional neural networks require a large amount of training data and are limited by hardware requirements, so methods based on convolutional neural networks usually adopt block processing: the pictures are first divided into small blocks and then trained in batches. Although this reduces the performance of the network to a certain extent, it offers a higher processing speed and more data. When there is too much data, training takes a long time. Models that contain only a single device can already be separated by brand or model alone; training on this portion of the data therefore greatly increases the network's average accuracy for device classification when its ability to distinguish devices is tested, although in fact it only tests the network's classification performance for the model.
TABLE 1. Partitioning of the numerical space

             Brand    Model         Device
FC(Conv_1)   [a,b]    0             0
Softmax      1        0             0
FC(Conv_2)   [a,b]    10^k×[a,b]    0
Softmax      0        1             0
FC(Conv_3)   [a,b]    10^k×[a,b]    10^n×[a,b]
Softmax      0        0             1
As mentioned above, when the Dresden database is used, the code lengths for brand, model and device are 14, 5 and 5, respectively. When training in a multi-task mode, progressive training must follow this coding scheme: the model can only be trained well once the brand classification is accurate, and the device can only be classified well on the premise that the brand and model of the camera are accurate. The classifier of the invention is an integration of a brand classifier (one bit set to 1), a model classifier (two bits set to 1) and a device classifier (three bits set to 1). The brand of the camera occupies most of the binary code bits, which also makes the accuracy for the camera brand lowest at initialization. Meanwhile, cases can occur during classification in which the model is misclassified while the device is classified correctly. Therefore, the multi-task training must be prevented from falling into local minima while ensuring that training proceeds in a progressive manner. As shown in Table 1, an ill-conditioned problem may also arise in which the three different categories are divided into different numerical spaces. Moreover, existing methods build a model only to classify the existing database; when a new database needs to be classified, the model must be retrained, which greatly increases the training time of the network. By adding redundant coding, a pre-trained model can be used to train on new data, which greatly reduces training time.
Existing methods can distinguish most camera models well, but the classification performance for some models, such as the D70 and D70s, is poor. The invention therefore proposes an auxiliary classifier that helps improve the classification performance of the main classifier. However, this classification method requires a separate classifier for each camera model and thus has a high memory requirement, so the invention uses it only as an auxiliary classifier to reclassify the camera models that the main classifier finds difficult.
As shown in FIG. 2, the invention classifies the same positions of different pictures into the same class, i.e., classification is based on the imaging position. Meanwhile, to keep position classification accurate, the number of classes per picture should not be too large. In the experiments, to keep the input identical to that of the main classifier, the invention selects a patch size of 64 (patch_size). Considering that pictures from different camera models have different sizes, only the upper-left corner of each picture is used for classification. The selected region is
input = img[patch_size × C_line, patch_size × C_col],
where C_line and C_col each take 20 values, giving 400 classes in total. The classification performance of the position classifier (PC) is poor because of the strong coupling between neighbouring pixels and the limited amount of data. By appropriately setting the stride when dividing the patches, the coupling between adjacent classes can be reduced and the patch_size can be increased. Although the classification performance of the position classifier is poor, it can still extract some useful position information for dividing the binary classifier.
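A sketch of the position-based labelling described above, with assumed array layout and helper name: every picture contributes one 64×64 patch per grid cell of its top-left corner, and patches from the same cell share a class, giving 20 × 20 = 400 classes.

import numpy as np

def position_patches(img, patch_size=64, c_line=20, c_col=20):
    """Cut the top-left corner into a c_line x c_col grid of patches and
    label each patch with its grid position (c_line * c_col classes)."""
    patches, labels = [], []
    for r in range(c_line):
        for c in range(c_col):
            patches.append(img[r * patch_size:(r + 1) * patch_size,
                               c * patch_size:(c + 1) * patch_size])
            labels.append(r * c_col + c)    # same position -> same class
    return np.stack(patches), np.array(labels)

img = np.random.randint(0, 256, (1536, 2048, 3), np.uint8)   # dummy photo
x, y = position_patches(img)
print(x.shape, int(y.max()) + 1)            # (400, 64, 64, 3) 400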
For any camera class, the invention first trains the position classifier separately. As the accuracy of the position classifier gradually improves, the network extracts the attribute information of the camera better. Note that this neighbourhood classification method can extract attribute information from any local position of the camera picture. However, when the same camera picture undergoes the same post-processing (with different processing between different camera models), a purely position-based classifier may ignore this information. Therefore, on top of the multi-position classifier, an additional CNN is added to extract the global information that differs more strongly between camera models. When the training of the position classifier is finished, its network parameters are fixed and the binary classifier is trained using all of the extracted features.
The invention adopts the following methods to solve the above problems.
Solving the ill-conditioned problem, i.e., eliminating the influence of the numerical-interval division on the coupling:
U1, extracting each classification category label and modifying it to
B_label=[1,…,1,0,…,0,0,…,0],
M_label=[1,…,1,1,…,1,0,…,0],
D_label=[1,…,1,1,…,1,1,…,1].
U2, generating category labels from the extracted features using the fully connected layers:
Conv_1, Conv_2, Conv_3 = Conv(Att),
wherein Att represents the extracted attribute features and the correlation features; the extracted features are then processed to obtain the corresponding class labels:
Label_1 = Softmax[FC(Conv_1)],
Label_2 = 2 × Softmax[FC(Conv_2)] - Label_1,
Label_3 = 3 × Softmax[FC(Conv_3)] - Label_2 - Label_1.
U3, constructing a cost function that eliminates the ill-conditioning for the obtained class labels:
Classify2(logits, label) = -Sum(label × log(logits)),
Cost_b = Classify2[(Label_1), Label & B_label],
Cost_m = Classify2[(Label_2), Label & M_label],
Cost_d = Classify2[(Label_3), Label & D_label].
the recursive training and the coupling problem are solved, namely the classification performance of the subclass classifier can be reversely improved to the classification accuracy of the parent class.
U4 construction of a coupling cost function
Cost_1=L1(Label2[:b],Label1[:b])
Cost_2=L1(Label3[:b+m],Label2[:b+m])
The overall cost penalty is shown in figure 3.
Cost=α×Cost_b+β×Cost_m+χ×Cost_d+δ×Cost_1+ε×Cost_2
Wherein α + β + χ is 1, δ is 1, and ∈ is 1.
The Cost _1 and Cost _2 are used for ensuring that the parent classifier and the subclass classifier have the same classification capability on the parent class. That is, the brand classification performance of the model classifier is the same as that of the brand classifier. When both are the same but the classification is wrong, the loss (the sum of the brand misclassification loss and the model misclassification loss) increases dramatically.
Solving the problem of retraining the model for a new data set:
U5, constructing a reasonable code length L (L > N, where N is the total code length needed by the database).
As shown in Table 2, when the brand (14 classes), model (27 classes) and device (74 classes) are classified, the minimum numbers of bits selected for the three code regions are 14, 5 and 5, respectively. The maximum capacity of this code length is therefore 14 brands, 70 models and 350 devices. Meanwhile, redundancy (6 extra classes) can appropriately be added to the brand region, in which case 20 brands, 100 models and 500 devices can be encoded. When a new camera category is to be classified, unused codes are assigned to it and the trained model is adopted as a pre-training model for retraining, thereby reducing the training time.
TABLE 2 redundancy coding
Solving the problem of poor classification performance for some categories:
U6, constructing a position classifier to extract the local position information of the picture, and classifying pictures using the local position information as features.
As shown in FIG. 2, each position of the camera picture is treated as one class, and the position classifier is trained first. The higher the classification accuracy of the position classifier, the richer the extracted position information. However, not all of the extracted position information is beneficial to the binary classifier. Therefore, the invention adopts an alternating training mode, i.e., the position classifier and the binary classifier are trained alternately every 5 epochs.
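The alternating schedule can be sketched as follows; train_one_epoch is a hypothetical stand-in for an ordinary optimisation pass and is not part of the patent.

def train_one_epoch(model, data):
    """Hypothetical placeholder for one normal optimisation pass."""
    pass

def train_alternating(position_clf, binary_clf, data, epochs=50, period=5):
    """Alternate between the position classifier and the binary classifier
    every `period` epochs; the idle network's parameters stay fixed."""
    for epoch in range(epochs):
        if (epoch // period) % 2 == 0:
            train_one_epoch(position_clf, data)   # refine position features
        else:
            train_one_epoch(binary_clf, data)     # consume the fixed features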
The above description is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent replacement or change that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, shall fall within the protection scope of the present invention.

Claims (1)

1. An image source identification method based on adaptive filtering and coupling coding, characterized by comprising the following steps:
S1, extracting image features through a layer of convolution, and then extracting the image content through multiple layers of convolution kernels;
S2, removing the extracted image content from the image features to obtain camera attribute features;
S3, extracting correlation features between image neighbourhoods by using a multi-layer ResNet structure;
S4, classifying the brand, model and device of the camera by adopting a multi-task classifier according to the extracted correlation features and the camera attribute features;
the coupled coding adopted by the multi-task classifier being as follows:
T1, inputting the picture size and labels, wherein each classification category label is extracted and recorded as
B_label=[1,…,1,0,…,0,0,…,0],
M_label=[1,…,1,1,…,1,0,…,0],
D_label=[1,…,1,1,…,1,1,…,1],
wherein: B_label represents the brand target label, M_label the model target label, and D_label the individual target label;
T2, generating category labels from the extracted features using the fully connected layers:
Conv_1, Conv_2, Conv_3 = Conv(Att),
Label_1 = FC(Conv_1),
Label_2 = FC(Conv_2) - Label_1,
Label_3 = FC(Conv_3) - Label_2 - Label_1,
wherein: Conv_1 represents the feature spectrum used by the neural network to classify brands, Conv_2 the feature spectrum for classifying brands and models, and Conv_3 the feature spectrum for classifying brands, models and devices; Att represents the correlation features and camera attribute information; Label_1 represents the brand label output by the actual classifier, Label_2 the model label, and Label_3 the individual label;
T3, constructing a cost function for the obtained category labels:
Classify2(logits, label) = -Sum(label × log(logits)),
Cost_b = Classify2[(Label_1), Label & B_label],
Cost_m = Classify2[(Label_2), Label & M_label],
Cost_d = Classify2[(Label_3), Label & D_label],
wherein: Cost_b, Cost_m and Cost_d respectively represent the cost functions that optimize brand, model and individual under the supervision of the actual labels; Label represents the actual label, and logits denotes the logit values;
T4, constructing a coupling cost function:
Cost_1 = L1(Label_2[:b], Label_1[:b]);
Cost_2 = L1(Label_3[:b+m], Label_2[:b+m]);
Cost = α × Cost_b + β × Cost_m + χ × Cost_d + δ × Cost_1 + ε × Cost_2,
wherein α + β + χ = 1, δ = 1 and ε = 1; Cost_1 improves the coupling between the brand classifier and the model classifier, and Cost_2 improves the coupling between the model classifier and the device classifier; b represents the code length used to encode the brand, and m the code length used to encode the model; L1 denotes the L1 norm; Cost represents the total cost function obtained by setting different hyper-parameters;
T5, constructing a coding length L, wherein L > N, and N is the total coding length required by the database;
and T6, constructing a position classifier to extract the local position information of the picture, and classifying the picture by using the local position information as a feature.
CN201910871685.XA 2019-09-16 2019-09-16 Image source identification method based on adaptive filtering and coupling coding Active CN110659679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910871685.XA CN110659679B (en) 2019-09-16 2019-09-16 Image source identification method based on adaptive filtering and coupling coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910871685.XA CN110659679B (en) 2019-09-16 2019-09-16 Image source identification method based on adaptive filtering and coupling coding

Publications (2)

Publication Number Publication Date
CN110659679A CN110659679A (en) 2020-01-07
CN110659679B (en) 2022-02-11

Family

ID=69037106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910871685.XA Active CN110659679B (en) 2019-09-16 2019-09-16 Image source identification method based on adaptive filtering and coupling coding

Country Status (1)

Country Link
CN (1) CN110659679B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381149B (en) * 2020-11-17 2024-03-29 大连理工大学 Reasonable countermeasure analysis method for source camera identification based on deep learning
CN116049905B (en) * 2023-04-03 2024-03-29 锐仕方达人才科技集团有限公司 Tamper-proof system based on detecting system file change

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101441720A (en) * 2008-11-18 2009-05-27 大连理工大学 Digital image evidence obtaining method for detecting photo origin by covariance matrix
CN101562016A (en) * 2009-05-26 2009-10-21 上海大学 Totally-blind digital speech authentication method
CN102063627A (en) * 2010-12-31 2011-05-18 宁波大学 Method for recognizing natural images and computer generated images based on multi-wavelet transform
CN102262782A (en) * 2011-07-05 2011-11-30 大连理工大学 Digital image evidence obtaining method by utilizing CFA (color filter array) resampling interpolation and splicing positioning
CN104616276A (en) * 2013-11-04 2015-05-13 沈阳工大普日软件技术有限公司 Blind detection system for digital image tampering
CN104599279A (en) * 2015-01-30 2015-05-06 天津工业大学 Image blind detection method based on secondary seam clipping features
CN105631473A (en) * 2015-12-24 2016-06-01 大连理工大学 Camera source identification method in finite labeled sample condition
CN106056639A (en) * 2016-05-24 2016-10-26 大连理工大学 Steganography method based on camera source anti-forensics
CN110188828A (en) * 2019-05-31 2019-08-30 大连理工大学 A kind of image sources discrimination method based on virtual sample integrated study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yunshu Chen, "Camera Model Identification with Residual Neural Network," IEEE, 2017-12-31, full text. *
Amel Tuama, "Camera Model Identification with the Use of Deep Convolutional Neural Networks," IEEE, 2016-12-31, full text. *

Also Published As

Publication number Publication date
CN110659679A (en) 2020-01-07

Similar Documents

Publication Publication Date Title
Chen et al. Median filtering forensics based on convolutional neural networks
CN111274987B (en) Facial expression recognition method and facial expression recognition device
Shi et al. Image manipulation detection and localization based on the dual-domain convolutional neural networks
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN111445454A (en) Image authenticity identification method and application thereof in license identification
Li et al. Image segmentation and compression using hidden Markov models
CN110929099B (en) Short video frame semantic extraction method and system based on multi-task learning
CN110659679B (en) Image source identification method based on adaptive filtering and coupling coding
CN113378775A (en) Video shadow detection and elimination method based on deep learning
CN112613349A (en) Time sequence action detection method and device based on deep hybrid convolutional neural network
CN113781324A (en) Old photo repairing method
CN115240024A (en) Method and system for segmenting extraterrestrial pictures by combining self-supervised learning and semi-supervised learning
CN112926675A (en) Multi-view multi-label classification method for depth incompletion under dual deficiency of view angle and label
CN111145277A (en) Image compression method of depth semantic perception and BPG compression tool
CN108304915B (en) Deep learning neural network decomposition and synthesis method and system
Huang et al. A method for identifying origin of digital images using a convolutional neural network
CN112819689A (en) Training method of face attribute editing model, face attribute editing method and equipment
CN110363198B (en) Neural network weight matrix splitting and combining method
CN117079354A (en) Deep forgery detection classification and positioning method based on noise inconsistency
CN111539263B (en) Video face recognition method based on aggregation countermeasure network
CN111047571B (en) Image salient target detection method with self-adaptive selection training process
CN112084960B (en) Facial expression recognition method based on sparse graph
CN114882245B (en) Data tag classification method and system based on feature extraction-subtask classifier in federal multitask learning
CN111291602A (en) Video detection method and device, electronic equipment and computer readable storage medium
Qaderi et al. A Gated Deep Model for Single Image Super-Resolution Reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant