CN113378859B - Image privacy detection method with interpretability - Google Patents
- Publication number
- Publication number: CN113378859B (application CN202110723826.0A)
- Authority
- CN
- China
- Prior art keywords
- node
- privacy
- formula
- ith
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06F16/951 — Indexing; web crawling techniques
Abstract
The invention discloses an interpretable image privacy detection method comprising the following steps: 1. construct a dataset for privacy classification; 2. extract the semantic information contained in the dataset images; 3. construct the weighted directed graph corresponding to each dataset image; 4. construct a graph neural network and train it on the dataset to output a binary classification for each node of the weighted directed graph; 5. use the trained graph neural network to determine the coding pixel area of an input image; 6. generate the privacy rule. Based on deep-learning theory, the invention provides automatic localization for image desensitization and can explain the privacy classification result at the object level to obtain object-level privacy rules, thereby improving the interpretability of privacy classification.
Description
Technical Field
The invention relates to the field of data privacy and machine learning, in particular to an image privacy detection method with interpretability.
Background
With the development of the information age and the popularization of smartphones, people have become used to sharing and communicating on social platforms; according to incomplete statistics, Facebook generates 350 million photos and 100 million hours of video browsing every day, and Instagram users share 95 million photos and videos per day. Meanwhile, personal pictures are also perceived and collected by the outside world in many other ways, such as mobile-phone application access and cloud storage. However, while the sharing and spreading of social pictures brings great convenience, the large amount of rich personal information contained in published pictures also increases the risk of privacy disclosure: for example, the background of a picture can expose a geographic location and whereabouts, and the people appearing in a picture can expose private social relationships.
When people post photos, they often neglect to protect their own privacy; even when they do notice, in most cases they desensitize the images by manually coding them, a process that is tedious and not safe enough. As privacy-disclosure incidents keep occurring, the privacy information in pictures needs to be studied so as to help users judge which personal pictures are private and which are not, to investigate both consensus and personalized privacy rules, and to provide end-to-end help for user desensitization.
Among existing privacy detection technologies, some works approach the problem directly as picture classification, dividing pictures into private and non-private by means of deep learning or support vector machines; their classification results depend on the training set and offer no interpretability, which greatly limits their practical value. Other technologies study the objects contained in pictures and the correlation between those objects and picture privacy in order to evaluate the privacy degree of a picture; such methods suffer from unclear privacy definitions, limited detection capability in the model output, poorly interpretable privacy rules, and similar problems. Moreover, existing datasets generally suffer from poorly interpretable privacy classifications.
Disclosure of Invention
Aiming at the problems of existing methods, the invention provides an interpretable image privacy detection method that offers automatic localization for image desensitization based on deep-learning theory and interprets the privacy classification result at the object level to obtain object-level privacy rules, thereby improving the interpretability of privacy classification.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to an interpretable image privacy detection method which is characterized by comprising the following steps:
step 1, constructing a dataset for privacy classification:
step 1.1, collecting N coded (i.e., mosaicked) images that have undergone desensitization processing from social platforms by using a web crawler;
step 1.2, marking the frame information and category of every object and background in each coded image, thereby obtaining the number K of object categories across the N coded images;
step 1.3, marking each object and background in each coded image as 1 if it has been desensitized and 0 otherwise, thereby completing the privacy classification of the objects in each coded image and obtaining the desensitization image dataset D;
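As a concrete illustration of the annotation scheme in steps 1.2–1.3 (not the patent's actual data format — all field names here are hypothetical), one record of the desensitization dataset D could be sketched as:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class AnnotatedRegion:
    """One labeled object or background region in a coded image."""
    category: str                    # one of the K object/background categories
    box: Tuple[int, int, int, int]   # frame information (x1, y1, x2, y2) in pixels
    desensitized: int                # 1 if the region was coded/blurred, else 0

@dataclass
class CodedImageRecord:
    image_path: str
    regions: List[AnnotatedRegion] = field(default_factory=list)

# Example record: a face that was coded and a car that was not.
record = CodedImageRecord(
    image_path="images/0001.jpg",
    regions=[
        AnnotatedRegion("face", (10, 10, 60, 60), 1),
        AnnotatedRegion("car", (80, 40, 200, 120), 0),
    ],
)
num_private = sum(r.desensitized for r in record.regions)
```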
step 2, extracting semantic information contained in the image;
step 2.1, training the faster-rcnn model on the desensitization image dataset D to obtain a target detection model M1;
Any input image I, after passing through the target detection model M1, yields the detection result M1(I) = {o_1, o_2, …, o_i, …, o_n}, where o_i denotes the information of the i-th detected object, comprising its frame information B_i and its classification confidence probabilities; n denotes the number of objects the target detection model M1 detects in the input image I, and i ∈ [1, n];
Step 2.2, training the vgg model based on the place365 data set to obtain a background classification model M2;
Any input image I, after passing through the background classification model M2, yields the background classification result M2(I) = {s_1, f_1}, where s_1 denotes the confidence probability of the background classification and f_1 denotes the feature vector output by the background classification model;
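To make the shapes of the two outputs concrete, here is a minimal stand-in sketch (the real M1 and M2 are trained faster-rcnn and vgg models; the functions, K = 101 from the embodiment, and the feature dimension are assumptions for illustration only):

```python
import numpy as np

K = 101          # number of object categories (as labeled in step 1.2)
FEAT_DIM = 16    # hypothetical length of the background feature vector f1

def detect_objects(image, n=2, seed=0):
    """Stand-in for M1(I): one record per detected object, each carrying a
    frame B_i and a length-K classification confidence vector."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n):
        conf = rng.random(K)
        conf /= conf.sum()                 # confidence probabilities sum to 1
        out.append({"box": (10, 10, 50, 50), "conf": conf})
    return out

def classify_background(image, seed=1):
    """Stand-in for M2(I) = {s1, f1}."""
    rng = np.random.default_rng(seed)
    return {"s1": 0.9, "f1": rng.random(FEAT_DIM)}

objects = detect_objects(None)
background = classify_background(None)
```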
step 3, constructing the weighted directed graph G_I = {V_I, E_I} corresponding to the input image I;
Step 3.1, construct the node set V_I:
For the i-th object information o_i, construct the corresponding embedded vector and use it as the corresponding node v_i = (c_i || b_i), thereby obtaining n nodes {v_1, v_2, …, v_n}; here c_i denotes the length-K confidence probability vector corresponding to o_i, whose k-th entry c_i[k] is the confidence probability that o_i belongs to the k-th class, and b_i denotes the frame information corresponding to o_i, consisting of the center-point coordinates of the frame in the pixel matrix of the input image I and the relative size of the frame;
Using the multilayer perceptron shown in formula (1), map the confidence probability s_1 and the feature vector f_1 to a background node v_{n+1} of length K+3, thereby obtaining the node set V_I = {v_1, v_2, …, v_{n+1}};
v_{n+1} = (s_1 || f_1) × W_{n+1}   (1)
In formula (1), W_{n+1} denotes the parameter matrix of the multilayer perceptron, and || denotes vector concatenation;
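The node construction of step 3.1 and formula (1) can be sketched in numpy. The exact box features and the scalar s_1 are assumptions (the patent says only "center-point coordinates" and "relative size"; here the center is normalized by image size and the relative size is the box-to-image area ratio):

```python
import numpy as np

def object_node(conf_k, box, img_w, img_h):
    """Embed one object: length-K confidences || (cx, cy, relative size)."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w                       # normalized center x
    cy = (y1 + y2) / 2 / img_h                       # normalized center y
    rel = (x2 - x1) * (y2 - y1) / (img_w * img_h)    # relative frame size
    return np.concatenate([conf_k, [cx, cy, rel]])

def background_node(s1, f1, W):
    """Formula (1): v_{n+1} = (s1 || f1) x W_{n+1}, mapped to length K+3."""
    return np.concatenate([[s1], f1]) @ W

K, feat = 4, 8
rng = np.random.default_rng(0)
v1 = object_node(np.eye(K)[2], (10, 10, 30, 50), img_w=100, img_h=100)
W = rng.standard_normal((1 + feat, K + 3))           # perceptron parameters
vb = background_node(0.9, rng.standard_normal(feat), W)
```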
step 3.2, construct the adjacency matrix E_I:
For any i-th node v_i with frame box_i and j-th node v_j with frame box_j, determine whether the frames overlap; if they do, use formula (2) to compute the ratio E_I[i, j] of the overlapped pixel area S(box_i ∩ box_j) to the area of frame box_i; if they do not, let E_I[i, j] = 0; thereby obtaining the adjacency matrix E_I = {E_I[i, j] | i = 1, 2, …, n; j = 1, 2, …, n};
E_I[i, j] = S(box_i ∩ box_j) / S(box_i)   (2)
In formula (2), S(·) denotes pixel area;
step 4, construct a graph neural network M3 composed of l_1 graph convolution layers and an l_2-layer perceptron, and train it on the desensitization image dataset D, so that M3 yields the binary classification output of every node of the weighted directed graph G_I = {V_I, E_I};
step 4.1 construct a graph convolution layer using equation (3):
h_{m+1} = σ(E_I h_m w_m), 0 ≤ m ≤ l_1 − 1   (3)
In formula (3), l_1 denotes the number of graph convolution layers, h_m denotes the feature matrix of the m-th graph convolution layer, w_m denotes the parameter matrix of the m-th graph convolution layer, and σ denotes the activation function; when m = 0, h_0 is the node feature matrix formed by V_I;
step 4.2, use formula (4) to construct the l_2-layer perceptron, with all nodes sharing one parameter matrix per layer, thereby obtaining the output O of the multilayer perceptron shown in formula (5);
h'_{k+1} = σ(h'_k w'_k), 0 ≤ k ≤ l_2 − 1   (4)
O = sig(h'_{l_2})   (5)
In formulas (4) and (5), h'_k denotes the feature vector of the k-th perceptron layer, w'_k denotes the parameter matrix of the k-th perceptron layer, h'_{k+1} denotes the feature vector of the (k+1)-th perceptron layer, h'_{l_2} denotes the feature vector of the l_2-th perceptron layer, and l_2 denotes the number of layers of the multilayer perceptron; h'_0 is the output of the last graph convolution layer, and sig() denotes the sigmoid function used to output the binary classification result of each node;
step 4.3, construct the loss function L(θ) using formula (6):
L(θ) = −Σ_{i=1}^{n+1} [ y_i log O(v_i) + (1 − y_i) log(1 − O(v_i)) ]   (6)
In formula (6), O(v_i) denotes the classification result of the i-th node v_i, θ denotes the parameters of the graph neural network M3, and y_i denotes the privacy classification label value of the i-th node v_i;
step 4.4, train with the loss function L(θ) on the desensitization image dataset D using the gradient descent method until the model converges, thereby obtaining the trained graph neural network M3, which outputs the binary classification result of every node of the weighted directed graph G_I = {V_I, E_I};
step 5, use the trained graph neural network M3 to determine the coding pixel area of the input image I;
step 5.1, if the classification result O(v_i) of the i-th node v_i is larger than the set threshold, the i-th node v_i is a privacy node; otherwise it is a non-privacy node;
step 5.2, suppose the i-th node v_i is a privacy node; if i ∈ [1, n], use the frame information B_i corresponding to the i-th node v_i to determine the coding area; if i = n+1, the i-th node v_i is the background node, and the coding area is the pixel area of the whole input image I that remains after removing the frames {B_1, B_2, …, B_n} of the detected objects;
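Step 5.2 can be sketched as a binary pixel mask (boxes as (x1, y1, x2, y2) are an assumption of this illustration): private object frames are marked for coding, and a private background node marks the complement of all object frames.

```python
import numpy as np

def coding_mask(img_h, img_w, boxes, is_private, background_private=False):
    """Build a boolean mask of the pixels to code: the frames of private
    object nodes, plus everything outside all frames when the background
    node v_{n+1} is private."""
    mask = np.zeros((img_h, img_w), dtype=bool)
    covered = np.zeros((img_h, img_w), dtype=bool)
    for (x1, y1, x2, y2), priv in zip(boxes, is_private):
        covered[y1:y2, x1:x2] = True
        if priv:
            mask[y1:y2, x1:x2] = True
    if background_private:
        mask |= ~covered          # pixels left after removing all frames
    return mask

m_obj = coding_mask(4, 4, [(0, 0, 2, 2)], [True])
m_bg = coding_mask(4, 4, [(0, 0, 2, 2)], [False], background_private=True)
```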
step 6, generating a privacy rule:
step 6.1, use formula (7) to construct the sign vector sv_i of the i-th node v_i in the weighted directed graph G_I = {V_I, E_I}:
sv_i[j] = sign(v_i[j])   (7)
In formula (7), sv_i[j] denotes the sign of the j-th value of the i-th node v_i, and v_i[j] denotes the j-th value of the i-th node v_i;
step 6.2, use formula (8) to obtain the gradient vector gv_i of the classification result with respect to the i-th node v_i:
gv_i[j] = ∂O(v_i) / ∂v_i[j]   (8)
In formula (8), gv_i[j] denotes the gradient of the classification result with respect to the j-th value of the i-th node v_i;
step 6.3, use formula (9) to obtain the importance Im(v_i) of the i-th node v_i to the classification result:
Im(v_i) = sv_i · gv_i   (9)
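Interpreting the "·" of formula (9) as a dot product (consistent with formulas (7) and (8); the concrete numbers below are illustrative only), steps 6.1–6.3 reduce to a sign-weighted gradient sum:

```python
import numpy as np

def node_importance(v, grad):
    """Im(v_i) = sv_i . gv_i, where sv_i = sign(v_i) per formula (7) and
    gv_i is the gradient of the classification output w.r.t. v_i per
    formula (8)."""
    return float(np.dot(np.sign(v), grad))

# sign([1, -2, 3]) = [1, -1, 1]; dot with the gradient gives the importance.
im = node_importance(np.array([1.0, -2.0, 3.0]), np.array([0.5, 0.2, -0.1]))
```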
Step 6.4, if Im(v_i) is larger than the set privacy threshold τ, the i-th node v_i is a privacy-related node;
step 6.5, use formula (10) to construct the sign matrix se of the edges in the weighted directed graph G_I = {V_I, E_I}:
se[i, j] = sign(E_I[i, j])   (10)
In formula (10), se[i, j] denotes the sign of the edge between the i-th node v_i and the j-th node v_j;
step 6.6, use formula (11) to obtain the gradient matrix ge of the classification result with respect to each edge:
ge[i, j] = ∂O / ∂E_I[i, j]   (11)
In formula (11), ge[i, j] denotes the gradient of the classification result with respect to the edge between the i-th node v_i and the j-th node v_j;
step 6.7, use formula (12) to obtain the adjacency matrix Ime(E_I) of the privacy-related edges between privacy nodes:
Ime(E_I) = se · ge   (12)
Step 6.8, let the privacy subgraph SG_I = ({v_i | Im(v_i) ≥ τ, 1 ≤ i ≤ n}, Ime(E_I)) serve as the privacy rule of the input image I;
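Steps 6.5–6.8 can be sketched as follows, interpreting the "·" of formula (12) as an elementwise product (an assumption of this illustration) and restricting the edge matrix to the retained privacy-related nodes:

```python
import numpy as np

def privacy_subgraph(importance, E, grad_E, tau):
    """Keep nodes with Im(v_i) >= tau and the sign-weighted edge gradients
    between them: Ime(E_I) = se * ge per formula (12), with se = sign(E_I)
    per formula (10) and ge the per-edge gradients per formula (11)."""
    keep = [i for i, im in enumerate(importance) if im >= tau]
    Ime = np.sign(E) * grad_E                 # elementwise se . ge
    sub_edges = Ime[np.ix_(keep, keep)]       # edges between kept nodes only
    return keep, sub_edges

imp = [0.8, 0.2, 0.6]                         # node importances Im(v_i)
E = np.array([[0.0, 0.5, 0.0],
              [0.3, 0.0, 0.0],
              [0.1, 0.0, 0.0]])               # adjacency E_I
gE = np.full((3, 3), 0.2)                     # stand-in edge gradients ge
nodes, edges = privacy_subgraph(imp, E, gE, tau=0.5)
```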
and step 7, taking the coding pixel area and the privacy rule together as the privacy detection result of the input image I.
Compared with the prior art, the invention has the beneficial effects that:
1. The method classifies the privacy degree of objects by constructing a graph neural network, which greatly reduces the number of parameters relative to a traditional convolutional neural network while effectively combining the categories of the objects in the image with the relational information between objects, thereby effectively improving the efficiency and accuracy of both image-level and object-level privacy classification.
2. The invention provides a method for evaluating the privacy relevance of objects in an image by exploiting the interpretability of the graph neural network; by extracting gradient information with respect to the input vectors, it associates object information with the object privacy classification results and explains the reasons for object-level privacy classification at the semantic level, thereby effectively improving the interpretability of the image privacy protection system.
3. The invention is highly extensible and can be combined with many image-information extraction models; for example, the system is easily extended to different object detection models and corresponding heat-map generation methods, and can effectively locate the pixel positions of objects that need desensitization in an image, thereby improving the usability of the image privacy protection system.
4. The invention designs, for the first time, a method for building a real-information dataset containing image privacy rules, breaking the limitation that traditional methods can only simulate privacy datasets with public datasets, and providing a new scheme and data-collection method for future in-depth research on image privacy.
Drawings
FIG. 1 is a flow chart of privacy rule generation in the present invention;
FIG. 2 is an example of a data set image of the present invention;
FIG. 3a is an input image of a test set according to the present invention;
FIG. 3b is a node classification diagram corresponding to the input images in the test set according to the present invention.
Detailed Description
In this embodiment, as shown in fig. 1, an interpretable image privacy detection method is performed as follows:
step 1.1, collecting N coded images that have undergone desensitization processing from social platforms by using a web crawler; in this embodiment, the sources mainly cover several social platforms such as Weibo, Zhihu, and Xiaohongshu, and the final coded image set is obtained by using an image-modification detection program together with manual screening. FIG. 2 is an exemplary image of the dataset of the present invention.
Step 1.2, marking the frame information and category of every object and background in each coded image, thereby obtaining the number K of object categories across the N coded images; in this embodiment, 101 categories of objects and scenes are marked, mainly including human faces, various cards, persons, automobiles, bags, mobile phones, medicines, notebooks, indoor scenes, outdoor scenes, and the like.
Step 1.3, marking each object and background in each coded image as 1 if it has been desensitized and 0 otherwise, thereby completing the privacy classification of the objects in each coded image and obtaining the desensitization image dataset D;
step 2, extracting semantic information contained in the image;
step 2.1, training the faster-rcnn model based on the desensitization image data set D to obtain a target detection model M1;
Any input image I, after passing through the target detection model M1, yields the detection result M1(I) = {o_1, o_2, …, o_i, …, o_n}, where o_i denotes the information of the i-th detected object, comprising its frame information B_i and its classification confidence probabilities; n denotes the number of objects the target detection model M1 detects in the input image I, and i ∈ [1, n];
Step 2.2, training the vgg model on the place365 dataset to obtain a background classification model M2; this dataset serves as the dataset of the scene classification model and matches the application scenarios targeted by the method.
Any input image I, after passing through the background classification model M2, yields the background classification result M2(I) = {s_1, f_1}, where s_1 denotes the confidence probability of the background classification and f_1 denotes the feature vector output by the background classification model;
step 3, constructing the weighted directed graph G_I = {V_I, E_I} corresponding to the input image I;
Step 3.1, construct the node set V_I:
For the i-th object information o_i, construct the corresponding embedded vector and use it as the corresponding node v_i = (c_i || b_i), thereby obtaining n nodes {v_1, v_2, …, v_n}; here c_i denotes the length-K confidence probability vector corresponding to o_i, whose k-th entry c_i[k] is the confidence probability that o_i belongs to the k-th class, and b_i denotes the frame information corresponding to o_i, consisting of the center-point coordinates of the frame in the pixel matrix of the input image I and the relative size of the frame;
Using the multilayer perceptron shown in formula (1), map the confidence probability s_1 and the feature vector f_1 to a background node v_{n+1} of length K+3, thereby obtaining the node set V_I = {v_1, v_2, …, v_{n+1}};
v_{n+1} = (s_1 || f_1) × W_{n+1}   (1)
In formula (1), W_{n+1} denotes the parameter matrix of the multilayer perceptron, and || denotes vector concatenation;
step 3.2, construct the adjacency matrix E_I:
For any i-th node v_i with frame box_i and j-th node v_j with frame box_j, determine whether the frames overlap; if they do, use formula (2) to compute the ratio E_I[i, j] of the overlapped pixel area S(box_i ∩ box_j) to the area of frame box_i; if they do not, let E_I[i, j] = 0; thereby obtaining the adjacency matrix E_I = {E_I[i, j] | i = 1, 2, …, n; j = 1, 2, …, n}. The size proportion of the areas represents the correlation from v_i to v_j and thus describes how closely the objects are related.
E_I[i, j] = S(box_i ∩ box_j) / S(box_i)   (2)
In formula (2), S(·) denotes pixel area;
step 4, construct a graph neural network M3 composed of l_1 graph convolution layers and an l_2-layer perceptron; in this embodiment, the network consists, in order, of: graph convolution layer 1, fully connected layer 1, graph convolution layer 2, fully connected layer 2, a shared 2-layer perceptron, and an activation-function layer; train it on the desensitization image dataset D, so that M3 yields the binary classification output of every node of the weighted directed graph G_I = {V_I, E_I};
step 4.1 construct a graph convolution layer using equation (3):
h_{m+1} = σ(E_I h_m w_m), 0 ≤ m ≤ l_1 − 1   (3)
In formula (3), l_1 denotes the number of graph convolution layers, h_m denotes the feature matrix of the m-th graph convolution layer, w_m denotes the parameter matrix of the m-th graph convolution layer, and σ denotes the activation function; when m = 0, h_0 is the node feature matrix formed by V_I;
step 4.2, use formula (4) to construct the l_2-layer perceptron, with all nodes sharing one parameter matrix per layer, thereby obtaining the output O of the multilayer perceptron shown in formula (5);
h'_{k+1} = σ(h'_k w'_k), 0 ≤ k ≤ l_2 − 1   (4)
O = sig(h'_{l_2})   (5)
In formulas (4) and (5), h'_k denotes the feature vector of the k-th perceptron layer, w'_k denotes the parameter matrix of the k-th perceptron layer, h'_{k+1} denotes the feature vector of the (k+1)-th perceptron layer, h'_{l_2} denotes the feature vector of the l_2-th perceptron layer, and l_2 denotes the number of layers of the multilayer perceptron; σ() denotes the activation function, for which the relu and leakyrelu functions are used in this embodiment, and sig() denotes the sigmoid function used to output the binary classification result of each node;
step 4.3, construct the loss function L(θ) using formula (6):
L(θ) = −Σ_{i=1}^{n+1} [ y_i log O(v_i) + (1 − y_i) log(1 − O(v_i)) ]   (6)
In formula (6), O(v_i) denotes the classification result of the i-th node v_i, θ denotes the parameters of the graph neural network M3, and y_i denotes the privacy classification label value of the i-th node v_i;
step 4.4, train with the loss function L(θ) on the desensitization image dataset D using the gradient descent method until the model converges, thereby obtaining the trained graph neural network M3, which outputs the binary classification result of every node of the weighted directed graph G_I = {V_I, E_I}; fig. 3a and 3b show images from the test set of the dataset, and the node classification results show that the coded objects in fig. 3a are basically consistent with the node privacy classification results in fig. 3b.
Step 5, use the trained graph neural network M3 to determine the coding pixel area of the input image I;
step 5.1, if the classification result O(v_i) of the i-th node v_i is larger than the set threshold, the i-th node v_i is a privacy node; otherwise it is a non-privacy node; in this embodiment, the threshold is set to 0.
Step 5.2, suppose the i-th node v_i is a privacy node; if i ∈ [1, n], use the frame information B_i corresponding to the i-th node v_i to determine the coding area; if i = n+1, the i-th node v_i is the background node, and the coding area is the pixel area of the whole input image I that remains after removing the frames {B_1, B_2, …, B_n} of the detected objects;
and step 6, generating the privacy rule: for the privacy classification result output by the model, the reason for the classification must be explained so that the result can be trusted. Whether an object is coded relates to the object itself and may also relate to the background of the image and to other objects appearing in it. The privacy rule governing the coded objects of image I is therefore a subgraph of G_I, and the privacy rule contained in the image is obtained by solving for this subgraph:
step 6.1, use formula (7) to construct the sign vector sv_i of the i-th node v_i in the weighted directed graph G_I = {V_I, E_I}:
sv_i[j] = sign(v_i[j])   (7)
In formula (7), sv_i[j] denotes the sign of the j-th value of the i-th node v_i, and v_i[j] denotes the j-th value of the i-th node v_i;
step 6.2, use formula (8) to obtain the gradient vector gv_i of the classification result with respect to the i-th node v_i:
gv_i[j] = ∂O(v_i) / ∂v_i[j]   (8)
In formula (8), gv_i[j] denotes the gradient of the classification result with respect to the j-th value of the i-th node v_i;
step 6.3, use formula (9) to obtain the importance Im(v_i) of the i-th node v_i to the classification result:
Im(v_i) = sv_i · gv_i   (9)
Step 6.4, if Im(v_i) is larger than the set privacy threshold τ, the i-th node v_i is a privacy-related node; in this embodiment, the threshold τ is set to 0.5.
Step 6.5, use formula (10) to construct the sign matrix se of the edges in the weighted directed graph G_I = {V_I, E_I}:
se[i, j] = sign(E_I[i, j])   (10)
In formula (10), se[i, j] denotes the sign of the edge between the i-th node v_i and the j-th node v_j;
step 6.6, use formula (11) to obtain the gradient matrix ge of the classification result with respect to each edge:
ge[i, j] = ∂O / ∂E_I[i, j]   (11)
In formula (11), ge[i, j] denotes the gradient of the classification result with respect to the edge between the i-th node v_i and the j-th node v_j;
step 6.7, use formula (12) to obtain the adjacency matrix Ime(E_I) of the privacy-related edges between privacy nodes:
Ime(E_I) = se · ge   (12)
Step 6.8, let the privacy subgraph SG_I = ({v_i | Im(v_i) ≥ τ, 1 ≤ i ≤ n}, Ime(E_I)) serve as the privacy rule of the input image I;
and step 7, the coding pixel area and the privacy rule together constitute the privacy detection result of the input image I. In traditional work, the privacy judgment is often a black-box model that cannot give a reasonable explanation; by representing the privacy rule as a privacy subgraph, the method makes the relationships between objects and their influence on the privacy rule visible at the semantic level, overcoming the unexplainability of traditional methods, while the pixel-level coding area it outputs provides accurate localization for subsequent desensitization operations.
Thanks to machine-learning interpretability techniques, the method can explain privacy reasons at the semantic level and localize private content, making it suitable for scenarios such as cloud uploading, social-network sharing, and mobile-application reading of images.
Claims (1)
1. An interpretable image privacy detection method is characterized by comprising the following steps:
step 1, constructing a data set of privacy classification;
step 1.1, collecting N coded images subjected to desensitization treatment on a social platform by using a web crawler;
step 1.2, marking the frame information and category of every object and background in each coded image, thereby obtaining the number K of object categories across the N coded images;
step 1.3, marking each object and background in each coded image as 1 if it has been desensitized and 0 otherwise, thereby completing the privacy classification of the objects in each coded image and obtaining the desensitization image dataset D;
step 2, extracting semantic information contained in the image;
step 2.1, training the faster-rcnn model based on the desensitization image data set D to obtain a target detection model M1;
Any input image I, after passing through the target detection model M1, yields the detection result M1(I) = {o_1, o_2, …, o_i, …, o_n}, where o_i denotes the information of the i-th detected object, comprising its frame information B_i and its classification confidence probabilities; n denotes the number of objects the target detection model M1 detects in the input image I, and i ∈ [1, n];
Step 2.2, training a VGG model on the Places365 dataset to obtain a background classification model M_2;
Any input image I is passed through the background classification model M_2, which outputs the background classification result M_2(I) = {s_1, f_1}, where s_1 denotes the confidence probability of the background classification and f_1 denotes the feature vector output by the background classification model;
step 3, constructing a weighted directed graph G_I = {V_I, E_I} corresponding to the input image I;
Step 3.1, construct the node set V_I:
For the i-th object information o_i, construct a corresponding embedded vector and use it as the corresponding node v_i, thereby obtaining n nodes {v_1, v_2, …, v_n}. The embedded vector of o_i concatenates its confidence-probability vector of length K, whose k-th entry is the confidence probability of o_i for the k-th class, with its bounding-box information B_i, comprising the coordinates of the box's centre point in the pixel matrix of the input image I and the size of the box, so each node vector has length K + 3;
A multi-layer perceptron, as shown in equation (1), maps the confidence probability s_1 and the feature vector f_1 to a background node v_{n+1} of length K + 3, thereby obtaining the node set V_I = {v_1, v_2, …, v_{n+1}};
v_{n+1} = (s_1 ‖ f_1) × W_{n+1}  (1)
In equation (1), W_{n+1} denotes a parameter matrix of the multi-layer perceptron, and ‖ denotes vector concatenation;
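As a minimal NumPy sketch of step 3.1 under assumed dimensions (K = 5 classes, an 8-dimensional scene feature; all values here are made up): object nodes concatenate the K class probabilities with the box centre and size, and the background node applies equation (1) with a parameter matrix W_{n+1}:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 5  # number of object classes (hypothetical)

def object_node(class_probs, box_center, box_size):
    """Embed one detected object: K class probabilities + (x, y, size)."""
    return np.concatenate([class_probs, box_center, [box_size]])  # length K + 3

def background_node(s1, f1, W):
    """Equation (1): map the concatenation (s_1 || f_1) to a length-(K+3) node."""
    return np.concatenate([[s1], f1]) @ W

probs = rng.dirichlet(np.ones(K))           # fake confidence-probability vector
v1 = object_node(probs, np.array([0.4, 0.6]), 0.1)
f1 = rng.standard_normal(8)                 # fake background feature vector
W = rng.standard_normal((1 + 8, K + 3))     # parameter matrix W_{n+1}
v_bg = background_node(0.9, f1, W)
print(v1.shape, v_bg.shape)                 # both length K + 3
```

Both node kinds end up the same length, which is what lets them share one graph later on.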
Step 3.2, construct the adjacency matrix E_I:
Determine whether the bounding box box_i of any i-th node v_i overlaps the bounding box box_j of the j-th node v_j. If they overlap, use equation (2) to compute the ratio E_I[i, j] of the overlapping pixel area S(box_i ∩ box_j) to the area of box_i; if they do not overlap, set E_I[i, j] = 0. This yields the adjacency matrix E_I = {E_I[i, j] | i = 1, 2, …, n; j = 1, 2, …, n}:
E_I[i, j] = S(box_i ∩ box_j) / S(box_i)  (2)
In equation (2), S(·) denotes pixel area;
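A sketch of the equation (2) ratio, assuming axis-aligned boxes in (x1, y1, x2, y2) form (the patent does not fix a box format); note the matrix is directed because each ratio is normalised by the row node's own box area:

```python
import numpy as np

def overlap_ratio(box_i, box_j):
    """Equation (2): area of box_i ∩ box_j divided by the area of box_i.
    Boxes are axis-aligned (x1, y1, x2, y2) tuples (an assumed format)."""
    ix1, iy1 = max(box_i[0], box_j[0]), max(box_i[1], box_j[1])
    ix2, iy2 = min(box_i[2], box_j[2]), min(box_i[3], box_j[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_i = (box_i[2] - box_i[0]) * (box_i[3] - box_i[1])
    return inter / area_i if inter > 0 else 0.0

def adjacency(boxes):
    """Build E_I row by row; E_I[i, j] = 0 when the boxes do not overlap."""
    n = len(boxes)
    E = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                E[i, j] = overlap_ratio(boxes[i], boxes[j])
    return E

boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 11, 11)]
E = adjacency(boxes)
print(E[0, 1], E[0, 2])  # 0.25 0.0
```

Here the shared 1×1 corner is a quarter of box 0's 2×2 area, and the far-away third box contributes a zero entry.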
step 4, constructing a graph neural network M_3 composed of l_1 graph convolution layers and an l_2-layer perceptron, and training it on the desensitized image dataset D, so that the graph neural network M_3 yields a binary-classification output for each node in the weighted directed graph G_I = {V_I, E_I};
step 4.1 construct a graph convolution layer using equation (3):
h_{m+1} = σ(E_I h_m w_m),  0 ≤ m ≤ l_1 − 1  (3)
In equation (3), l_1 denotes the number of graph convolution layers, h_m denotes the feature matrix entering the m-th graph convolution layer, w_m denotes the parameter matrix of the m-th graph convolution layer, and σ denotes an activation function; when m = 0, h_0 is the node feature matrix formed from V_I;
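The stacked propagation rule of equation (3) can be sketched as follows, with ReLU standing in for the unspecified activation σ and with made-up sizes (4 nodes, 8 features, l_1 = 2 layers):

```python
import numpy as np

def gcn_forward(E, V, weights):
    """Stacked graph convolutions per equation (3): h_{m+1} = σ(E h_m w_m),
    taking h_0 = V and σ = ReLU (an assumption; the patent only names
    'an activation function')."""
    h = V
    for w in weights:
        h = np.maximum(0.0, E @ h @ w)  # σ(E_I h_m w_m)
    return h

rng = np.random.default_rng(1)
n, d = 4, 8                               # 4 nodes, feature length 8 (hypothetical)
E = rng.random((n, n))                    # weighted directed adjacency matrix
V = rng.standard_normal((n, d))           # node feature matrix
weights = [rng.standard_normal((d, d)) for _ in range(2)]  # l_1 = 2 layers
h_out = gcn_forward(E, V, weights)
print(h_out.shape)                        # (4, 8)
```

Left-multiplying by E_I mixes each node's features with those of its overlapping neighbours, which is how box-overlap structure enters the classification.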
Step 4.2, construct the l_2-layer perceptron using equation (4), with all nodes sharing one parameter matrix per layer, thereby obtaining the output O of the multi-layer perceptron shown in equation (5):
g_{k+1} = σ(g_k W_k)  (4)
O = sig(g_{l_2})  (5)
In equations (4) and (5), g_k denotes the feature vector of the k-th perceptron layer (the input to the first perceptron layer is the output of the last graph convolution layer), W_k denotes the parameter matrix of the k-th perceptron layer, g_{k+1} denotes the feature vector of the (k+1)-th perceptron layer, g_{l_2} denotes the feature vector of the l_2-th perceptron layer, l_2 denotes the number of perceptron layers, and sig(·) denotes the sigmoid function used to output the binary classification result of each node;
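A sketch of this per-node classification head: every node's feature row goes through the same shared weights, and a final sigmoid squashes the score into (0, 1). ReLU hidden activations and the layer sizes are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_head(h, weights):
    """l_2-layer perceptron with parameters shared across nodes; the final
    sigmoid yields the per-node binary output O (equations (4)-(5) style)."""
    for w in weights[:-1]:
        h = np.maximum(0.0, h @ w)       # hidden layers (ReLU assumed)
    return sigmoid(h @ weights[-1])      # one probability per node

rng = np.random.default_rng(2)
h = rng.standard_normal((5, 8))          # 5 node features from the GCN stack
weights = [rng.standard_normal((8, 4)), rng.standard_normal((4, 1))]  # l_2 = 2
O = mlp_head(h, weights)
print(O.shape)                           # (5, 1), each entry in (0, 1)
```

Sharing one weight matrix across nodes is what makes the head size-independent: the same network scores a graph with any number of detected objects.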
step 4.3, constructing the loss function L(θ) using equation (6), the binary cross-entropy over all nodes:
L(θ) = −Σ_{i=1}^{n+1} [ y_i log O(v_i) + (1 − y_i) log(1 − O(v_i)) ]  (6)
In equation (6), O(v_i) denotes the classification result of the i-th node v_i, θ denotes the parameters of the graph neural network M_3, and y_i denotes the privacy classification label value of the i-th node v_i;
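For binary node labels and sigmoid outputs, the loss is the standard cross-entropy; a sketch with made-up outputs and labels, shown here with a mean reduction over nodes:

```python
import numpy as np

def privacy_loss(outputs, labels, eps=1e-12):
    """Binary cross-entropy over the per-node outputs O(v_i), a standard
    form of the L(θ) described around equation (6); eps guards log(0)."""
    o = np.clip(outputs, eps, 1.0 - eps)
    return -np.mean(labels * np.log(o) + (1 - labels) * np.log(1 - o))

outputs = np.array([0.9, 0.2, 0.7])   # O(v_i) for three nodes (made up)
labels = np.array([1.0, 0.0, 1.0])    # y_i privacy labels
loss = privacy_loss(outputs, labels)
print(round(loss, 4))                 # 0.2284
```

Gradient descent on this quantity, as step 4.4 prescribes, pushes private nodes toward 1 and non-private nodes toward 0.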
step 4.4, training with the loss function L(θ) on the desensitized image dataset D using a gradient descent method until the model converges, thereby obtaining the trained graph neural network M_3, which outputs the classification result of each node in the weighted directed graph G_I = {V_I, E_I};
step 5, using the trained graph neural network M_3 to determine the coded pixel area of the input image I;
step 5.1, if the classification result O(v_i) of the i-th node v_i is greater than the set threshold, the i-th node v_i is a privacy node; otherwise, the i-th node v_i is a non-privacy node;
step 5.2, assume the i-th node v_i is a privacy node. If i ∈ [1, n], use the bounding-box information B_i of the i-th node v_i to determine the coding area; if i = n + 1, the i-th node v_i is the background node, and the coding area is marked as the whole input image I except the pixel areas inside the bounding boxes {B_1, B_2, …, B_n} of the detected objects;
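Step 5.2 can be sketched as a boolean pixel mask, assuming (x1, y1, x2, y2) boxes (a hypothetical format) and one privacy flag per node, the last flag belonging to the background node:

```python
import numpy as np

def coding_mask(image_shape, boxes, privacy_flags):
    """Step 5.2 as a pixel mask: True marks pixels to be coded/desensitized.
    boxes: (x1, y1, x2, y2) per object node (assumed format);
    privacy_flags: n+1 booleans, the last one for the background node."""
    h, w = image_shape
    mask = np.zeros((h, w), dtype=bool)
    for (x1, y1, x2, y2), private in zip(boxes, privacy_flags[:-1]):
        if private:
            mask[y1:y2, x1:x2] = True        # code the object's box B_i
    if privacy_flags[-1]:                     # background node is private:
        bg = np.ones((h, w), dtype=bool)      # code everything outside all boxes
        for (x1, y1, x2, y2) in boxes:
            bg[y1:y2, x1:x2] = False
        mask |= bg
    return mask

# 4x4 image, one 2x2 object box; only the background is flagged private.
m = coding_mask((4, 4), [(0, 0, 2, 2)], [False, True])
print(m.sum())  # 12: all pixels coded except the 4 inside the object box
```

The mask gives the desensitization step an exact pixel-level target rather than a per-image yes/no decision.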
step 6, generating a privacy rule:
step 6.1, construct the sign vector sv_i of the i-th node v_i in the weighted directed graph G_I = {V_I, E_I} using equation (7):
sv_i[j] = sign(v_i[j])  (7)
In equation (7), sv_i[j] denotes the sign of the j-th value of the i-th node v_i, and v_i[j] denotes the j-th value of the i-th node v_i;
step 6.2, use equation (8) to compute the gradient vector gv_i of the classification result with respect to the i-th node v_i:
gv_i[j] = ∂O(v_i)/∂v_i[j]  (8)
In equation (8), gv_i[j] denotes the gradient of the classification result with respect to the j-th value of the i-th node v_i;
step 6.3, compute the importance Im(v_i) of the i-th node v_i to the classification result using equation (9):
Im(v_i) = sv_i · gv_i  (9)
Step 6.4, if Im (v)i) If the value is larger than the set privacy threshold value tau, the node v represents the ith node viIs a privacy-related node;
step 6.5, construct the sign matrix se of the edges in the weighted directed graph G_I = {V_I, E_I} using equation (10):
se[i, j] = sign(E_I[i, j])  (10)
In equation (10), se[i, j] denotes the sign of the edge between the i-th node v_i and the j-th node v_j;
step 6.6, use equation (11) to compute the gradient matrix ge of the classification result with respect to each edge:
ge[i, j] = ∂O/∂E_I[i, j]  (11)
In equation (11), ge[i, j] denotes the gradient of the classification result with respect to the edge between the i-th node v_i and the j-th node v_j;
step 6.7, compute the adjacency matrix Ime(E_I) of privacy-related edges between privacy nodes using equation (12):
Ime(E_I) = se · ge  (12)
Step 6.8, let the privacy subgraph SG_I = {{v_i | Im(v_i) ≥ τ, 1 ≤ i ≤ n}, Ime(E_I)} serve as the privacy rule of the input image I;
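Step 6 amounts to a sign-times-gradient saliency score over nodes and edges. A sketch with made-up node values, gradients, and adjacency (the gradients here are supplied directly rather than computed by backpropagation):

```python
import numpy as np

def node_importance(v, grad_v):
    """Equations (7)-(9): Im(v_i) = sign(v_i) · gv_i, a sign-times-gradient
    saliency score for one node."""
    return float(np.sign(v) @ grad_v)

def privacy_subgraph(V, grads, E, grad_E, tau):
    """Steps 6.4-6.8: keep nodes whose importance reaches τ, and score the
    edges as sign(E_I) times the edge gradients, read element-wise, as in
    equation (12)."""
    keep = [i for i in range(len(V)) if node_importance(V[i], grads[i]) >= tau]
    Ime = np.sign(E) * grad_E
    return keep, Ime[np.ix_(keep, keep)]

V = np.array([[0.9, -0.2], [0.1, 0.1]])        # two node embeddings (made up)
grads = np.array([[0.8, -0.5], [0.01, 0.02]])  # gradients of O w.r.t. each node
E = np.array([[0.0, 0.3], [0.5, 0.0]])         # overlap-ratio adjacency
grad_E = np.array([[0.0, 0.2], [0.1, 0.0]])    # gradients of O w.r.t. each edge
keep, sub = privacy_subgraph(V, grads, E, grad_E, tau=0.5)
print(keep, sub.shape)  # [0] (1, 1)
```

Node 0 scores 0.8 + 0.5 = 1.3 and survives the τ = 0.5 cut; node 1 scores 0.03 and is pruned, so the returned subgraph is what step 6.8 presents as the human-readable privacy rule.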
step 7, taking the coded pixel area and the privacy rule together as the privacy detection result of the input image I.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110723826.0A CN113378859B (en) | 2021-06-29 | 2021-06-29 | Image privacy detection method with interpretability |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113378859A CN113378859A (en) | 2021-09-10 |
CN113378859B true CN113378859B (en) | 2022-07-15 |
Family
ID=77579699
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110723826.0A Active CN113378859B (en) | 2021-06-29 | 2021-06-29 | Image privacy detection method with interpretability |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378859B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116244738B (en) * | 2022-12-30 | 2024-05-28 | 浙江御安信息技术有限公司 | Sensitive information detection method based on graph neural network |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105260676B (en) * | 2015-10-16 | 2017-10-03 | 合肥工业大学 | Image privacy decision-making commending system and its method |
US11397818B2 (en) * | 2019-12-13 | 2022-07-26 | Here Global B.V. | Method, apparatus and computer program product for determining a semantic privacy index |
CN111242196B (en) * | 2020-01-06 | 2022-06-21 | 广西师范大学 | Differential privacy protection method for interpretable deep learning |
CN111859454B (en) * | 2020-07-28 | 2024-03-29 | 桂林慧谷人工智能产业技术研究院 | Privacy protection method for defending link prediction based on graph neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||