CN114511518A - Method and device for evaluating visual security of image, electronic equipment and storage medium - Google Patents


Publication number
CN114511518A
CN114511518A (application CN202210065028.8A)
Authority
CN
China
Prior art keywords
image
perception
visual security
encrypted
block
Prior art date
Legal status
Pending
Application number
CN202210065028.8A
Other languages
Chinese (zh)
Inventor
向涛
肖宏飞
杨莹
郭尚伟
李洪伟
李浥东
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202210065028.8A
Publication of CN114511518A
Legal status: Pending

Classifications

    • G06T 7/0002 — Image analysis; inspection of images, e.g. flaw detection
    • G06F 21/602 — Protecting data; providing cryptographic facilities or services
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06T 1/0021 — Image watermarking
    • G06T 2207/20021 — Dividing image into blocks, subimages or windows
    • G06T 2207/20081 — Training; learning


Abstract

The application relates to the technical field of image visual security assessment and discloses a method for assessing the visual security of an image, comprising the following steps: acquiring a perceptually encrypted image; preprocessing the perceptually encrypted image to obtain a plurality of image blocks; extracting, with a preset feature extraction network, the perceptual feature corresponding to each image block; inputting each perceptual feature into a preset first fully-connected network to obtain the alternative score corresponding to each image block, and inputting each perceptual feature into a preset second fully-connected network to obtain the region-level saliency corresponding to each image block, the region-level saliency representing the weight of the image block's influence on the visual security assessment of the perceptually encrypted image; and obtaining the visual security score corresponding to the perceptually encrypted image from the region-level saliencies and the alternative scores. In this way, visual security assessment of perceptually encrypted images can be performed more conveniently. The application also discloses an apparatus for assessing the visual security of an image, an electronic device, and a storage medium.

Description

Method and device for evaluating visual security of image, electronic equipment and storage medium
Technical Field
The present application relates to the field of image visual security assessment, and in particular to a method and an apparatus for assessing the visual security of an image, an electronic device, and a storage medium.
Background
In recent years, image visual security assessment has attracted considerable attention from researchers. Existing approaches fall into two categories: those that directly reuse image quality assessment metrics to evaluate the visual security of an image, and those that design visual security metrics specifically for encrypted image data. Because perceptual encryption aims to protect the security of image content, the quality of an encrypted image is typically low, so image visual security assessment mainly targets low-quality images. Moreover, the distortion introduced by encryption differs greatly from distortions such as noise introduced by other common signal processing, so evaluating the visual security of encrypted images with image quality assessment methods raises many problems.
In the process of implementing the embodiments of the present disclosure, it is found that at least the following problems exist in the related art:
in the prior art, performing visual security assessment on a perceptually encrypted image requires manually extracting image features from it and, at the same time, comparing it against its plaintext image as a reference, which makes the assessment inconvenient for the user.
Disclosure of Invention
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview, nor is it intended to identify key or critical elements or to delineate the scope of such embodiments; rather, it serves as a prelude to the more detailed description that is presented later.
The embodiment of the disclosure provides a method and a device for evaluating image visual security, electronic equipment and a storage medium, so that visual security evaluation of perception encrypted images can be performed more conveniently.
In some embodiments, the method for assessing the visual security of an image comprises: acquiring a perceptually encrypted image; preprocessing the perceptually encrypted image to obtain a plurality of image blocks; extracting, with a preset feature extraction network, the perceptual feature corresponding to each image block; inputting each perceptual feature into a preset first fully-connected network to obtain the alternative score corresponding to each image block, and inputting each perceptual feature into a preset second fully-connected network to obtain the region-level saliency corresponding to each image block, the region-level saliency representing the weight of the image block's influence on the visual security assessment of the perceptually encrypted image; and obtaining the visual security score corresponding to the perceptually encrypted image from the region-level saliencies and the alternative scores.
In some embodiments, the apparatus for assessing the visual security of an image comprises: a first acquisition module configured to acquire a perceptually encrypted image; a preprocessing module configured to preprocess the perceptually encrypted image to obtain a plurality of image blocks; a feature extraction module configured to extract, with a preset feature extraction network, the perceptual feature corresponding to each image block; a second acquisition module configured to input each perceptual feature into a preset first fully-connected network to obtain the alternative score corresponding to each image block, and to input each perceptual feature into a preset second fully-connected network to obtain the region-level saliency corresponding to each image block, the region-level saliency representing the weight of the image block's influence on the visual security assessment of the perceptually encrypted image; and a third acquisition module configured to obtain the visual security score corresponding to the perceptually encrypted image from the region-level saliencies and the alternative scores.
In some embodiments, the apparatus for evaluating visual security of an image comprises: a processor and a memory storing program instructions, the processor being configured to, upon execution of the program instructions, perform the above-described method for assessing visual security of an image.
In some embodiments, the electronic device comprises the above-described apparatus for assessing visual security of an image.
In some embodiments, the storage medium stores program instructions that, when executed, perform the above-described method for assessing visual security of an image.
The method and apparatus for assessing the visual security of an image, the electronic device, and the storage medium provided by the embodiments of the disclosure can achieve the following technical effects: a perceptually encrypted image is acquired and preprocessed to obtain a plurality of image blocks; a preset feature extraction network extracts the perceptual feature corresponding to each image block; each perceptual feature is input into a preset first fully-connected network to obtain the alternative score corresponding to each image block, and into a preset second fully-connected network to obtain the region-level saliency corresponding to each image block, the region-level saliency representing the weight of the image block's influence on the visual security assessment of the perceptually encrypted image; and the visual security score corresponding to the perceptually encrypted image is obtained from the region-level saliencies and the alternative scores. In this way, because the feature extraction network extracts the perceptual feature of each image block, the user does not need to extract image features from the perceptually encrypted image manually; meanwhile, the plaintext image of the perceptually encrypted image is not needed as a comparison reference, and the visual security score is obtained from the region-level saliencies and the alternative scores, so that visual security assessment of perceptually encrypted images can be performed more conveniently.
The foregoing general description and the following description are exemplary and explanatory only and are not restrictive of the application.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, which are not to be construed as limiting; in the drawings, elements having the same reference numerals are shown as similar elements, and the drawings are not drawn to scale. In the drawings:
FIG. 1 is a schematic diagram of a method for assessing visual security of an image provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another method for assessing visual security of an image provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another method for assessing visual security of an image provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a framework for a method for evaluating visual security of an image according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a dual-attention residual block with X channels in the feature extraction network according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of the preset first fully-connected network and the preset second fully-connected network in the visual security regression network provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an apparatus for assessing the visual security of an image according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of another apparatus for assessing the visual security of an image according to an embodiment of the present disclosure.
Detailed Description
So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to simplify the drawing.
The terms "first," "second," and the like in the description, the claims, and the above-described drawings of the embodiments of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the present disclosure described herein can be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusions.
The term "plurality" means two or more unless otherwise specified.
In the embodiments of the present disclosure, the character "/" indicates that the preceding and following objects are in an "or" relationship. For example, A/B represents: A or B.
The term "and/or" describes an association relationship between objects and means that three relationships may exist. For example, A and/or B represents: A, or B, or A and B.
The term "correspond" may refer to an association or binding relationship; A corresponds to B means that there is an association or binding relationship between A and B.
The technical solutions in the embodiments of the present disclosure can be applied to an electronic device. Optionally, the electronic device comprises a computer, a server, or the like.
In the embodiments of the present disclosure, the electronic device acquires a perceptually encrypted image and can obtain the visual security score corresponding to it, thereby realizing visual security assessment of the perceptually encrypted image.
With reference to fig. 1, an embodiment of the present disclosure provides a method for evaluating visual security of an image, including:
step S101, the electronic equipment acquires a perception encrypted image.
Step S102, the electronic device preprocesses the perception encrypted image to obtain a plurality of image blocks.
Step S103, the electronic device extracts the perception features corresponding to the image blocks respectively by using a preset feature extraction network.
Step S104, inputting each perception characteristic into a preset first fully-connected network by the electronic equipment to obtain alternative scores corresponding to each image block respectively, and inputting each perception characteristic into a preset second fully-connected network to obtain the region level significance corresponding to each image block respectively; the region-level saliency is used for representing the influence weight of the image block on the visual security evaluation of the perceptual encryption image.
Step S105, the electronic device obtains the visual security score corresponding to the perceptually encrypted image according to the region-level saliencies and the alternative scores.
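The steps S101 through S105 above can be sketched end to end. The sketch below is illustrative only: `feature_net`, `score_head`, and `saliency_head` are stand-ins for the learned feature extraction network and the two fully-connected networks, and the final weighted average assumes the saliencies are normalized into weights, which this chunk of the patent does not spell out.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_blocks(image, block=32, n_blocks=8):
    """Split the image into non-overlapping block×block tiles, then randomly sample n_blocks."""
    h, w = image.shape[:2]
    tiles = [image[r:r + block, c:c + block]
             for r in range(0, h - block + 1, block)
             for c in range(0, w - block + 1, block)]
    idx = rng.choice(len(tiles), size=min(n_blocks, len(tiles)), replace=False)
    return [tiles[i] for i in idx]

def feature_net(block):
    """Stub for the preset feature extraction network: a 512-dim perceptual feature."""
    return np.full(512, block.mean() / 255.0)

def score_head(feat):       # stub for the preset first fully-connected network
    return float(feat.mean())

def saliency_head(feat):    # stub for the preset second fully-connected network
    return float(feat.std() + 1e-6)

def visual_security_score(image):
    blocks = sample_blocks(image)                       # S101-S102
    feats = [feature_net(b) for b in blocks]            # S103
    scores = np.array([score_head(f) for f in feats])   # S104: alternative scores
    sal = np.array([saliency_head(f) for f in feats])   # S104: region-level saliencies
    weights = sal / sal.sum()                           # assumed: saliencies -> weights
    return float((weights * scores).sum())              # S105

img = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)
s = visual_security_score(img)
```

With the stubs above the score stays in [0, 1]; in the patent the two heads are trained networks rather than hand-written functions.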
By adopting the method for assessing the visual security of an image provided by the embodiments of the disclosure, a perceptually encrypted image is acquired and preprocessed to obtain a plurality of image blocks; a preset feature extraction network extracts the perceptual feature corresponding to each image block; each perceptual feature is input into a preset first fully-connected network to obtain the alternative score corresponding to each image block, and into a preset second fully-connected network to obtain the region-level saliency corresponding to each image block, the region-level saliency representing the weight of the image block's influence on the visual security assessment of the perceptually encrypted image; and the visual security score corresponding to the perceptually encrypted image is obtained from the region-level saliencies and the alternative scores. In this way, the feature extraction network extracts the perceptual feature of each image block, so the user does not need to extract image features from the perceptually encrypted image manually; meanwhile, the plaintext image of the perceptually encrypted image is not needed as a comparison reference, so that visual security assessment of perceptually encrypted images can be performed more conveniently.
Optionally, the electronic device preprocesses the perceptually encrypted image to obtain a plurality of image blocks by: sampling image blocks from the perceptually encrypted image to obtain a plurality of candidate image blocks of the same size; and performing image-block normalization on each candidate image block to obtain each image block.
In some embodiments, the electronic device samples image blocks from the perceptually encrypted image to obtain a plurality of 32×32 candidate image blocks.
In some embodiments, image block sampling refers to dividing an image into non-overlapping image blocks of the same size and then randomly sampling among them to obtain a plurality of candidate image blocks. This is done for two reasons. First, the content and distortion of an image are locally homogeneous, that is, content and distortion present themselves region by region. Second, data-driven methods based on CNNs (Convolutional Neural Networks) typically require a large number of training samples. When the network is trained, sampling several candidate image blocks from each perceptually encrypted image not only further enriches the training data set but also improves training efficiency.
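The non-overlapping division described above can be written compactly with array reshaping. This is a minimal sketch: the 32×32 block size follows the embodiment, while `tile_image` and the sampled count of 4 are illustrative choices not taken from the patent.

```python
import numpy as np

def tile_image(image, block=32):
    """Crop to a multiple of the block size, then split into non-overlapping block×block tiles."""
    h = (image.shape[0] // block) * block
    w = (image.shape[1] // block) * block
    return (image[:h, :w]
            .reshape(h // block, block, w // block, block)
            .swapaxes(1, 2)              # group the two tile-index axes together
            .reshape(-1, block, block))  # flatten to (n_tiles, block, block)

img = np.arange(100 * 70).reshape(100, 70) % 256
tiles = tile_image(img)                  # a 100×70 image yields 3×2 = 6 tiles of 32×32
picked = tiles[np.random.default_rng(1).choice(len(tiles), size=4, replace=False)]
```

Tiles are ordered row by row, so `tiles[1]` is the block at rows 0–31, columns 32–63 of the cropped image.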
Optionally, image-block normalization is performed on each candidate image block to obtain each image block by computing a normalization formula (shown as an image in the original document and not reproduced here) that maps the candidate image block Bi to the i-th normalized image block. In this way, the pixel values of the candidate image blocks can be normalized into a uniform unit interval, and the stability of the feature extraction network can be maintained.
In some embodiments, when the network is trained, each image block is obtained by performing image block normalization on each candidate image block, and pixel values of each candidate image block can be normalized to be within a uniform unit interval, so that the stability of the feature extraction network can be maintained in the training process.
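The patent's exact normalization formula appears only as an image and is not reproduced in this text; the sketch below assumes the simplest mapping consistent with the description — scaling 8-bit pixel values into the unit interval [0, 1] — which is an assumption, not the confirmed formula.

```python
import numpy as np

def normalize_block(block):
    """Assumed normalization: map 8-bit pixel values into the unit interval [0, 1].
    (The patent's formula is an unreproduced image; this is one plausible reading.)"""
    return block.astype(np.float64) / 255.0

b = np.array([[0, 128], [255, 64]], dtype=np.uint8)
nb = normalize_block(b)
```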
With reference to fig. 2, an embodiment of the present disclosure provides a schematic diagram of a method for evaluating visual security of an image, including:
in step S201, the electronic device acquires a perceptual encrypted image.
Step S202, the electronic device samples image blocks of the perception encrypted image to obtain a plurality of alternative image blocks with the same size.
Step S203, the electronic device performs image block normalization on each candidate image block to obtain each image block.
Step S204, the electronic equipment extracts the perception features corresponding to the image blocks respectively by using a preset feature extraction network.
Step S205, the electronic device inputs each perception feature into a preset first fully-connected network to obtain alternative scores corresponding to each image block, and inputs each perception feature into a preset second fully-connected network to obtain region level significance corresponding to each image block; the region-level significance is used for representing the influence weight of the image block on the visual security assessment of the perceptual encryption image.
Step S206, the electronic device obtains the visual security score corresponding to the perceptually encrypted image according to the region-level saliencies and the alternative scores.
By adopting the method for assessing the visual security of an image provided by the embodiments of the disclosure, the feature extraction network extracts the perceptual feature corresponding to each image block, so the user does not need to extract image features from the perceptually encrypted image manually; meanwhile, the plaintext image of the perceptually encrypted image is not needed as a comparison reference, and the visual security score corresponding to the perceptually encrypted image is obtained from the region-level saliencies and the alternative scores, so that visual security assessment of perceptually encrypted images can be performed more conveniently.
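The final step above combines the region-level saliencies with the per-block alternative scores, but this chunk does not give the combination in closed form. A minimal sketch, assuming the saliencies are simply normalized into weights for a weighted average of the alternative scores:

```python
import numpy as np

def aggregate(alt_scores, saliencies):
    """Assumed aggregation: treat region-level saliencies as weights and take
    the saliency-weighted average of the per-block alternative scores."""
    w = np.asarray(saliencies, dtype=np.float64)
    w = w / w.sum()                      # normalize weights to sum to 1
    return float(np.dot(w, np.asarray(alt_scores, dtype=np.float64)))

# Three blocks: the middle block is most salient, so its score dominates.
score = aggregate([0.2, 0.8, 0.5], [1.0, 3.0, 1.0])
```

Here the weights become [0.2, 0.6, 0.2], giving a score of 0.62; the patent may use a different (learned) combination.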
Optionally, the preset feature extraction network is DARNet (Dual-Attention Residual Network).
In some embodiments, a dual attention residual network is used to extract perceptual features in perceptually encrypted images that are closely related to visual information content.
In some embodiments, DARNet is composed of 5 DARBs (Dual-Attention Residual Blocks), each of which includes a pixel-level (pixel-wise) attention mechanism, a channel-level (channel-wise) attention mechanism, and a residual connection. This enhances useful perceptual features, stabilizes network training, and at the same time allows more effective perceptual features to be extracted from low-quality perceptually encrypted images.
In some embodiments, the input of the preset feature extraction network is a 32×32 image block, and the output of the feature extraction network is a feature vector of dimension 512, i.e., the perceptual feature corresponding to that image block.
In some embodiments, DARNet includes 5 dual attention residual blocks: a first dual attention residual block, a second dual attention residual block, a third dual attention residual block, a fourth dual attention residual block, and a fifth dual attention residual block; wherein the channel number of the first dual attention residual block is 32, the channel number of the second dual attention residual block is 64, the channel number of the third dual attention residual block is 128, the channel number of the fourth dual attention residual block is 256, and the channel number of the fifth dual attention residual block is 512.
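The channel progression above (32 → 64 → 128 → 256 → 512) and the 512-dim output can be checked with a shape-only stub. The 2× spatial downsampling per block is an assumption made so that a 32×32 input collapses to a 1×1×512 map; the patent in this chunk specifies only the channel counts and the output dimension, and the attention and residual internals are omitted here.

```python
import numpy as np

CHANNELS = [32, 64, 128, 256, 512]   # per-DARB channel counts from the patent

def darb_stub(x, out_ch):
    """Shape-only stub of one dual-attention residual block: assumed 2× spatial
    downsampling plus channel expansion; attention/residual details omitted."""
    h, w, _ = x.shape
    return np.zeros((h // 2, w // 2, out_ch))

def darnet_stub(block):
    x = block[..., None].astype(np.float64)   # 32×32 grayscale block -> 32×32×1
    for ch in CHANNELS:
        x = darb_stub(x, ch)                  # 32->16->8->4->2->1 spatially
    return x.reshape(-1)                      # 1×1×512 -> 512-dim perceptual feature

feat = darnet_stub(np.zeros((32, 32)))
```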
Optionally, the feature extraction network includes a first, a second, a third, a fourth, and a fifth dual-attention residual block, and the electronic device extracts the perceptual features corresponding to the image blocks with the preset feature extraction network as follows. Each image block is input into the first dual-attention residual block, and the convolutional layer in that block produces the corresponding global feature map with X = 32 channels; the pixel-level attention feature map and the channel-level attention feature map corresponding to the global feature map are obtained; the first dual-attention feature map is obtained from these two attention feature maps; and the first output feature of the first dual-attention residual block is obtained from the global feature map and the first dual-attention feature map. The first output feature is determined as the input feature of the second dual-attention residual block, whose convolutional layer produces a global feature map with X = 64 channels, and the same steps yield the second dual-attention feature map and the second output feature. Likewise, the second output feature is fed to the third dual-attention residual block (global feature map with X = 128 channels), the third output feature to the fourth dual-attention residual block (X = 256 channels), and the fourth output feature to the fifth dual-attention residual block (X = 512 channels), each block obtaining its pixel-level and channel-level attention feature maps, its dual-attention feature map, and its output feature in the same manner.
In some embodiments, the global feature map is F_X ∈ R^(H×W×X), where H, W, and X respectively denote the height, width, and number of channels of the global feature map.
In some embodiments, the global feature map in the first dual-attention residual block has X = 32 channels; in the second, X = 64; in the third, X = 128; in the fourth, X = 256; and in the fifth, X = 512.
Optionally, the convolutional layer corresponding to the dual attention residual block is a 3 × 3 × X convolutional layer. Optionally, X is 32, 64, 128, 256 or 512.
In some embodiments, the convolutional layer corresponding to the first dual attention residual block is a 3 × 3 × 32 convolutional layer; the convolution layer corresponding to the second double attention force residual block is a convolution layer of 3 multiplied by 64; the convolution layer corresponding to the third dual attention residual block is a convolution layer of 3 × 3 × 128; the convolution layer corresponding to the fourth dual attention residual block is a convolution layer of 3 × 3 × 256; the convolution layer corresponding to the fifth dual attention residual block is a convolution layer of 3 × 3 × 512.
Optionally, the pixel-level attention feature map is P_X ∈ R^(H×W×X), a pixel-level attention feature map with X channels, where H is the height, W the width, and X the number of channels of the corresponding global feature map.
Optionally, the channel-level attention feature map is CA_X ∈ ℝ^(H×W×X); wherein CA_X is the channel-level attention feature map with channel number X, H is the height of the corresponding global feature map F_X, W is its width, and X is its number of channels.
Optionally, the dual attention feature map corresponding to the dual attention residual module with channel number X is obtained by calculating DA_X = (PA_X ⊗ F_X) ⊕ (CA_X ⊗ F_X); wherein DA_X is the dual attention feature map corresponding to the dual attention residual module with channel number X, PA_X is the pixel-level attention feature map with channel number X, CA_X is the channel-level attention feature map with channel number X, F_X is the global feature map with channel number X, ⊗ is used for characterizing the pixel-wise multiplication operation, and ⊕ is used for characterizing the pixel-wise addition operation.
Optionally, the output feature corresponding to the dual attention residual module with channel number X is obtained by calculating O_X = DA_X ⊕ F_X; wherein O_X is the output feature corresponding to the dual attention residual module with channel number X, DA_X is the dual attention feature map corresponding to the dual attention residual module with channel number X, and F_X is the global feature map with channel number X.
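The attention fusion DA_X and the residual output O_X described above can be sketched element-wise in plain Python; the function names are hypothetical, and tensors are flattened to lists purely for illustration:

```python
def dual_attention_fuse(pa, ca, f):
    # DA_X = (PA_X * F_X) + (CA_X * F_X), computed element by element:
    # the global feature map f is weighted by both attention maps, then summed.
    return [p * x + c * x for p, c, x in zip(pa, ca, f)]

def residual_output(da, f):
    # O_X = DA_X + F_X: element-wise residual addition of the dual attention
    # feature map and the global feature map.
    return [d + x for d, x in zip(da, f)]
```

With pa = [1, 0], ca = [0, 1] and f = [2, 3], the fusion keeps each element once and the residual step doubles it.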
Optionally, the first dual attention residual block, the second dual attention residual block, the third dual attention residual block, the fourth dual attention residual block, and the fifth dual attention residual block are all dual attention residual blocks with a channel number of X. Optionally, X is 32, 64, 128, 256 or 512.
In some embodiments, the first dual attention residual block is a "32 channel number dual attention residual block"; the second dual attention residual block is a "dual attention residual block with channel number 64"; the third dual attention residual block is a "dual attention residual block with 128 channels"; the fourth dual attention residual block is a "256 channel number dual attention residual block"; the fifth dual attention residual block is "dual attention residual block with number of channels 512".
In some embodiments, the first output feature is the output feature corresponding to the first dual attention residual module, that is, the output feature corresponding to the dual attention residual module with channel number 32; the second output feature is the output feature corresponding to the second dual attention residual module, that is, the output feature corresponding to the dual attention residual module with channel number 64; the third output feature is the output feature corresponding to the third dual attention residual module, that is, the output feature corresponding to the dual attention residual module with channel number 128; the fourth output feature is the output feature corresponding to the fourth dual attention residual module, that is, the output feature corresponding to the dual attention residual module with channel number 256; and the fifth output feature is the output feature corresponding to the fifth dual attention residual module, that is, the output feature corresponding to the dual attention residual module with channel number 512.
In some embodiments, the first dual attention feature map is a dual attention feature map corresponding to a 32 channel number dual attention residual module; the second double attention feature map is a double attention feature map corresponding to the double attention residual error module with the channel number of 64; the third dual attention feature map is a dual attention feature map corresponding to the dual attention residual module with the channel number of 128; the fourth dual attention feature map is a dual attention feature map corresponding to the dual attention residual module with 256 channels; the fifth dual attention feature map is the dual attention feature map corresponding to the dual attention residual module with the channel number 512.
Optionally, the electronic device inputs each perception feature into a preset first fully-connected network to obtain the alternative score corresponding to each image block, and inputs each perception feature into a preset second fully-connected network to obtain the region level significance corresponding to each image block; the region level significance is used for representing the influence weight of the image block on the visual security evaluation of the perception encrypted image.
In some embodiments, different regions of the same image contain different visual content, so their importance for evaluating the visual security of the image varies. By introducing a Region-Wise attention mechanism, the alternative scores corresponding to the image blocks are obtained through a preset first fully-connected network, and the perception features are input into a preset second fully-connected network to obtain the region level significance corresponding to the image blocks, so that the influence of different regions of an image on the visual security assessment can be measured, and the performance of the method for assessing the visual security of the image is improved.
Optionally, the preset first fully-connected network and the preset second fully-connected network are both composed of a preset number of fully-connected layers. Optionally, the preset number is 4.
In some embodiments, the predetermined first fully connected network and the predetermined second fully connected network are each composed of 4 fully connected layers, for example: FC-1024, FC-512, FC-256, and FC-1.
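The layer widths FC-1024, FC-512, FC-256 and FC-1 imply a chain of weight-matrix shapes. A small sketch (`mlp_shapes` is a hypothetical helper; the input dimension 2048 is an assumed example, not stated in the text):

```python
FC_LAYERS = [1024, 512, 256, 1]  # FC-1024, FC-512, FC-256, FC-1

def mlp_shapes(in_dim, layer_widths):
    # Weight-matrix shape of each fully connected layer in the stack:
    # the output width of one layer becomes the input width of the next.
    shapes = []
    for out_dim in layer_widths:
        shapes.append((in_dim, out_dim))
        in_dim = out_dim
    return shapes
```

The final FC-1 layer reduces each block's feature vector to a single scalar (an alternative score in the first network, a region level significance in the second).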
Optionally, the obtaining, by the electronic device, the visual security score corresponding to the perceptual encryption image according to the region-level saliency and the alternative score includes: the electronic equipment normalizes the significance of each area level to obtain the target weight corresponding to each image block; and acquiring a visual security score corresponding to the perception encrypted image according to each target weight and each alternative score.
Optionally, the normalizing, by the electronic device, the region-level saliency to obtain the target weight corresponding to each image block includes: the electronic device obtains the target weight corresponding to each image block by calculating w̄_i = w_i / Σ_{j=1}^{N_P} w_j; wherein w̄_i is the target weight corresponding to the ith image block, w_i is the region-level saliency corresponding to the ith image block, N_P is the number of all image blocks corresponding to the perception encrypted image, and w_j is the region-level saliency corresponding to the jth image block.
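The normalization above is a sum-to-one weighting; a minimal sketch (hypothetical function name):

```python
def normalize_region_saliency(w):
    # Target weight of block i = its saliency divided by the total saliency,
    # so the weights of all blocks sum to 1.
    total = sum(w)
    return [wi / total for wi in w]
```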
Optionally, the obtaining, by the electronic device, the visual security score corresponding to the perception encrypted image according to the target weights and the alternative scores includes: the electronic device obtains the visual security score corresponding to the perception encrypted image by calculating Q = Σ_{i=1}^{N_P} w̄_i q_i; wherein Q is the visual security score corresponding to the perception encrypted image, w̄_i is the target weight corresponding to the ith image block, N_P is the number of all image blocks corresponding to the perception encrypted image, and q_i is the alternative score corresponding to the ith image block.
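The weighted fusion of per-block alternative scores into one image-level score can be sketched as (hypothetical function name):

```python
def visual_security_score(weights, candidate_scores):
    # Image-level score = sum over blocks of (target weight * alternative score).
    return sum(w * q for w, q in zip(weights, candidate_scores))
```

With weights [0.25, 0.75] and block scores [4.0, 8.0], the fused score is 0.25·4 + 0.75·8 = 7.0.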
Optionally, after the electronic device obtains the visual security score corresponding to the perceptual encryption image according to the region-level saliency and the alternative scores, the method further includes: and the electronic equipment displays the visual security score corresponding to the perception encrypted image to the user.
With reference to fig. 3, an embodiment of the present disclosure provides a schematic diagram of a method for evaluating visual security of an image, including:
Step S301, the electronic device obtains a perception encrypted image.
Step S302, the electronic device preprocesses the perception encrypted image to obtain a plurality of image blocks.
Step S303, the electronic device extracts, by using a preset feature extraction network, perceptual features corresponding to the image blocks, respectively.
Step S304, the electronic equipment inputs each perception characteristic into a preset first fully-connected network to obtain alternative scores corresponding to each image block respectively, and inputs each perception characteristic into a preset second fully-connected network to obtain region level significance corresponding to each image block respectively; the region-level saliency is used for representing the influence weight of the image block on the visual security evaluation of the perceptual encryption image.
Step S305, the electronic equipment obtains the visual security score corresponding to the perception encrypted image according to the region level significance and the alternative scores.
Step S306, the electronic device displays the visual security score corresponding to the perception encrypted image to the user.
By adopting the method for evaluating the visual security of an image provided by the embodiment of the disclosure, the perception features corresponding to the image blocks are extracted through the feature extraction network, so that a user does not need to manually extract the image features of the perception encrypted image; meanwhile, the plaintext image of the perception encrypted image does not need to be obtained for comparison and reference, and the visual security score corresponding to the perception encrypted image is obtained through the region level significance and the alternative scores. In this way, the visual security assessment of the perception encrypted image can be performed more conveniently, and the assessed visual security score corresponding to the perception encrypted image can be displayed to the user, so that the user can know the security degree of the perception encrypted image in time.
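Steps S301–S305 above can be sketched end to end as follows; every callable is a hypothetical stand-in for the preprocessing module, the feature extraction network, and the two fully-connected heads:

```python
def assess_visual_security(encrypted_image, preprocess, extract_features,
                           score_head, saliency_head):
    """End-to-end sketch of steps S301-S305 (all callables are hypothetical)."""
    blocks = preprocess(encrypted_image)                # S302: split into image blocks
    feats = [extract_features(b) for b in blocks]       # S303: perception features
    scores = [score_head(f) for f in feats]             # S304: alternative scores
    saliency = [saliency_head(f) for f in feats]        # S304: region level significance
    total = sum(saliency)
    weights = [s / total for s in saliency]             # S305: normalize saliency
    return sum(w * q for w, q in zip(weights, scores))  # S305: weighted fusion
```

With trivial stand-ins (split a list in two, sum each half, uniform saliency), the score is simply the mean of the block scores.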
Fig. 4 is a schematic diagram of a framework structure of a method for evaluating visual security of an image according to an embodiment of the present disclosure; the framework comprises a preprocessing module, a feature extraction network and a visual security regression network. The perception encrypted image is preprocessed by the preprocessing module to obtain a plurality of image blocks; the feature extraction network comprises five dual attention residual blocks, namely a first dual attention residual block, a second dual attention residual block, a third dual attention residual block, a fourth dual attention residual block and a fifth dual attention residual block. Each image block is input into the feature extraction network to obtain the perception feature corresponding to each image block; each perception feature is input into the visual security regression network, the alternative score corresponding to each image block is obtained through the first fully-connected network in the visual security regression network, the region level significance corresponding to each image block is obtained through the second fully-connected network in the visual security regression network, and the alternative scores and the region level significance are weighted and fused to obtain the visual security score corresponding to the perception encrypted image.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a dual attention residual block with channel number X in the feature extraction network according to an embodiment of the present disclosure; wherein X characterizes the number of channels of the dual attention residual block, Input_X is the input feature of the dual attention residual block with channel number X, Output_X is the output feature of the dual attention residual block with channel number X, Conv is a convolutional layer, ReLU is the activation function of the convolutional layer, ⊗ is used for characterizing the pixel-wise multiplication operation, and ⊕ is used for characterizing the pixel-wise addition operation. In some embodiments, the input feature is passed through a convolutional layer in the dual attention residual block to obtain the feature map F_X; the channel-level and pixel-level attention feature maps corresponding to F_X are obtained respectively; F_X is multiplied by the channel-level attention feature map and by the pixel-level attention feature map respectively, and the products are added to obtain the dual attention feature map corresponding to the dual attention residual block with channel number X. In some embodiments, the first dual attention residual block has 32 channels; the second dual attention residual block has 64 channels; the third dual attention residual block has 128 channels; the fourth dual attention residual block has 256 channels; and the fifth dual attention residual block has 512 channels.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a first fully connected network and a second fully connected network in a visual security regression network according to an embodiment of the present disclosure; the candidate scores corresponding to the image blocks are obtained by inputting the perception features corresponding to the image blocks into a preset first fully-connected network, and the region level significance corresponding to the image blocks is obtained by inputting the perception features corresponding to the image blocks into a preset second fully-connected network. In some embodiments, the network structure of the predetermined first fully connected network is the same as the network structure of the predetermined second fully connected network. In some embodiments, the predetermined first fully connected network and the predetermined second fully connected network are each comprised of 4 fully connected layers, such as: FC-1024, FC-512, FC-256, and FC-1.
Optionally, the embodiment of the present disclosure provides an evaluation model for evaluating visual security of an image, where the evaluation model includes a feature extraction network and a visual security regression network. In some embodiments, a plurality of image blocks are obtained by preprocessing the perception encrypted image through a preprocessing module, and perception features respectively corresponding to the image blocks are extracted through a feature extraction network in an evaluation model; and inputting the perception characteristics corresponding to each image block into a visual security regression network to obtain the visual security score of the perception encrypted image.
Optionally, the evaluation model for evaluating the visual security of the image is obtained by: obtaining a plurality of training samples and sample labels corresponding to the training samples; the sample label is used for representing the actual visual safety score corresponding to the training sample; and inputting each training sample with the sample label into the deep learning model for training to obtain an evaluation model. Optionally, the training samples are image blocks corresponding to the perceptually encrypted images.
Optionally, sample labels are assigned to the image blocks corresponding to the perception encrypted image through the preprocessing module.
Optionally, the loss function corresponding to the evaluation model is Loss = (1/N_P) Σ_{i=1}^{N_P} (q̂_i − y_i)² + (q̂ − q)²; wherein Loss is the loss function, N_P is the number of image blocks corresponding to the perception encrypted image, q̂_i is the predicted alternative score corresponding to the ith image block, y_i is the actual visual security score of the ith image block, i.e. the sample label corresponding to the ith image block, q̂ is the predicted visual security score corresponding to the perception encrypted image, and q is the actual visual security score corresponding to the perception encrypted image.
Optionally, the loss function corresponding to the evaluation model is optimized by θ* = argmin_θ Loss to obtain the optimal parameters corresponding to the evaluation model.
In some embodiments, the training is stopped when the loss function corresponding to the evaluation model converges within a preset interval.
In some embodiments, the method for evaluating image visual security provided by the present invention is a Triple Attention-based Deep No-Reference Visual Security Index (TAVSI) method, which needs neither a reference plaintext image nor manually extracted features, and consists of a feature extraction network and a visual security regression network. The feature extraction network introduces Pixel-Wise and Channel-Wise attention mechanisms to extract effective features that characterize the visual information of an image. The visual security regression network introduces a Region-Wise attention mechanism to measure the impact of different regions of the image on the visual security assessment. Compared with existing visual security indexes, the method for evaluating the visual security of an image provided by the embodiment of the invention has superior performance.
As shown in fig. 7, an embodiment of the present disclosure provides an apparatus for evaluating visual security of an image, including: a first obtaining module 701, a preprocessing module 702, a feature extraction module 703, a second obtaining module 704 and a third obtaining module 705; the first obtaining module 701 is configured to obtain a perception encrypted image, and send the perception encrypted image to the preprocessing module; the preprocessing module 702 is configured to receive the perception encrypted image sent by the first obtaining module, preprocess the perception encrypted image to obtain a plurality of image blocks, and send each image block to the feature extraction module; the feature extraction module 703 is configured to receive each image block sent by the preprocessing module, extract perceptual features corresponding to each image block by using a preset feature extraction network, and send the perceptual features corresponding to each image block to the second acquisition module; the second obtaining module 704 is configured to receive the perceptual features corresponding to the image blocks sent by the feature extraction module, input the perceptual features into a preset first fully-connected network to obtain alternative scores corresponding to the image blocks, input the perceptual features into a preset second fully-connected network to obtain region-level saliency corresponding to the image blocks, and send the alternative scores and the region-level saliency to the third obtaining module; the region level significance is used for representing the influence weight of the image block on the visual security evaluation of the perception encrypted image; the third obtaining module 705 is configured to receive each candidate score and each area level saliency sent by the second obtaining module, and obtain a visual security score corresponding to the perceptual encryption image according to each area level saliency and each 
candidate score.
By adopting the device for evaluating the visual security of an image provided by the embodiment of the disclosure, the perception encrypted image is obtained through the first obtaining module; the preprocessing module preprocesses the perception encrypted image to obtain a plurality of image blocks; the feature extraction module extracts the perception features corresponding to the image blocks by using a preset feature extraction network; the second acquisition module inputs each perception feature into a preset first fully-connected network to obtain the alternative score corresponding to each image block, and inputs each perception feature into a preset second fully-connected network to obtain the region level significance corresponding to each image block; the region level significance is used for representing the influence weight of the image block on the visual security evaluation of the perception encrypted image; and the third acquisition module acquires the visual security score corresponding to the perception encrypted image according to the region level significance and the alternative scores. In this way, the perception features corresponding to the image blocks are extracted through the feature extraction network, so that a user does not need to manually extract the image features of the perception encrypted image; meanwhile, the plaintext image of the perception encrypted image does not need to be acquired for comparison and reference, and the visual security score corresponding to the perception encrypted image is acquired through the region level significance and the alternative scores, so that the visual security assessment of the perception encrypted image can be performed more conveniently.
In some embodiments, the method for evaluating the visual security of an image provided by the embodiments of the present invention is verified and analyzed on preset test databases. Optionally, the test databases include the IVC-SelectEncrypt database and the PEID database. The IVC-SelectEncrypt database includes 200 low-quality encrypted images in total, obtained by encrypting 8 original high-quality images at 5 security levels with 5 encryption algorithms. The PEID database contains 1080 perception encrypted images, obtained by encrypting 20 plaintext images with 10 perceptual encryption algorithms. Each image of the PEID database has two subjective scores, namely a visual quality score and a visual security score, and the visual security score is used as the label for visual security evaluation in the embodiment of the invention.
In some embodiments, the performance of the visual security index is analyzed in terms of both Monotonicity (Monotonicity) and Accuracy (Accuracy).
In some embodiments, monotonicity is used to measure the correlation between the visual security assessment indicator prediction results and the human visual system assessment results. In the embodiment of the invention, the monotonicity between the objective score and the subjective score of the visual safety index on a test database is measured by adopting SRCC (Spearman Rank Correlation Coefficient) and KRCC (Kendall Rank Correlation Coefficient).
In some embodiments, the monotonicity between the objective score and the subjective score is obtained by calculating SRCC(S, O) = 1 − (6 Σ_{i=1}^{N} d_i²) / (N(N² − 1)); wherein SRCC(S, O) is the monotonicity between the subjective score S and the objective score O, d_i is the difference between the ranks of the subjective score S_i and the objective score O_i of the ith perception encrypted image, and N is the number of all perception encrypted images.
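The SRCC computation can be sketched in plain Python (rank-tie handling is ignored for brevity; function names are hypothetical):

```python
def rank(xs):
    # Rank of each value, 1 = smallest (ties not handled in this sketch).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos + 1
    return r

def srcc(s, o):
    # Spearman rank correlation: 1 - 6 * sum(d_i^2) / (N * (N^2 - 1)),
    # where d_i is the rank difference of the i-th pair.
    n = len(s)
    rs, ro = rank(s), rank(o)
    d2 = sum((a - b) ** 2 for a, b in zip(rs, ro))
    return 1 - 6 * d2 / (n * (n * n - 1))
```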
In some embodiments, the dependency between a first set of preset variables and a second set of preset variables is obtained by calculating KRCC(S, O) = (N_c − N_d) / (N(N − 1)/2); wherein KRCC(S, O) is the dependency between the first set of preset variables and the second set of preset variables, N_c is the number of pairs for which the first set of preset variables and the second set of preset variables are considered consistent, N_d is the number of pairs for which they are considered inconsistent, and N is the number of all perception encrypted images. Optionally, the first set of preset variables is (S_i, O_i); wherein S_i is the subjective score and O_i the objective score of the ith perception encrypted image. Optionally, the second set of preset variables is (S_j, O_j); wherein S_j is the subjective score and O_j the objective score of the jth perception encrypted image. Optionally, the two sets are determined to be consistent if and only if S_i > S_j and O_i > O_j, or S_i < S_j and O_i < O_j; otherwise, they are determined to be inconsistent.
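The concordant/discordant pair counting behind KRCC can be sketched as (ties are neither concordant nor discordant in this sketch; function name hypothetical):

```python
def krcc(s, o):
    # Kendall rank correlation: (N_c - N_d) / (N * (N - 1) / 2), where a pair
    # (i, j) is concordant when S and O order the two images the same way.
    n = len(s)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (s[i] - s[j]) * (o[i] - o[j])
            if sign > 0:
                nc += 1
            elif sign < 0:
                nd += 1
    return (nc - nd) / (n * (n - 1) / 2)
```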
In some embodiments, accuracy is used to assess the degree of agreement between objective results of visual safety metrics and subjective results of the human visual system. In the embodiment of the invention, two standards, namely PLCC (Pearson Linear Correlation Coefficient) and RMSE (Root Mean square Error), are adopted to measure the accuracy of the visual safety index.
In some embodiments, the linear correlation coefficient between the subjective score S and the objective score O′ after nonlinear fitting is obtained by calculating PLCC(S, O′) = cov(S, O′) / (δ_S δ_{O′}); wherein PLCC(S, O′) is the linear correlation coefficient between the subjective score and the objective score after nonlinear fitting, cov(S, O′) is the covariance between the subjective score and the objective score after nonlinear fitting, δ_S is the standard deviation corresponding to the subjective score S, and δ_{O′} is the standard deviation corresponding to the objective score O′ after nonlinear fitting.
In some embodiments, embodiments of the present disclosure provide a nonlinear fitting function for nonlinearly fitting the objective score as f(x) = α_1 (1/2 − 1/(1 + exp(α_2 (x − α_3)))) + α_4 x + α_5; wherein α_1 is the first parameter to be fitted, α_2 is the second parameter to be fitted, α_3 is the third parameter to be fitted, α_4 is the fourth parameter to be fitted, α_5 is the fifth parameter to be fitted, x is the input, f(x) is the output, and exp is used to characterize the exponential function.
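A sketch of the standard five-parameter logistic mapping (the extracted text omits the third parameter, so α_3 is assumed here to be the usual midpoint parameter of this fitting form):

```python
import math

def logistic_fit(x, a1, a2, a3, a4, a5):
    # f(x) = a1 * (1/2 - 1/(1 + exp(a2 * (x - a3)))) + a4 * x + a5
    # At x == a3 the logistic term vanishes, leaving the linear part a4*x + a5.
    return a1 * (0.5 - 1.0 / (1.0 + math.exp(a2 * (x - a3)))) + a4 * x + a5
```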
In some embodiments, the root mean square error between the subjective score and the objective score after nonlinear fitting is obtained by calculating RMSE(S, O′) = sqrt((1/N) Σ_{i=1}^{N} (S_i − O′_i)²); wherein RMSE(S, O′) is the root mean square error between the subjective score S and the objective score O′ after nonlinear fitting, and N is the number of all perception encrypted images.
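The two accuracy criteria, PLCC and RMSE, can be sketched directly from their definitions (population standard deviations are used; function names are hypothetical):

```python
import math

def plcc(s, o):
    # Pearson linear correlation: cov(S, O') / (std(S) * std(O')).
    n = len(s)
    ms, mo = sum(s) / n, sum(o) / n
    cov = sum((a - ms) * (b - mo) for a, b in zip(s, o)) / n
    ds = math.sqrt(sum((a - ms) ** 2 for a in s) / n)
    do = math.sqrt(sum((b - mo) ** 2 for b in o) / n)
    return cov / (ds * do)

def rmse(s, o):
    # Root mean square error between subjective and fitted objective scores.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, o)) / len(s))
```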
In some embodiments, the higher the SRCC, KRCC and PLCC values between the objective and subjective scores, and the lower the RMSE value, the better the performance of the method used to obtain the objective score is determined to be.
In some embodiments, the images in the test database are randomly rotated and flipped. This achieves data enhancement, thereby providing a large number of data samples for network training. The data-enhanced images are divided into a training set and a test set by randomly dividing the plaintext images, for example: 80% of the images are divided into the training set and the remaining 20% into the test set; the corresponding ciphertext images are divided into the corresponding sets respectively. This avoids overlap of image content between the training set and the test set.
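Splitting at the plaintext level, so that no cipher images of the same source appear in both sets, can be sketched as (hypothetical helper; the seed is an assumption for reproducibility):

```python
import random

def split_by_plaintext(plaintext_ids, train_frac=0.8, seed=0):
    # Shuffle plaintext IDs, then cut 80/20; every ciphertext derived from a
    # plaintext follows its plaintext into the same set, preventing content
    # overlap between training and test data.
    ids = list(plaintext_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return set(ids[:cut]), set(ids[cut:])
```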
In some embodiments, in training the network, each mini-batch contains 512 image blocks from two encrypted images, i.e. 256 image blocks from each image. The learning rate of the evaluation model for evaluating the visual security of images on the IVC-SelectEncrypt database and the PEID database is 10⁻⁴. To prevent the network from overfitting, Dropout regularization with a probability of 0.5 is added after each fully connected layer. In some embodiments, the data set is randomly divided 10 times, the evaluation model is trained 10 times, and the median of the 10 training results is determined as the final experimental result, so that the experimental result is more stable.
In some embodiments, table 1 is a performance comparison example table provided by embodiments of the present disclosure; it compares the performance of the method for evaluating the visual security of an image provided by the embodiment of the invention with the other mainstream algorithms on the IVC-SelectEncrypt database and the PEID database.
TABLE 1
In some embodiments, as shown in table 1, PSNR (peak signal-to-noise ratio), SSIM (structural similarity), VIF (visual information fidelity), LSS (luminance similarity score), ESS (edge similarity score), LFBVS (local-feature-based visual security index), VSI-Canny (visual security index based on the Canny edge detection operator), IIBVSI (image-importance-based visual security index) and CNNVSI (Convolutional-Neural-Network-based visual security index) are all visual security indexes, while PVCN (variance-based image block selection convolutional network) and TSCN (dual-flow convolutional network) are CNN (Convolutional Neural Network)-based image quality evaluation models; TAVSI is the Triple Attention-based Deep No-Reference Visual Security Index method, that is, the method for evaluating visual security of an image provided by an embodiment of the present invention. Table 1 reports the SRCC, KRCC, PLCC and RMSE values of TAVSI and the other mainstream algorithms on the IVC-SelectEncrypt database and the PEID database; the comparison shows that the performance of TAVSI on both test databases is better than that of the comparison algorithms. This is because existing visual security assessment indicators typically focus on certain specific types of image features, such as edges, textures and image entropy, and thus cannot capture all of the perceptual information of the encrypted image. In contrast, the CNN-based evaluation model proposed by the embodiment of the present invention can automatically learn the perceptual features of the image without any specific constraint. Therefore, the performance of CNNVSI is also superior to that of the other hand-crafted indexes.
In addition, the performance of the evaluation model provided by the embodiment of the invention is far better than that of the CNN-based IQA (image quality assessment) algorithms. This is because the IQA models are designed for distorted images of moderately high quality, whose distortion follows a fixed distortion model and is uniformly distributed throughout the image. In contrast, the visual quality of a perception encrypted image tends to be poor, its distortion is non-uniformly distributed, and there is no fixed pattern to follow. It can be shown that the TAVSI provided by the embodiment of the present invention, by introducing a triple attention mechanism, can obtain effective visual feature extraction and a more powerful feature representation from low-quality images, and exhibits performance highly consistent with subjective evaluation results.
As shown in fig. 8, an apparatus for evaluating the visual security of an image according to an embodiment of the present disclosure includes a processor (processor) 800 and a memory (memory) 801. Optionally, the apparatus may further include a communication interface (Communication Interface) 802 and a bus 803. The processor 800, the communication interface 802, and the memory 801 may communicate with each other via the bus 803. The communication interface 802 may be used for information transfer. The processor 800 may invoke logic instructions in the memory 801 to perform the method for evaluating the visual security of an image of the above-described embodiments.
By adopting the apparatus for evaluating the visual security of an image provided by the embodiment of the present disclosure, a perception encrypted image is acquired; the perception encrypted image is preprocessed to obtain a plurality of image blocks; perception features respectively corresponding to the image blocks are extracted by using a preset feature extraction network; each perception feature is input into a preset first fully-connected network to obtain an alternative score corresponding to each image block, and into a preset second fully-connected network to obtain a region level significance corresponding to each image block, where the region level significance represents the influence weight of the image block on the visual security evaluation of the perception encrypted image; and a visual security score corresponding to the perception encrypted image is acquired according to the region level significances and the alternative scores. In this way, the perception features corresponding to the image blocks are extracted by the feature extraction network, so that the user does not need to extract the image features of the perception encrypted image manually; meanwhile, the plaintext image of the perception encrypted image does not need to be acquired as a reference for comparison, and the visual security score corresponding to the perception encrypted image is acquired from the region level significances and the alternative scores, so that the visual security of the perception encrypted image can be evaluated more conveniently.
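The block-wise pipeline summarized above (preprocess into blocks, extract per-block features, score each block with one head, weight it with another, then aggregate) can be sketched as follows. This is a minimal illustration, not the patented model: the feature extractor and the two fully-connected heads are random-weight stand-ins for the trained networks, and a softmax is used here to normalize the region level significances into target weights for numerical safety, whereas the claims normalize by their plain sum.

```python
import numpy as np

class BlockwiseScorer:
    """Sketch of block-wise scoring: features -> two FC heads -> weighted aggregate."""

    def __init__(self, block_size=32, feat_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.block_size = block_size
        n_in = block_size * block_size
        # Random-weight stand-ins for the trained networks described in the embodiments
        self.W_feat = rng.standard_normal((feat_dim, n_in)) / np.sqrt(n_in)
        self.w_score = rng.standard_normal(feat_dim) / np.sqrt(feat_dim)  # "first fully-connected network"
        self.w_sal = rng.standard_normal(feat_dim) / np.sqrt(feat_dim)    # "second fully-connected network"

    def sample_blocks(self, image):
        # Preprocessing: cut the grayscale image into equal-size blocks,
        # then normalize each block to zero mean and (near) unit variance
        s = self.block_size
        h, w = image.shape
        blocks = [image[i:i + s, j:j + s]
                  for i in range(0, h - s + 1, s)
                  for j in range(0, w - s + 1, s)]
        return [(b - b.mean()) / (b.std() + 1e-8) for b in blocks]

    def score(self, image):
        # Per-block perception features (ReLU of a linear projection)
        feats = [np.maximum(self.W_feat @ b.reshape(-1), 0.0)
                 for b in self.sample_blocks(image)]
        q = np.array([f @ self.w_score for f in feats])  # alternative score per block
        w = np.array([f @ self.w_sal for f in feats])    # region level significance per block
        w = np.exp(w - w.max())                          # softmax -> target weights
        w /= w.sum()
        return float(np.sum(w * q))                      # weighted visual security score
```

Because the aggregation is a convex combination of per-block scores, blocks judged more salient dominate the final score, which is the core idea of the region-level weighting.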
The embodiment of the disclosure provides an electronic device, which includes the above-mentioned apparatus for evaluating image visual security.
Optionally, the electronic device includes a computer, a server, or the like.
By adopting the electronic device provided by the embodiment of the present disclosure, a perception encrypted image is acquired; the perception encrypted image is preprocessed to obtain a plurality of image blocks; perception features respectively corresponding to the image blocks are extracted by using a preset feature extraction network; each perception feature is input into a preset first fully-connected network to obtain an alternative score corresponding to each image block, and into a preset second fully-connected network to obtain a region level significance corresponding to each image block, where the region level significance represents the influence weight of the image block on the visual security evaluation of the perception encrypted image; and a visual security score corresponding to the perception encrypted image is acquired according to the region level significances and the alternative scores. In this way, the perception features corresponding to the image blocks are extracted by the feature extraction network, so that the user does not need to extract the image features of the perception encrypted image manually; meanwhile, the plaintext image of the perception encrypted image does not need to be acquired as a reference for comparison, and the visual security score corresponding to the perception encrypted image is acquired from the region level significances and the alternative scores, so that the visual security of the perception encrypted image can be evaluated more conveniently.
In addition, the logic instructions in the memory 801 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as an independent product.
The memory 801 is a computer-readable storage medium that can be used for storing software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 800 runs the program instructions/modules stored in the memory 801 to execute functional applications and perform data processing, i.e., to implement the method for evaluating the visual security of an image in the above-described embodiments.
The memory 801 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. In addition, the memory 801 may include a high-speed random access memory, and may also include a nonvolatile memory.
The disclosed embodiments provide a storage medium storing program instructions that, when executed, perform the above-described method for assessing visual security of an image.
Embodiments of the present disclosure provide a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-described method for assessing visual security of an image.
The computer-readable storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product, which is stored in a readable storage medium and includes one or more instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned readable storage medium may be a non-transitory readable storage medium comprising: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes, and may also be a transient readable storage medium.
The above description and drawings sufficiently illustrate embodiments of the disclosure to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. Furthermore, the words used in the specification are words of description only and are not intended to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application is meant to encompass any and all possible combinations of one or more of the associated listed items. Furthermore, the terms "comprises" and/or "comprising," when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of additional identical elements in the process, method, or apparatus that includes the element. In this document, each embodiment may be described with emphasis on its differences from other embodiments, and the same or similar parts of the respective embodiments may be referred to one another. For the methods, products, and the like disclosed in the embodiments, if they correspond to the method sections disclosed in the embodiments, reference may be made to the description of the method sections for relevant details.
Those of skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software may depend upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments. It can be clearly understood by the skilled person that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments disclosed herein, the disclosed methods, products (including but not limited to devices, apparatuses, etc.) may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be merely a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to implement the present embodiment. In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than disclosed in the description, and sometimes there is no specific order between the different operations or steps. For example, two sequential operations or steps may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (10)

1. A method for assessing visual security of an image, comprising:
acquiring a perception encrypted image;
preprocessing the perception encrypted image to obtain a plurality of image blocks;
extracting perception features corresponding to the image blocks respectively by using a preset feature extraction network;
inputting each perception feature into a preset first fully-connected network to obtain alternative scores respectively corresponding to each image block, and inputting each perception feature into a preset second fully-connected network to obtain region level significances respectively corresponding to each image block; the region level significance is used for representing the influence weight of the image block on the visual security evaluation of the perception encrypted image;
and acquiring a visual security score corresponding to the perception encrypted image according to each region level significance and each alternative score.
2. The method of claim 1, wherein preprocessing the perceptually encrypted image to obtain a plurality of image blocks comprises:
sampling the image blocks of the perception encrypted image to obtain a plurality of alternative image blocks with the same size;
and carrying out image block normalization on each alternative image block to obtain each image block.
3. The method of claim 1, wherein obtaining a visual security score corresponding to the perceptually encrypted image based on each of the region-level saliencies and each of the candidate scores comprises:
normalizing each region level significance to obtain a target weight corresponding to each image block;
and acquiring a visual security score corresponding to the perception encrypted image according to each target weight and each alternative score.
4. The method according to claim 3, wherein performing normalization processing on the saliency of each of the region levels to obtain the target weight corresponding to each of the image blocks comprises:
by calculation of
Figure FDA0003479781140000011
Obtaining target weight corresponding to each image block; wherein the content of the first and second substances,
Figure FDA0003479781140000012
is the target weight corresponding to the ith image block, wiFor the region level saliency corresponding to the ith image block, NPFor the number, w, of all image blocks corresponding to said perceptually encrypted imagejThe significance of the area level corresponding to the jth image block.
5. The method of claim 3, wherein obtaining a visual security score corresponding to the perceptually encrypted image based on the target weight and each of the alternative scores comprises:
by calculation of
Figure FDA0003479781140000021
Obtaining a visual security score corresponding to the perception encrypted image; wherein the content of the first and second substances,
Figure FDA0003479781140000022
a corresponding visual security score for the perceptually encrypted image,
Figure FDA0003479781140000023
is the target weight corresponding to the ith image block, NPFor the number of all image blocks corresponding to the perceptually encrypted image,
Figure FDA0003479781140000024
and scoring the candidate corresponding to the ith image block.
6. The method of claim 1, wherein after obtaining the visual security score corresponding to the perceptually encrypted image according to the region-level saliency and the candidate scores, further comprising:
and displaying the visual security score corresponding to the perception encrypted image to a user.
7. An apparatus for assessing visual security of an image, comprising:
a first acquisition module configured to acquire a perceptually encrypted image;
the preprocessing module is configured to preprocess the perception encrypted image to obtain a plurality of image blocks;
the feature extraction module is configured to extract perceptual features respectively corresponding to the image blocks by using a preset feature extraction network;
the second acquisition module is configured to input each perception feature into a preset first fully-connected network to obtain alternative scores respectively corresponding to each image block, and to input each perception feature into a preset second fully-connected network to obtain region level significances respectively corresponding to each image block; the region level significance is used for representing the influence weight of the image block on the visual security evaluation of the perception encrypted image;
and the third acquisition module is configured to acquire a visual security score corresponding to the perceptually encrypted image according to each region level saliency and each alternative score.
8. An apparatus for assessing visual security of an image, comprising a processor and a memory storing program instructions, wherein the processor is configured to perform a method for assessing visual security of an image according to any one of claims 1 to 6 when executing the program instructions.
9. An electronic device comprising the apparatus for assessing visual security of an image of claim 8.
10. A storage medium storing program instructions which, when executed, perform a method for assessing visual security of an image according to any one of claims 1 to 6.
CN202210065028.8A 2022-01-20 2022-01-20 Method and device for evaluating visual security of image, electronic equipment and storage medium Pending CN114511518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210065028.8A CN114511518A (en) 2022-01-20 2022-01-20 Method and device for evaluating visual security of image, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114511518A true CN114511518A (en) 2022-05-17

Family

ID=81549356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210065028.8A Pending CN114511518A (en) 2022-01-20 2022-01-20 Method and device for evaluating visual security of image, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114511518A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102421007A (en) * 2011-11-28 2012-04-18 浙江大学 Image quality evaluating method based on multi-scale structure similarity weighted aggregate
CN103440494A (en) * 2013-07-04 2013-12-11 中国科学院自动化研究所 Horrible image identification method and system based on visual significance analyses
US9778351B1 (en) * 2007-10-04 2017-10-03 Hrl Laboratories, Llc System for surveillance by integrating radar with a panoramic staring sensor
CN107483918A (en) * 2017-07-20 2017-12-15 天津大学 Stereo image quality evaluation method is referred to based on conspicuousness entirely
CN107578395A (en) * 2017-08-31 2018-01-12 中国地质大学(武汉) The image quality evaluating method that a kind of view-based access control model perceives
CN109102461A (en) * 2018-06-15 2018-12-28 深圳大学 Image reconstructing method, device, equipment and the medium of low sampling splits' positions perception
CN112767385A (en) * 2021-01-29 2021-05-07 天津大学 No-reference image quality evaluation method based on significance strategy and feature fusion
CN113379733A (en) * 2021-07-08 2021-09-10 湖南工商大学 Block label weight measurement method and equipment for no-reference image quality evaluation
WO2021231072A1 (en) * 2020-05-15 2021-11-18 Amazon Technologies, Inc. Iterative media object compression algorithm optimization using decoupled calibration of perceptual quality algorithms

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YING YANG et al.: "Convolutional Neural Network for Visual Security Evaluation", IEEE Transactions on Circuits and Systems for Video Technology, 9 November 2020 (2020-11-09), pages 1 - 15 *
WU ZHENGGUO: "Image Visual Security Evaluation Method Based on HVS Characteristics", China Master's Theses Full-text Database, Information Science and Technology, no. 12, 15 December 2021 (2021-12-15), pages 138 - 318 *
YANG YING: "Research on Visual Perception Evaluation of Low-Quality Images", China Doctoral Dissertations Full-text Database, Information Science and Technology, no. 08, 15 August 2023 (2023-08-15), pages 138 - 7 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination