CN109191426A - A planar image saliency detection method - Google Patents

A planar image saliency detection method

Info

Publication number
CN109191426A
CN109191426A
Authority
CN
China
Prior art keywords
layer
convolutional
path
network model
sampling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810820639.2A
Other languages
Chinese (zh)
Inventor
桑庆兵
殷莹
李朝锋
过榴晓
吴小俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201810820639.2A priority Critical patent/CN109191426A/en
Publication of CN109191426A publication Critical patent/CN109191426A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention proposes a planar image saliency detection method that outputs saliency maps of higher accuracy and is simple, easy to understand, and well suited to practical application. It comprises the following steps: S1: construct a network model; S2: train the network model to obtain a trained network model; S3: input image data into the trained network model obtained in S2, perform saliency detection, and output a saliency map for subsequent image processing. In step S1, the structure of the network model comprises a contracting path on the left and an expansive path on the right. The contracting path captures context and comprises an input layer, multiple convolutional layers, and pooling layers; the image under test is fed through the input layer into the first convolutional layer for convolution, and every two consecutive convolutional layers are followed by a pooling layer. The expansive path enables precise localization and comprises multiple up-sampling layers, convolutional layers, and an output layer; each up-sampling layer is followed by two consecutive convolutional layers.

Description

A planar image saliency detection method
Technical field
The present invention relates to the field of computer vision and digital image processing, and specifically to a planar image saliency detection method.
Background technique
Saliency detection, as a preprocessing step in image processing, is widely used in fields such as image quality assessment, object tracking, object detection, and image segmentation. For example, saliency detection has two main applications in image quality assessment: in the feature extraction stage, the salient region of the image to be evaluated is computed and used as one of its features; in the feature fusion stage, the pixel values of the saliency map are used as the weights for fusing the feature maps. Saliency detection methods are usually divided into two classes, top-down and bottom-up; because bottom-up methods require no manual annotation as prior knowledge, current research concentrates on them. Many saliency detection methods exist, such as Frequency-tuned salient region detection (hereinafter FT), Saliency detection via graph-based manifold ranking (hereinafter GMR), and Hierarchical saliency detection (hereinafter HS). In practice, however, when the salient object appears at the image border, or when the salient object resembles part of the background, these algorithms easily lose target information and suffer from insufficient accuracy.
Summary of the invention
To solve the problem of insufficient accuracy of the saliency maps output by saliency detection methods, the present invention proposes a planar image saliency detection method that outputs saliency maps of higher accuracy and is simple, easy to understand, and well suited to practical application.
The technical scheme of the invention is as follows: a planar image saliency detection method comprising the following steps:
S1: construct a network model;
S2: train the network model to obtain a trained network model;
S3: input image data into the trained network model obtained in S2, perform saliency detection, and output a saliency map for subsequent image processing;
It is characterized in that: in step S1, the structure of the network model comprises a contracting path on the left and an expansive path on the right;
the contracting path captures context and comprises an input layer, multiple convolutional layers, and pooling layers; the image under test is fed through the input layer into the first convolutional layer for convolution; every two consecutive convolutional layers are followed by a pooling layer, and each convolutional layer is followed by an activation function;
the expansive path enables precise localization and comprises multiple up-sampling layers, convolutional layers, and an output layer; each up-sampling layer is followed by two consecutive convolutional layers, and each convolutional layer is followed by an activation function;
the feature map obtained after the pooling operation of the last pooling layer in the contracting path is fed into two convolutional layers for two consecutive convolutions, and the resulting feature map is input to the first up-sampling layer in the expansive path for up-sampling;
the number of up-sampling layers in the expansive path equals the number of pooling layers in the contracting path;
the number of convolutional layers in the expansive path equals the number of convolutional layers in the contracting path;
the feature map output after each up-sampling operation in an up-sampling layer is concatenated with the feature map output by the pooling layer at the same level in the contracting path, and the result is then fed into the subsequent convolutional layers in the expansive path for convolution;
a deconvolution layer precedes the output layer; it performs a deconvolution operation on the feature map output by the preceding convolutional layer, and after cropping and normalization the required saliency map is produced through the output layer.
It is further characterized in that:
the kernel size of the two convolutional layers preceding the deconvolution layer is set to 1 × 1, and the kernel size of all remaining convolutional layers is set to 3 × 3;
the pooling layers perform 2 × 2 max pooling with stride 2;
the activation function is the Leaky Rectified Linear Units function, calculated as:
f(x) = x, if x > 0; f(x) = a·x, if x ≤ 0, where a is a constant;
in the normalization operation, the saliency map is normalized to [0, 1] using the Sigmoid function;
in step S2, during training of the network model, saliency regression analysis is performed on the saliency map produced by the output layer via the Euclidean loss function.
In the technical scheme provided by the invention, the network model is composed of a contracting path and an expansive path: the contracting path obtains contextual information, the expansive path performs precise localization, and the two symmetric paths form a U shape. The method generates the saliency map with a deconvolution layer: the feature map is deconvolved, then cropped and normalized, and the final output is the object of interest to the human eye, which makes the model better suited to saliency detection. Because pooling loses image information, reduces image resolution, and is irreversible, it affects the final accuracy of the feature maps; therefore, before each convolution in the right half, the relatively high-resolution feature map output by the pooling operation at the same level of the left half is concatenated with the feature map produced by the up-sampling operation. Combining high-resolution feature maps with more abstract ones replenishes image information and yields a more accurate saliency map.
Detailed description of the invention
Fig. 1 is a schematic diagram of the network structure of the invention.
Specific embodiment
The technical scheme of the invention comprises the following steps:
S1: construct a network model;
S2: train the network model using the THUS data set to obtain a trained network model;
S3: input image data into the trained network model obtained in S2, perform saliency detection, and output a saliency map for subsequent image processing;
As shown in Fig. 1, the network model in S1 is designed on the basis of the U-Net model (Convolutional Networks for Biomedical Image Segmentation, hereinafter U-Net); its structure comprises a contracting path on the left and an expansive path on the right. The model is an end-to-end network: the input is an image and the output is also an image.
In this network, the contracting path captures context and comprises an input layer, multiple convolutional layers, and pooling layers. The image under test is fed through the input layer into the first convolutional layer for convolution; every two consecutive convolutional layers are followed by a pooling layer, and each convolutional layer is followed by an activation function. The convolution kernels in the contracting path are set to 3 × 3, and the pooling operation is 2 × 2 max pooling with stride 2.
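The 2 × 2, stride-2 max pooling described above can be sketched in a few lines of Python (illustrative only, not part of the patent; real implementations operate on multi-channel tensors):

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a single-channel 2-D
    feature map (a list of lists); halves each spatial dimension."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]

pooled = max_pool_2x2([[1, 3, 2, 0],
                       [5, 2, 1, 4],
                       [0, 1, 9, 2],
                       [3, 8, 6, 7]])
# each 2x2 block collapses to its maximum: [[5, 4], [8, 9]]
```

Pooling discards three of every four values, which is why the expansive path later re-injects the pre-pooling feature maps.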
The activation function is the Leaky Rectified Linear Units (hereinafter L-ReLU) function, calculated as:
f(x) = x, if x > 0; f(x) = a·x, if x ≤ 0
In the L-ReLU function, the output is x when the input x > 0 and a·x when x < 0, where a is a constant, taken as 0.1 in this embodiment. The activation function in the original U-Net network is the Rectified Linear Unit (hereinafter ReLU) function; after a large gradient drives a ReLU-activated neuron's input negative, that neuron is no longer updated, whereas L-ReLU avoids this problem by retaining a small signal for negative inputs. The L-ReLU activation function can greatly improve training speed and reduce the possibility of overfitting, so the invention performs better.
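As a minimal sketch (not from the patent), the piecewise L-ReLU can be written in Python as follows, with the embodiment's a = 0.1 as the default slope:

```python
def l_relu(x, a=0.1):
    """Leaky ReLU: identity for positive inputs, small slope a
    for non-positive ones, so negative values are retained
    rather than zeroed as in plain ReLU."""
    return x if x > 0 else a * x

assert l_relu(5.0) == 5.0          # positive inputs pass through
assert l_relu(-2.0) == 0.1 * -2.0  # negative inputs are scaled by a
```

In a deep-learning framework the same function would be applied element-wise to whole feature maps.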
The numbers of pooling layers and convolutional layers are configured according to the actual situation before the network is constructed; as shown in Fig. 1, this embodiment uses 4 pooling layers and 4 corresponding up-sampling layers.
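The relation between the four pooling layers and the four up-sampling layers can be illustrated with a short Python sketch; the 224 × 224 input size is an assumption for illustration only (the patent does not fix an input resolution), and crop borders are ignored:

```python
def trace_resolutions(size, num_pools=4):
    """Trace the spatial resolution through the U-shaped model:
    each 2x2, stride-2 pooling halves it on the contracting path,
    and each up-sampling doubles it back on the expansive path."""
    down = [size]
    for _ in range(num_pools):      # contracting path
        size //= 2
        down.append(size)
    up = []
    for _ in range(num_pools):      # expansive path
        size *= 2
        up.append(size)
    return down, up

down, up = trace_resolutions(224)
# down: [224, 112, 56, 28, 14] -- up: [28, 56, 112, 224]
```

Equal counts of pooling and up-sampling layers are what let the output regain the input's spatial size.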
The expansive path enables precise localization and comprises multiple up-sampling layers, convolutional layers, and an output layer; each up-sampling layer is followed by two consecutive convolutional layers, and each convolutional layer is followed by an activation function;
after the pooling operation of the last pooling layer in the contracting path, the feature map is fed into two convolutional layers for two consecutive convolutions, and the result is input to the first up-sampling layer of the expansive path for up-sampling;
the number of up-sampling layers in the expansive path equals the number of pooling layers in the contracting path;
the number of convolutional layers in the expansive path equals the number of convolutional layers in the contracting path;
the feature map output after each up-sampling operation in an up-sampling layer is concatenated with the feature map output by the pooling layer at the same level in the contracting path, and the result is then fed into the subsequent convolutional layers in the expansive path for convolution. The relatively high-resolution feature maps output by the pooling layers in the contracting path are copied and cropped (Copy and Crop) and then combined with the feature maps output by the up-sampling operations; combining high-resolution feature maps with more abstract ones replenishes image information and yields more accurately localized feature maps;
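The copy-and-crop concatenation can be sketched as follows (illustrative only; a real implementation would crop and stack multi-channel tensors inside the framework):

```python
def center_crop(feature_map, target):
    """Crop a square 2-D feature map to target x target about its
    center, as in the copy-and-crop step of the contracting path."""
    off = (len(feature_map) - target) // 2
    return [row[off:off + target] for row in feature_map[off:off + target]]

def copy_and_crop_concat(skip, upsampled):
    """Crop the contracting-path map to the upsampled map's size,
    then stack the two as channels of one multi-channel map."""
    cropped = center_crop(skip, len(upsampled))
    return [cropped, upsampled]     # channel-wise concatenation

skip = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
merged = copy_and_crop_concat(skip, [[0, 0], [0, 0]])
# cropped center of skip: [[6, 7], [10, 11]]
```

The following convolutions then mix the high-resolution channel with the more abstract up-sampled channel.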
Since the goal of the technical scheme provided by this method is to find the objects of interest to the human eye in the image, the last convolutional layer of the original U-Net network is replaced with a deconvolution layer, i.e. a deconvolution layer precedes the output layer. The deconvolution layer performs a deconvolution operation on the feature map output by the preceding convolutional layer; the result is then cropped, the saliency map is normalized to [0, 1] using the Sigmoid function, and the required saliency map is produced through the output layer;
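The Sigmoid normalization to [0, 1] can be sketched in Python as follows (illustrative, standard library only):

```python
import math

def sigmoid_normalize(feature_map):
    """Map every value of a single-channel 2-D feature map into
    [0, 1] with the logistic sigmoid, giving a valid saliency map."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row]
            for row in feature_map]

saliency = sigmoid_normalize([[-4.0, 0.0], [2.0, 6.0]])
# all values now lie strictly between 0 and 1; sigmoid(0.0) is 0.5
```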
The kernel size of the two convolutional layers preceding the deconvolution layer is set to 1 × 1, and the kernel size of all remaining convolutional layers is set to 3 × 3;
The above up-sampling and deconvolution operations restore the size of the feature map so that it matches the size of the original image under test; the output image is thus closer to the original and more intuitive to verify.
In step S2, the THUS data set is used to train the model. During training, saliency regression analysis is performed on the saliency map produced by the output layer via the Euclidean loss function, so that saliency is learned from the data.
To test the effect of the technical scheme of the invention, an assessment was carried out on two common data sets, ASD and ECSSD, with two evaluation metrics:
(1) Area Under ROC Curve (hereinafter AUC), which assesses salient-object detection performance;
(2) Mean absolute error (hereinafter MAE), which indicates the mean absolute error between the saliency map output by the model and the ground-truth image;
The values of AUC and MAE lie in the range 0–1: the closer the AUC value is to 1, the better the performance, and the closer the MAE value is to 0, the better the saliency. The final test results are shown in Table 1. As can be seen from the table, compared with several traditional methods, the method proposed by the invention performs very well on the ASD and ECSSD data sets and has higher similarity to the ground-truth images.
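The MAE metric is straightforward to state in code; the following sketch (illustrative, with made-up toy values, not the patent's data) computes it for single-channel maps with values in [0, 1]:

```python
def mae(saliency_map, ground_truth):
    """Mean absolute error between a predicted saliency map and
    its ground-truth map, both 2-D lists with values in [0, 1]."""
    total, count = 0.0, 0
    for pred_row, gt_row in zip(saliency_map, ground_truth):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count

score = mae([[0.9, 0.1], [0.8, 0.0]],
            [[1.0, 0.0], [1.0, 0.0]])
# (0.1 + 0.1 + 0.2 + 0.0) / 4, i.e. about 0.1 -- lower is better
```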
Table 1 shows the performance indicators of each model on the ASD and ECSSD data sets.
With the technical scheme of the invention, end-to-end training can be carried out with fewer images. Compared with other neural network models, the pooling operations in the expansive path of the network model of the invention are replaced by up-sampling operations, which increases the resolution of the feature maps; the high-resolution feature maps of the contracting path are combined with the feature maps output by the up-sampling operations, and the subsequent consecutive convolutions can then assemble a more accurate output from the combined image information. From the assessment results it can be seen that, compared with several traditional methods, the method proposed by the invention has higher similarity to the ground-truth images.

Claims (6)

1. A planar image saliency detection method, comprising the following steps:
S1: construct a network model;
S2: train the network model to obtain a trained network model;
S3: input image data into the trained network model obtained in S2, perform saliency detection, and output a saliency map for subsequent image processing;
characterized in that: in step S1, the structure of the network model comprises a contracting path on the left and an expansive path on the right;
the contracting path captures context and comprises an input layer, multiple convolutional layers, and pooling layers; the image under test is fed through the input layer into the first convolutional layer for convolution; every two consecutive convolutional layers are followed by a pooling layer, and each convolutional layer is followed by an activation function;
the expansive path enables precise localization and comprises multiple up-sampling layers, convolutional layers, and an output layer; each up-sampling layer is followed by two consecutive convolutional layers, and each convolutional layer is followed by an activation function;
the feature map obtained after the pooling operation of the last pooling layer in the contracting path is fed into two convolutional layers for two consecutive convolutions, and the resulting feature map is input to the first up-sampling layer in the expansive path for up-sampling;
the number of up-sampling layers in the expansive path equals the number of pooling layers in the contracting path;
the number of convolutional layers in the expansive path equals the number of convolutional layers in the contracting path;
the feature map output after each up-sampling operation in an up-sampling layer is concatenated with the feature map output by the pooling layer at the same level in the contracting path, and the result is then fed into the subsequent convolutional layers in the expansive path for convolution;
a deconvolution layer precedes the output layer; it performs a deconvolution operation on the feature map output by the preceding convolutional layer, and after cropping and normalization the required saliency map is produced through the output layer.
2. The planar image saliency detection method according to claim 1, characterized in that: the kernel size of the two convolutional layers preceding the deconvolution layer is set to 1 × 1, and the kernel size of all remaining convolutional layers is set to 3 × 3.
3. The planar image saliency detection method according to claim 1, characterized in that: the pooling layers perform 2 × 2 max pooling with stride 2.
4. The planar image saliency detection method according to claim 1, characterized in that: the activation function is the Leaky Rectified Linear Units function, calculated as:
f(x) = x, if x > 0; f(x) = a·x, if x ≤ 0, where a is a constant.
5. The planar image saliency detection method according to claim 1, characterized in that: in the normalization operation, the saliency map is normalized to [0, 1] using the Sigmoid function.
6. The planar image saliency detection method according to claim 1, characterized in that: in step S2, during training of the network model, saliency regression analysis is performed on the saliency map produced by the output layer via the Euclidean loss function.
CN201810820639.2A 2018-07-24 2018-07-24 A planar image saliency detection method Pending CN109191426A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810820639.2A CN109191426A (en) 2018-07-24 2018-07-24 A planar image saliency detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810820639.2A CN109191426A (en) 2018-07-24 2018-07-24 A planar image saliency detection method

Publications (1)

Publication Number Publication Date
CN109191426A true CN109191426A (en) 2019-01-11

Family

ID=64936733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810820639.2A Pending CN109191426A (en) 2018-07-24 2018-07-24 A planar image saliency detection method

Country Status (1)

Country Link
CN (1) CN109191426A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635882A (en) * 2019-01-23 2019-04-16 福州大学 Salient object detection method based on multi-scale convolution feature extraction and fusion
CN110910364A (en) * 2019-11-16 2020-03-24 应急管理部沈阳消防研究所 Method for detecting electrical equipment easy to cause fire in three-section fire scene based on deep neural network
CN110946566A (en) * 2019-12-16 2020-04-03 成都天奥电子股份有限公司 Heart beat classification method, device, equipment and storage medium based on U-Net network
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN113327304A (en) * 2021-05-28 2021-08-31 北京理工大学重庆创新中心 Hyperspectral image saliency map generation method based on end-to-end neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106296692A (en) * 2016-08-11 2017-01-04 深圳市未来媒体技术研究院 Image significance detection method based on antagonism network
CN106909924A (en) * 2017-02-18 2017-06-30 北京工业大学 A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
CN107229918A (en) * 2017-05-26 2017-10-03 西安电子科技大学 A kind of SAR image object detection method based on full convolutional neural networks
CN108171701A (en) * 2018-01-15 2018-06-15 复旦大学 Conspicuousness detection method based on U networks and confrontation study

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106296692A (en) * 2016-08-11 2017-01-04 深圳市未来媒体技术研究院 Image significance detection method based on antagonism network
CN106909924A (en) * 2017-02-18 2017-06-30 北京工业大学 A kind of remote sensing image method for quickly retrieving based on depth conspicuousness
CN107229918A (en) * 2017-05-26 2017-10-03 西安电子科技大学 A kind of SAR image object detection method based on full convolutional neural networks
CN108171701A (en) * 2018-01-15 2018-06-15 复旦大学 Conspicuousness detection method based on U networks and confrontation study

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Olaf Ronneberger et al.: "U-Net: Convolutional Networks for Biomedical Image Segmentation", in Navab N., Hornegger J., Wells W., Frangi A. (eds), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science *
Zhang Shun et al.: "Development of deep convolutional neural networks and their applications in computer vision", Chinese Journal of Computers *
Wu Lifang et al.: "FCN-CNN cloud image segmentation method based on local clustering analysis", Journal of Software, http://kns.cnki.net/kcms/detail/11.2560.TP.20171204.0646.010.html *
Tian Juanxiu et al.: "Deep learning for medical image analysis: research and challenges", Acta Automatica Sinica *
Xiao Zhaoxia et al.: "A survey of image semantic segmentation", Software Guide *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635882A (en) * 2019-01-23 2019-04-16 福州大学 Salient object detection method based on multi-scale convolution feature extraction and fusion
CN109635882B (en) * 2019-01-23 2022-05-13 福州大学 Salient object detection method based on multi-scale convolution feature extraction and fusion
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN111914850B (en) * 2019-05-07 2023-09-19 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN110910364A (en) * 2019-11-16 2020-03-24 应急管理部沈阳消防研究所 Method for detecting electrical equipment easy to cause fire in three-section fire scene based on deep neural network
CN110946566A (en) * 2019-12-16 2020-04-03 成都天奥电子股份有限公司 Heart beat classification method, device, equipment and storage medium based on U-Net network
CN110946566B (en) * 2019-12-16 2022-06-17 成都天奥电子股份有限公司 Heart beat classification method, device, equipment and storage medium based on U-Net network
CN113327304A (en) * 2021-05-28 2021-08-31 北京理工大学重庆创新中心 Hyperspectral image saliency map generation method based on end-to-end neural network

Similar Documents

Publication Publication Date Title
Liu et al. Multistage GAN for fabric defect detection
CN109191426A (en) A planar image saliency detection method
Raza et al. Mimo-net: A multi-input multi-output convolutional neural network for cell segmentation in fluorescence microscopy images
Liu et al. Leveraging unlabeled data for crowd counting by learning to rank
CN107437092B (en) The classification method of retina OCT image based on Three dimensional convolution neural network
Xie et al. Beyond classification: structured regression for robust cell detection using convolutional neural network
CN112132817B (en) Retina blood vessel segmentation method for fundus image based on mixed attention mechanism
CA3100642A1 (en) Multi-sample whole slide image processing in digital pathology via multi-resolution registration and machine learning
CN111368690B (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN112884712B (en) Method and related device for classifying defects of display panel
CN109376767A (en) Retina OCT image classification method based on deep learning
CN110852396A (en) Sample data processing method for cervical image
CN113888412B (en) Image super-resolution reconstruction method for diabetic retinopathy classification
Thanikkal et al. Advanced plant leaf classification through image enhancement and canny edge detection
CN112420170B (en) Method for improving image classification accuracy of computer aided diagnosis system
CN113468996A (en) Camouflage object detection method based on edge refinement
CN109685830A (en) Method for tracking target, device and equipment and computer storage medium
Shete et al. TasselGAN: An application of the generative adversarial model for creating field-based maize tassel data
CN107564007A (en) The scene cut modification method and system of amalgamation of global information
CN111209873A (en) High-precision face key point positioning method and system based on deep learning
CN117392496A (en) Target detection method and system based on infrared and visible light image fusion
Li et al. Image segmentation based on improved unet
CN113537119B (en) Transmission line connecting part detection method based on improved Yolov4-tiny
CN108764287B (en) Target detection method and system based on deep learning and packet convolution
Liang et al. Image segmentation and recognition for multi-class chinese food

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190111
