CN110175986A - Stereo image visual saliency detection method based on a convolutional neural network - Google Patents

Stereo image visual saliency detection method based on a convolutional neural network

Info

Publication number
CN110175986A
CN110175986A (application CN201910327556.4A; granted as CN110175986B)
Authority
CN
China
Prior art keywords
layer
output
feature map
input
neural network
Prior art date
Legal status (assumed; not a legal conclusion)
Granted
Application number
CN201910327556.4A
Other languages
Chinese (zh)
Other versions
CN110175986B (en)
Inventor
周武杰
吕营
雷景生
张伟
何成
王海江
Current Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Lover Health Science and Technology Development Co Ltd
Priority to CN201910327556.4A
Publication of CN110175986A
Application granted
Publication of CN110175986B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a stereo image visual saliency detection method based on a convolutional neural network. A convolutional neural network comprising an input layer, a hidden layer and an output layer is constructed: the input layer consists of an RGB input layer and a depth-map input layer; the hidden layer consists of an encoding framework and a decoding framework, the encoding framework being composed of an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The left viewpoint image and the depth image of every stereo image in the training set are input into the convolutional neural network for training, yielding a saliency image for every stereo image in the training set. The loss function value between the saliency image of every stereo image in the training set and the corresponding ground-truth human eye fixation image is computed, and after repeating this process several times a trained convolutional neural network model is obtained. The left viewpoint image and the depth image of the stereo image to be tested are then input into the trained model, which predicts a saliency prediction image. The advantage of the method is its high visual saliency detection accuracy.

Description

Stereo image visual saliency detection method based on a convolutional neural network
Technical field
The present invention relates to visual saliency detection techniques, and more particularly to a stereo image visual saliency detection method based on a convolutional neural network.
Background art
Visual saliency has in recent years been a popular research topic in fields such as neuroscience, robotics and computer vision. Research on visual saliency detection falls into two broad classes: eye-fixation prediction and salient object detection. The former predicts the fixation points of a person viewing a natural scene; the latter accurately extracts the objects of interest. In general, visual saliency detection algorithms can be divided into top-down and bottom-up approaches. Top-down methods are task-driven and require supervised learning, whereas bottom-up methods usually rely on low-level cues such as color features, distance features and heuristic saliency features. One of the most common heuristic saliency features is contrast, for example pixel-based or patch-based contrast. Past research on visual saliency detection has focused mainly on two-dimensional images. However, as visual scenes become ever more complex, three-dimensional data is better suited to practical applications, and extracting salient objects from 2D data alone is no longer sufficient. In recent years, progress in 3D data acquisition technologies such as Time-of-Flight sensors and the Microsoft Kinect has promoted the use of structural information and improved the ability to discriminate between different objects of similar appearance. Depth data is easy to capture, independent of illumination, and provides geometric cues that improve visual saliency prediction. Because RGB data and depth data are complementary, many methods that combine RGB images and depth images in pairs have been proposed for visual saliency detection. Earlier work concentrated on constructing low-level saliency features from domain-specific prior knowledge, such as the observation that humans tend to pay more attention to closer objects; such observations, however, are difficult to generalize to all scenes. In most previous work, the multi-modal fusion problem was solved either by directly concatenating the RGB-D channels or by processing each modality independently and then combining the two decisions. Although these strategies brought large improvements, they struggle to fully explore cross-modal complementarity. More recently, following the success of convolutional neural networks (CNNs) in learning discriminative features from RGB data, a growing body of work has used CNNs to explore more powerful and effective joint multi-modal RGB-D representations. Most of this work is based on a two-stream architecture, in which RGB data and depth data are learned in separate bottom-up streams whose features are combined for joint reasoning at an early or late stage. As the most popular solution, the two-stream architecture achieves marked improvements over work based on hand-crafted RGB-D features; the most critical open question, however, is how to exploit the multi-modal complementary information effectively during the bottom-up process. Further research on RGB-D image visual saliency detection is therefore necessary in order to improve the accuracy of visual saliency detection.
Summary of the invention
The technical problem to be solved by the invention is to provide a stereo image visual saliency detection method based on a convolutional neural network that achieves high visual saliency detection accuracy.
The technical scheme adopted by the invention to solve the above technical problem is a stereo image visual saliency detection method based on a convolutional neural network, characterized by comprising two processes: a training stage and a test stage.
The specific steps of the training stage are as follows:
Step 1_1: Select N original stereo images of width W and height H; then form a training set from all selected original stereo images together with their respective left viewpoint images, depth images and ground-truth human eye fixation images. Denote the n-th original stereo image in the training set as {I_n(x, y)}, and denote its left viewpoint image, depth image and ground-truth human eye fixation image as {L_n(x, y)}, {D_n(x, y)} and {G_n(x, y)} respectively. Here N is a positive integer with N ≥ 300; W and H are divisible by 2; n is a positive integer with initial value 1, 1 ≤ n ≤ N, 1 ≤ x ≤ W, 1 ≤ y ≤ H; and I_n(x, y), L_n(x, y), D_n(x, y) and G_n(x, y) denote the pixel values at coordinate position (x, y) in {I_n(x, y)}, {L_n(x, y)}, {D_n(x, y)} and {G_n(x, y)} respectively.
Step 1_2: Construct the convolutional neural network. The network comprises an input layer, a hidden layer and an output layer. The input layer consists of an RGB input layer and a depth-map input layer. The hidden layer consists of an encoding framework and a decoding framework; the encoding framework is composed of three parts, namely an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The RGB feature extraction module is composed of the 1st to 4th neural network blocks and the 1st to 3rd down-sampling blocks; the depth feature extraction module is composed of the 5th to 8th neural network blocks and the 4th to 6th down-sampling blocks; the feature fusion module is composed of the 9th to 15th neural network blocks and the 1st to 4th max-pooling layers; and the decoding framework is composed of the 16th to 19th neural network blocks and the 1st to 4th up-sampling layers. The output layer is composed of a first convolutional layer, a first batch-normalization layer and a first activation layer; the first convolutional layer has a 3 × 3 convolution kernel, stride 1, 1 convolution kernel and padding 1, and the activation function of the first activation layer is Sigmoid.
For the RGB input layer, its input receives a training left viewpoint image and its output passes the training left viewpoint image to the hidden layer; the training left viewpoint image is required to have width W and height H.
For the depth-map input layer, its input receives the training depth image corresponding to the training left viewpoint image received by the RGB input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H.
For the RGB feature extraction module: the input of the 1st neural network block receives the training left viewpoint image output by the RGB input layer, and its output is 64 feature maps of width W and height H; the set of all output feature maps is denoted P1. The input of the 1st down-sampling block receives all feature maps in P1, and its output is 64 feature maps of width W/2 and height H/2, denoted X1. The input of the 2nd neural network block receives all feature maps in X1, and its output is 128 feature maps of width W/2 and height H/2, denoted P2. The input of the 2nd down-sampling block receives all feature maps in P2, and its output is 128 feature maps of width W/4 and height H/4, denoted X2. The input of the 3rd neural network block receives all feature maps in X2, and its output is 256 feature maps of width W/4 and height H/4, denoted P3. The input of the 3rd down-sampling block receives all feature maps in P3, and its output is 256 feature maps of width W/8 and height H/8, denoted X3. The input of the 4th neural network block receives all feature maps in X3, and its output is 512 feature maps of width W/8 and height H/8, denoted P4.
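To make this data flow concrete, here is a minimal PyTorch sketch of the RGB stream, assuming `conv_block` and `down_block` are constructors for the neural network blocks and down-sampling blocks detailed later in the text (the class and parameter names are illustrative, not from the patent):

```python
import torch.nn as nn

class RGBStream(nn.Module):
    """Sketch of the RGB feature-extraction stream (blocks 1-4)."""
    def __init__(self, conv_block, down_block):
        super().__init__()
        self.b1, self.d1 = conv_block(3, 64), down_block(64)      # P1: W x H  -> X1: W/2 x H/2
        self.b2, self.d2 = conv_block(64, 128), down_block(128)   # P2         -> X2: W/4 x H/4
        self.b3, self.d3 = conv_block(128, 256), down_block(256)  # P3         -> X3: W/8 x H/8
        self.b4 = conv_block(256, 512)                            # P4: 512 x W/8 x H/8

    def forward(self, x):
        p1 = self.b1(x);  x1 = self.d1(p1)
        p2 = self.b2(x1); x2 = self.d2(p2)
        p3 = self.b3(x2); x3 = self.d3(p3)
        p4 = self.b4(x3)
        return p1, p2, p3, p4  # side outputs P1..P4 feed the fusion module
```

The depth stream (blocks 5 to 8 with down-sampling blocks 4 to 6, described next) has exactly the same shape; only its input is the depth image.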
For the depth feature extraction module: the input of the 5th neural network block receives the training depth image output by the depth-map input layer, and its output is 64 feature maps of width W and height H, denoted P5. The input of the 4th down-sampling block receives all feature maps in P5, and its output is 64 feature maps of width W/2 and height H/2, denoted X4. The input of the 6th neural network block receives all feature maps in X4, and its output is 128 feature maps of width W/2 and height H/2, denoted P6. The input of the 5th down-sampling block receives all feature maps in P6, and its output is 128 feature maps of width W/4 and height H/4, denoted X5. The input of the 7th neural network block receives all feature maps in X5, and its output is 256 feature maps of width W/4 and height H/4, denoted P7. The input of the 6th down-sampling block receives all feature maps in P7, and its output is 256 feature maps of width W/8 and height H/8, denoted X6. The input of the 8th neural network block receives all feature maps in X6, and its output is 512 feature maps of width W/8 and height H/8, denoted P8.
For the feature fusion module: the input of the 9th neural network block receives the training left viewpoint image output by the RGB input layer, and its output is 3 feature maps of width W and height H, denoted P9. The input of the 10th neural network block receives the training depth image output by the depth-map input layer, and its output is 3 feature maps of width W and height H, denoted P10. Element-wise summation is applied to all feature maps in P9 and all feature maps in P10, producing 3 feature maps of width W and height H, denoted E1. The input of the 11th neural network block receives all feature maps in E1, and its output is 64 feature maps of width W and height H, denoted P11. Element-wise summation is applied to all feature maps in P1, P5 and P11, producing 64 feature maps of width W and height H, denoted E2. The input of the 1st max-pooling layer receives all feature maps in E2, and its output is 64 feature maps of width W/2 and height H/2, denoted Z1. The input of the 12th neural network block receives all feature maps in Z1, and its output is 128 feature maps of width W/2 and height H/2, denoted P12. Element-wise summation is applied to all feature maps in P2, P6 and P12, producing 128 feature maps of width W/2 and height H/2, denoted E3. The input of the 2nd max-pooling layer receives all feature maps in E3, and its output is 128 feature maps of width W/4 and height H/4, denoted Z2. The input of the 13th neural network block receives all feature maps in Z2, and its output is 256 feature maps of width W/4 and height H/4, denoted P13. Element-wise summation is applied to all feature maps in P3, P7 and P13, producing 256 feature maps of width W/4 and height H/4, denoted E4. The input of the 3rd max-pooling layer receives all feature maps in E4, and its output is 256 feature maps of width W/8 and height H/8, denoted Z3. The input of the 14th neural network block receives all feature maps in Z3, and its output is 512 feature maps of width W/8 and height H/8, denoted P14. Element-wise summation is applied to all feature maps in P4, P8 and P14, producing 512 feature maps of width W/8 and height H/8, denoted E5. The input of the 4th max-pooling layer receives all feature maps in E5, and its output is 512 feature maps of width W/16 and height H/16, denoted Z4. The input of the 15th neural network block receives all feature maps in Z4, and its output is 1024 feature maps of width W/16 and height H/16, denoted P15.
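A minimal sketch of one fusion step follows, assuming batch tensors and W = H = 256 purely for illustration:

```python
import torch

# One fusion step: element-wise summation of same-shaped feature-map sets
# from the RGB stream (P1), the depth stream (P5) and the previous fusion
# block (P11), followed by 2x2 max pooling with stride 2.
pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)
p1  = torch.randn(1, 64, 256, 256)  # assumed W = H = 256 for illustration
p5  = torch.randn(1, 64, 256, 256)
p11 = torch.randn(1, 64, 256, 256)
e2 = p1 + p5 + p11                  # element-wise summation -> E2: 64 x W x H
z1 = pool(e2)                       # Z1: 64 x W/2 x H/2
print(z1.shape)                     # torch.Size([1, 64, 128, 128])
```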
For the decoding framework: the input of the 1st up-sampling layer receives all feature maps in P15, and its output is 1024 feature maps of width W/8 and height H/8, denoted S1. The input of the 16th neural network block receives all feature maps in S1, and its output is 256 feature maps of width W/8 and height H/8, denoted P16. The input of the 2nd up-sampling layer receives all feature maps in P16, and its output is 256 feature maps of width W/4 and height H/4, denoted S2. The input of the 17th neural network block receives all feature maps in S2, and its output is 128 feature maps of width W/4 and height H/4, denoted P17. The input of the 3rd up-sampling layer receives all feature maps in P17, and its output is 128 feature maps of width W/2 and height H/2, denoted S3. The input of the 18th neural network block receives all feature maps in S3, and its output is 64 feature maps of width W/2 and height H/2, denoted P18. The input of the 4th up-sampling layer receives all feature maps in P18, and its output is 64 feature maps of width W and height H, denoted S4. The input of the 19th neural network block receives all feature maps in S4, and its output is 64 feature maps of width W and height H, denoted P19.
For the output layer: the input of the first convolutional layer receives all feature maps in P19, and the output of the first convolutional layer is one feature map of width W and height H; the input of the first batch-normalization layer receives the feature map output by the first convolutional layer; the input of the first activation layer receives the feature map output by the first batch-normalization layer; and the output of the first activation layer is the saliency image of the stereo image corresponding to the training left viewpoint image. The saliency image has width W and height H.
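A direct PyTorch rendering of this stated configuration might look like the following sketch (not the patent's reference code):

```python
import torch.nn as nn

# Output layer as specified: 3x3 conv (1 kernel, stride 1, padding 1),
# batch normalization, then Sigmoid; P19 supplies 64 input channels.
output_layer = nn.Sequential(
    nn.Conv2d(64, 1, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(1),
    nn.Sigmoid(),  # saliency map in [0, 1], width W and height H
)
```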
Step 1_3: Use the left viewpoint image of every original stereo image in the training set as a training left viewpoint image and the depth image of every original stereo image in the training set as a training depth image, input them into the convolutional neural network for training, and obtain the saliency image of every original stereo image in the training set; the saliency image of {I_n(x, y)} is denoted {S_n(x, y)}, where S_n(x, y) denotes the pixel value at coordinate position (x, y) in {S_n(x, y)}.
Step 1_4: Compute the loss function value between the saliency image of every original stereo image in the training set and the corresponding ground-truth human eye fixation image; the loss function value between {S_n(x, y)} and {G_n(x, y)} is obtained using the mean squared error loss function.
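Since only the mean squared error loss is specified, the per-image loss reduces to a one-liner; a sketch:

```python
import torch.nn.functional as F

# MSE between the predicted saliency image S_n and the ground-truth
# human eye fixation image G_n, as stated in step 1_4.
def saliency_loss(s_pred, g_true):
    return F.mse_loss(s_pred, g_true)
```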
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times, obtaining the trained convolutional neural network model together with N × V loss function values; then find the smallest of the N × V loss function values, and take the weight vector and bias term corresponding to that smallest loss function value as the optimal weight vector and optimal bias term of the trained model, denoted W_best and b_best respectively, where V > 1.
The specific steps of the test stage are as follows:
Step 2_1: Let {I_test(x', y')} denote the stereo image to be tested, of width W' and height H', and denote its left viewpoint image and depth image as {L_test(x', y')} and {D_test(x', y')} respectively; here 1 ≤ x' ≤ W', 1 ≤ y' ≤ H', and I_test(x', y'), L_test(x', y') and D_test(x', y') denote the pixel values at coordinate position (x', y') in the respective images.
Step 2_2: Input {L_test(x', y')} and {D_test(x', y')} into the trained convolutional neural network model and predict using W_best and b_best, obtaining the saliency prediction image of {I_test(x', y')}, denoted {S_test(x', y')}, where S_test(x', y') denotes the pixel value at coordinate position (x', y') in {S_test(x', y')}.
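In code, the test stage amounts to a single forward pass with the saved best parameters; a sketch, assuming `model` is the trained network with W_best and b_best already loaded (tensor shapes are our assumptions):

```python
import torch

def predict_saliency(model: torch.nn.Module,
                     left_view: torch.Tensor,
                     depth: torch.Tensor) -> torch.Tensor:
    """Run the trained model on one left viewpoint image (1 x 3 x H' x W')
    and its depth image; returns a 1 x 1 x H' x W' saliency prediction
    in [0, 1]."""
    model.eval()
    with torch.no_grad():
        return model(left_view, depth)
```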
In step 1_2, the 1st to 8th neural network blocks share the same structure, consisting in sequence of a first dilated convolutional layer, a second batch-normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch-normalization layer. The input of the first dilated convolutional layer is the input of its neural network block; the input of the second batch-normalization layer receives all feature maps output by the first dilated convolutional layer; the input of the second activation layer receives all feature maps output by the second batch-normalization layer; the input of the first residual block receives all feature maps output by the second activation layer; the input of the second dilated convolutional layer receives all feature maps output by the first residual block; the input of the third batch-normalization layer receives all feature maps output by the second dilated convolutional layer; and the output of the third batch-normalization layer is the output of its neural network block. The number of convolution kernels of the first and second dilated convolutional layers is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th, 256 in each of the 3rd and 7th, and 512 in each of the 4th and 8th. In each of the 1st to 8th neural network blocks, the first and second dilated convolutional layers have 3 × 3 convolution kernels with stride 1, dilation 2 and padding 2, and the activation function of the second activation layer is ReLU.
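Under the stated hyper-parameters (3 × 3 kernels, stride 1, dilation 2, padding 2), one of these blocks can be sketched in PyTorch as follows, with `residual_block` standing for the first residual block described further below (the factory name is ours):

```python
import torch.nn as nn

# Sketch of one encoder neural network block: dilated conv -> BN -> ReLU
# -> residual block -> dilated conv -> BN. Padding 2 with dilation 2
# keeps the spatial size unchanged, as the text requires.
def make_encoder_block(channels_in, channels_out, residual_block):
    return nn.Sequential(
        nn.Conv2d(channels_in, channels_out, 3, stride=1, padding=2, dilation=2),
        nn.BatchNorm2d(channels_out),
        nn.ReLU(inplace=True),
        residual_block(channels_out),
        nn.Conv2d(channels_out, channels_out, 3, stride=1, padding=2, dilation=2),
        nn.BatchNorm2d(channels_out),
    )
```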
The 9th and 10th neural network blocks share the same structure, consisting in sequence of a second convolutional layer and a fourth batch-normalization layer. The input of the second convolutional layer is the input of its neural network block; the input of the fourth batch-normalization layer receives all feature maps output by the second convolutional layer; and the output of the fourth batch-normalization layer is the output of its neural network block. In each of the 9th and 10th neural network blocks, the second convolutional layer has 3 convolution kernels of size 7 × 7 with stride 1 and padding 3.
The 11th and 12th neural network blocks share the same structure, consisting in sequence of a third convolutional layer, a fifth batch-normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch-normalization layer. The input of the third convolutional layer is the input of its neural network block; the input of the fifth batch-normalization layer receives all feature maps output by the third convolutional layer; the input of the third activation layer receives all feature maps output by the fifth batch-normalization layer; the input of the fourth convolutional layer receives all feature maps output by the third activation layer; the input of the sixth batch-normalization layer receives all feature maps output by the fourth convolutional layer; and the output of the sixth batch-normalization layer is the output of its neural network block. The third and fourth convolutional layers have 64 convolution kernels in the 11th neural network block and 128 in the 12th; in both blocks these layers have 3 × 3 convolution kernels with stride 1 and padding 1, and the activation function of the third activation layer is ReLU.
The 13th to 19th neural network blocks share the same structure, consisting in sequence of a fifth convolutional layer, a seventh batch-normalization layer, a fourth activation layer, a sixth convolutional layer, an eighth batch-normalization layer, a fifth activation layer, a seventh convolutional layer and a ninth batch-normalization layer, each receiving all feature maps output by the preceding element; the input of the fifth convolutional layer is the input of its neural network block, and the output of the ninth batch-normalization layer is the output of its neural network block. The numbers of convolution kernels of the fifth, sixth and seventh convolutional layers are 256, 256 and 256 in the 13th neural network block; 512, 512 and 512 in the 14th; 1024, 1024 and 1024 in the 15th; 512, 512 and 256 in the 16th; 256, 256 and 128 in the 17th; 128, 128 and 64 in the 18th; and 64, 64 and 64 in the 19th. In each of the 13th to 19th neural network blocks, the fifth, sixth and seventh convolutional layers have 3 × 3 convolution kernels with stride 1 and padding 1, and the activation functions of the fourth and fifth activation layers are ReLU.
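A sketch of this conv-BN-ReLU pattern, parameterized so that, for example, the 16th block becomes `make_triple_conv_block(1024, 512, 256)` (the factory name and signature are ours, not the patent's):

```python
import torch.nn as nn

# Three conv -> BN stages with ReLU after the first two, matching the
# 13th-19th block layout; c_in is the incoming channel count, c_mid the
# kernel count of the fifth and sixth conv layers, c_out of the seventh.
def make_triple_conv_block(c_in, c_mid, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 3, stride=1, padding=1), nn.BatchNorm2d(c_mid),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_mid, c_mid, 3, stride=1, padding=1), nn.BatchNorm2d(c_mid),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_mid, c_out, 3, stride=1, padding=1), nn.BatchNorm2d(c_out),
    )
```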
In step 1_2, the 1st to 6th down-sampling blocks share the same structure, each consisting of a second residual block whose input is the input of its down-sampling block and whose output is the output of its down-sampling block.
The first residual block and the second residual block share the same structure, comprising 3 convolutional layers, 3 batch-normalization layers and 3 activation layers. The input of the 1st convolutional layer is the input of its residual block; the input of the 1st batch-normalization layer receives all feature maps output by the 1st convolutional layer; the input of the 1st activation layer receives all feature maps output by the 1st batch-normalization layer; the input of the 2nd convolutional layer receives all feature maps output by the 1st activation layer; the input of the 2nd batch-normalization layer receives all feature maps output by the 2nd convolutional layer; the input of the 2nd activation layer receives all feature maps output by the 2nd batch-normalization layer; the input of the 3rd convolutional layer receives all feature maps output by the 2nd activation layer; and the input of the 3rd batch-normalization layer receives all feature maps output by the 3rd convolutional layer. All feature maps received at the input of the 1st convolutional layer are added to all feature maps output by the 3rd batch-normalization layer and passed through the 3rd activation layer, whose output serves as the output of the residual block. The number of convolution kernels of each convolutional layer in the first residual block is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th, 256 in each of the 3rd and 7th, and 512 in each of the 4th and 8th; in the first residual block of each of the 1st to 8th neural network blocks, the 1st and 3rd convolutional layers have 1 × 1 convolution kernels with stride 1, and the 2nd convolutional layer has 3 × 3 convolution kernels with stride 1 and padding 1. The number of convolution kernels of each convolutional layer in the second residual block is 64 in each of the 1st and 4th down-sampling blocks, 128 in each of the 2nd and 5th, and 256 in each of the 3rd and 6th; in the second residual block of each of the 1st to 6th down-sampling blocks, the 1st and 3rd convolutional layers have 1 × 1 convolution kernels with stride 1, and the 2nd convolutional layer has 3 × 3 convolution kernels with stride 2 and padding 1. The activation functions of the 3 activation layers are ReLU.
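Both residual blocks follow the classic 1 × 1 / 3 × 3 / 1 × 1 bottleneck pattern; a sketch is below. Note that for the stride-2 down-sampling variant the identity addition only type-checks if the skip path is also down-sampled, so the strided 1 × 1 shortcut here is our assumption rather than something the text spells out:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """1x1 conv -> BN -> ReLU -> 3x3 conv -> BN -> ReLU -> 1x1 conv -> BN,
    with the block input added back before a final ReLU. stride=2 gives
    the down-sampling variant used in the down-sampling blocks."""
    def __init__(self, channels, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 1, stride=1), nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=stride, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1, stride=1), nn.BatchNorm2d(channels),
        )
        # Assumed strided shortcut so shapes match when stride == 2.
        self.skip = (nn.Identity() if stride == 1
                     else nn.Conv2d(channels, channels, 1, stride=stride))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))
```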
In step 1_2, the pooling windows of the 1st to 4th max-pooling layers are all 2 × 2 with stride 2.
In step 1_2, the 1st to 4th up-sampling layers all use bilinear interpolation with scale factor 2.
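Both operations map directly onto standard layers; a sketch:

```python
import torch.nn as nn

# Pooling and up-sampling exactly as specified: 2x2 max pooling with
# stride 2 in the fusion module, bilinear up-sampling with scale factor 2
# in the decoding framework.
maxpool  = nn.MaxPool2d(kernel_size=2, stride=2)
upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
```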
Compared with the prior art, the advantages of the present invention are as follows:
1) In the encoding framework of the constructed convolutional neural network, the method trains separate modules for RGB images and depth images (the RGB feature extraction module and the depth feature extraction module) to learn RGB and depth features at different levels, and proposes a dedicated module for fusing RGB and depth features, namely the feature fusion module, which merges both kinds of features from low level to high level. This helps to make full use of cross-modal information to form new discriminative features and improves the accuracy of stereo visual saliency prediction.
2) The down-sampling blocks in the RGB feature extraction module and the depth feature extraction module of the constructed convolutional neural network replace the max-pooling layers used in previous work with residual blocks of stride 2, which helps the model select feature information adaptively and avoids losing important information through the max-pooling operation.
3) The RGB feature extraction module and the depth feature extraction module of the constructed convolutional neural network introduce residual blocks flanked by dilated convolutional layers, which enlarges the receptive field of the convolution kernels, helps the constructed network attend to global information, and captures richer content.
Description of the drawings
Fig. 1 is a schematic diagram of the composition of the convolutional neural network constructed by the method of the present invention.
Specific embodiment
The present invention will be described in further detail below with reference to the embodiments and the accompanying drawings.
The stereo image visual saliency detection method based on a convolutional neural network proposed by the present invention comprises two processes: a training stage and a test stage.
The specific steps of the training stage are as follows:
Step 1_1: Select N original stereo images of width W and height H; then form a training set from all selected original stereo images together with their respective left viewpoint images, depth images and ground-truth human eye fixation images. Denote the n-th original stereo image in the training set as {I_n(x, y)}, and denote its left viewpoint image, depth image and ground-truth human eye fixation image as {L_n(x, y)}, {D_n(x, y)} and {G_n(x, y)} respectively. Here N is a positive integer with N ≥ 300, for example N = 600; W and H are divisible by 2; n is a positive integer with initial value 1, 1 ≤ n ≤ N, 1 ≤ x ≤ W, 1 ≤ y ≤ H; and I_n(x, y), L_n(x, y), D_n(x, y) and G_n(x, y) denote the pixel values at coordinate position (x, y) in {I_n(x, y)}, {L_n(x, y)}, {D_n(x, y)} and {G_n(x, y)} respectively.
Step 1_2: Construct the convolutional neural network. As shown in Fig. 1, the network comprises an input layer, a hidden layer and an output layer. The input layer consists of an RGB input layer and a depth-map input layer. The hidden layer consists of an encoding framework and a decoding framework; the encoding framework is composed of three parts, namely an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The RGB feature extraction module is composed of the 1st to 4th neural network blocks and the 1st to 3rd down-sampling blocks; the depth feature extraction module is composed of the 5th to 8th neural network blocks and the 4th to 6th down-sampling blocks; the feature fusion module is composed of the 9th to 15th neural network blocks and the 1st to 4th max-pooling layers; and the decoding framework is composed of the 16th to 19th neural network blocks and the 1st to 4th up-sampling layers. The output layer is composed of a first convolutional layer, a first batch-normalization layer and a first activation layer; the first convolutional layer has a 3 × 3 convolution kernel, stride 1, 1 convolution kernel and padding 1, and the activation function of the first activation layer is Sigmoid.
For the RGB input layer, its input receives a training left viewpoint image and its output passes the training left viewpoint image to the hidden layer; the training left viewpoint image is required to have width W and height H.
For the depth-map input layer, its input receives the training depth image corresponding to the training left viewpoint image received by the RGB input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H.
For the RGB feature extraction module: the input of the 1st neural network block receives the training left viewpoint image output by the RGB input layer, and its output is 64 feature maps of width W and height H, denoted P1. The input of the 1st down-sampling block receives all feature maps in P1, and its output is 64 feature maps of width W/2 and height H/2, denoted X1. The input of the 2nd neural network block receives all feature maps in X1, and its output is 128 feature maps of width W/2 and height H/2, denoted P2. The input of the 2nd down-sampling block receives all feature maps in P2, and its output is 128 feature maps of width W/4 and height H/4, denoted X2. The input of the 3rd neural network block receives all feature maps in X2, and its output is 256 feature maps of width W/4 and height H/4, denoted P3. The input of the 3rd down-sampling block receives all feature maps in P3, and its output is 256 feature maps of width W/8 and height H/8, denoted X3. The input of the 4th neural network block receives all feature maps in X3, and its output is 512 feature maps of width W/8 and height H/8, denoted P4.
For the depth feature extraction module: the input of the 5th neural network block receives the training depth image output by the depth-map input layer, and its output is 64 feature maps of width W and height H, denoted P5. The input of the 4th down-sampling block receives all feature maps in P5, and its output is 64 feature maps of width W/2 and height H/2, denoted X4. The input of the 6th neural network block receives all feature maps in X4, and its output is 128 feature maps of width W/2 and height H/2, denoted P6. The input of the 5th down-sampling block receives all feature maps in P6, and its output is 128 feature maps of width W/4 and height H/4, denoted X5. The input of the 7th neural network block receives all feature maps in X5, and its output is 256 feature maps of width W/4 and height H/4, denoted P7. The input of the 6th down-sampling block receives all feature maps in P7, and its output is 256 feature maps of width W/8 and height H/8, denoted X6. The input of the 8th neural network block receives all feature maps in X6, and its output is 512 feature maps of width W/8 and height H/8, denoted P8.
For the feature fusion module: the input of the 9th neural network block receives the training left viewpoint image output by the RGB input layer, and its output is 3 feature maps of width W and height H, denoted P9. The input of the 10th neural network block receives the training depth image output by the depth-map input layer, and its output is 3 feature maps of width W and height H, denoted P10. Element-wise summation is applied to all feature maps in P9 and all feature maps in P10, producing 3 feature maps of width W and height H, denoted E1. The input of the 11th neural network block receives all feature maps in E1, and its output is 64 feature maps of width W and height H, denoted P11. Element-wise summation is applied to all feature maps in P1, P5 and P11, producing 64 feature maps of width W and height H, denoted E2. The input of the 1st max-pooling layer receives all feature maps in E2, and its output is 64 feature maps of width W/2 and height H/2, denoted Z1. The input of the 12th neural network block receives all feature maps in Z1, and its output is 128 feature maps of width W/2 and height H/2, denoted P12. Element-wise summation is applied to all feature maps in P2, P6 and P12, producing 128 feature maps of width W/2 and height H/2, denoted E3. The input of the 2nd max-pooling layer receives all feature maps in E3, and its output is 128 feature maps of width W/4 and height H/4, denoted Z2. The input of the 13th neural network block receives all feature maps in Z2, and its output is 256 feature maps of width W/4 and height H/4, denoted P13. Element-wise summation is applied to all feature maps in P3, P7 and P13, producing 256 feature maps of width W/4 and height H/4, denoted E4. The input of the 3rd max-pooling layer receives all feature maps in E4, and its output is 256 feature maps of width W/8 and height H/8, denoted Z3. The input of the 14th neural network block receives all feature maps in Z3, and its output is 512 feature maps of width W/8 and height H/8, denoted P14. Element-wise summation is applied to all feature maps in P4, P8 and P14, producing 512 feature maps of width W/8 and height H/8, denoted E5. The input of the 4th max-pooling layer receives all feature maps in E5, and its output is 512 feature maps of width W/16 and height H/16, denoted Z4. The input of the 15th neural network block receives all feature maps in Z4, and its output is 1024 feature maps of width W/16 and height H/16, denoted P15.
For the decoding framework: the input of the 1st up-sampling layer receives all feature maps in P15, and its output is 1024 feature maps of width W/8 and height H/8, denoted S1. The input of the 16th neural network block receives all feature maps in S1, and its output is 256 feature maps of width W/8 and height H/8, denoted P16. The input of the 2nd up-sampling layer receives all feature maps in P16, and its output is 256 feature maps of width W/4 and height H/4, denoted S2. The input of the 17th neural network block receives all feature maps in S2, and its output is 128 feature maps of width W/4 and height H/4, denoted P17. The input of the 3rd up-sampling layer receives all feature maps in P17, and its output is 128 feature maps of width W/2 and height H/2, denoted S3. The input of the 18th neural network block receives all feature maps in S3, and its output is 64 feature maps of width W/2 and height H/2, denoted P18. The input of the 4th up-sampling layer receives all feature maps in P18, and its output is 64 feature maps of width W and height H, denoted S4. The input of the 19th neural network block receives all feature maps in S4, and its output is 64 feature maps of width W and height H, denoted P19.
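Putting the decoder together, here is a sketch that alternates the scale-2 bilinear up-sampling with the triple-conv blocks, reusing the hypothetical `make_triple_conv_block` factory sketched earlier:

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Decoding framework sketch: P15 (1024 x W/16 x H/16) -> P19 (64 x W x H).
    `block` is a constructor such as make_triple_conv_block(c_in, c_mid, c_out)."""
    def __init__(self, block):
        super().__init__()
        self.up  = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.b16 = block(1024, 512, 256)   # 16th neural network block
        self.b17 = block(256, 256, 128)    # 17th
        self.b18 = block(128, 128, 64)     # 18th
        self.b19 = block(64, 64, 64)       # 19th

    def forward(self, p15):
        p16 = self.b16(self.up(p15))       # S1 -> P16: 256 x W/8 x H/8
        p17 = self.b17(self.up(p16))       # S2 -> P17: 128 x W/4 x H/4
        p18 = self.b18(self.up(p17))       # S3 -> P18: 64 x W/2 x H/2
        return self.b19(self.up(p18))      # S4 -> P19: 64 x W x H
```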
For the output layer: the input of the first convolutional layer receives all feature maps in P19, and the output of the first convolutional layer is one feature map of width W and height H; the input of the first batch-normalization layer receives the feature map output by the first convolutional layer; the input of the first activation layer receives the feature map output by the first batch-normalization layer; and the output of the first activation layer is the saliency image of the stereo image corresponding to the training left viewpoint image. The saliency image has width W and height H.
Step 1_3: Use the left viewpoint image of every original stereo image in the training set as a training left viewpoint image and the depth image of every original stereo image in the training set as a training depth image, input them into the convolutional neural network for training, and obtain the saliency image of every original stereo image in the training set; the saliency image of {I_n(x, y)} is denoted {S_n(x, y)}, where S_n(x, y) denotes the pixel value at coordinate position (x, y) in {S_n(x, y)}.
Step 1_4: Compute the loss function value between the saliency image of every original stereo image in the training set and the corresponding ground-truth human eye fixation image; the loss function value between {S_n(x, y)} and {G_n(x, y)} is obtained using the mean squared error loss function.
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times, obtaining the trained convolutional neural network model together with N × V loss function values; then find the smallest of the N × V loss function values, and take the weight vector and bias term corresponding to that smallest loss function value as the optimal weight vector and optimal bias term of the trained model, denoted W_best and b_best respectively, where V > 1, for example V = 50.
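As a combined sketch of steps 1_3 to 1_5, assuming a PyTorch DataLoader yielding (left viewpoint image, depth image, fixation map) triples; the Adam optimizer and learning rate are our assumptions, since the patent fixes only the MSE loss and the repeat count V:

```python
import torch

def train(model, loader, epochs=50, lr=1e-4):
    """V (= epochs) passes over the training set, keeping the parameters
    that achieved the smallest observed loss value (W_best, b_best),
    as step 1_5 prescribes. The file path is illustrative."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best = float("inf")
    for _ in range(epochs):
        for left, depth, fixation in loader:
            pred = model(left, depth)
            loss = torch.nn.functional.mse_loss(pred, fixation)
            opt.zero_grad(); loss.backward(); opt.step()
            if loss.item() < best:
                best = loss.item()
                torch.save(model.state_dict(), "best_weights.pth")
```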
The specific steps of the testing stage are as follows:
Step 2_1: Let the stereo image to be tested have width W' and height H', and denote its left-viewpoint image and its depth image correspondingly. Here 1 ≤ x' ≤ W' and 1 ≤ y' ≤ H' index the pixel value of the pixel at coordinate (x', y') in the stereo image to be tested, its left-viewpoint image and its depth image.
Step 2_2: Input the left-viewpoint image and the depth image of the stereo image to be tested into the convolutional neural network training model, predict using W_best and b_best, and obtain the saliency prediction image of the stereo image to be tested, whose pixel value at coordinate (x', y') is the predicted saliency at that position.
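The test stage then reduces to a single forward pass with the saved best weights, along these lines (names carried over from the training sketch; the input tensors are illustrative):

```python
import torch

# Restore the best weights (W_best / b_best) and predict the saliency map of
# one unseen stereo pair; test_rgb and test_depth are illustrative tensors of
# shapes (1, 3, H', W') and (1, 1, H', W') prepared by the caller.
model.load_state_dict(best_state)
model.eval()
with torch.no_grad():
    saliency_pred = model(test_rgb, test_depth)  # (1, 1, H', W'), values in [0, 1]
```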
In this specific embodiment, in step 1_2, the 1st to 8th neural network blocks share the same structure: a first dilated convolutional layer, a second batch normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch normalization layer, arranged in sequence. The input of the first dilated convolutional layer is the input of the neural network block; the input of the second batch normalization layer receives all feature maps output by the first dilated convolutional layer; the input of the second activation layer receives all feature maps output by the second batch normalization layer; the input of the first residual block receives all feature maps output by the second activation layer; the input of the second dilated convolutional layer receives all feature maps output by the first residual block; the input of the third batch normalization layer receives all feature maps output by the second dilated convolutional layer; and the output of the third batch normalization layer is the output of the neural network block. The two dilated convolutional layers have 64 convolution kernels each in the 1st and 5th neural network blocks, 128 each in the 2nd and 6th, 256 each in the 3rd and 7th, and 512 each in the 4th and 8th. In all of the 1st to 8th neural network blocks, both dilated convolutional layers use 3 × 3 kernels with stride 1, dilation 2 and padding 2, and the activation mode of the second activation layer is "ReLU".
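In PyTorch terms, one of these encoder blocks could be sketched as follows. This is illustrative rather than the patent's code; "dilation" is the standard name for what the text calls the hole (cavity) parameter, and ResidualBlock is the structure sketched further below.

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Illustrative form of the 1st-8th neural network blocks: dilated 3x3
    convolution (stride 1, dilation 2, padding 2) -> batch norm -> ReLU ->
    first residual block -> dilated 3x3 convolution -> batch norm. The
    channel count c is 64/128/256/512 depending on the block index."""
    def __init__(self, c_in, c):
        super().__init__()
        self.pre = nn.Sequential(
            nn.Conv2d(c_in, c, 3, stride=1, padding=2, dilation=2),
            nn.BatchNorm2d(c), nn.ReLU())
        self.res = ResidualBlock(c)            # see the residual-block sketch below
        self.post = nn.Sequential(
            nn.Conv2d(c, c, 3, stride=1, padding=2, dilation=2),
            nn.BatchNorm2d(c))

    def forward(self, x):
        return self.post(self.res(self.pre(x)))
```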
The 9th and 10th neural network blocks share the same structure: a second convolutional layer followed by a fourth batch normalization layer. The input of the second convolutional layer is the input of the neural network block; the input of the fourth batch normalization layer receives all feature maps output by the second convolutional layer; and the output of the fourth batch normalization layer is the output of the neural network block. In both blocks, the second convolutional layer has 3 convolution kernels of size 7 × 7 with stride 1 and padding 3.
The 11th and 12th neural network blocks share the same structure: a third convolutional layer, a fifth batch normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch normalization layer, arranged in sequence. The input of the third convolutional layer is the input of the neural network block; the input of the fifth batch normalization layer receives all feature maps output by the third convolutional layer; the input of the third activation layer receives all feature maps output by the fifth batch normalization layer; the input of the fourth convolutional layer receives all feature maps output by the third activation layer; the input of the sixth batch normalization layer receives all feature maps output by the fourth convolutional layer; and the output of the sixth batch normalization layer is the output of the neural network block. The third and fourth convolutional layers have 64 convolution kernels each in the 11th neural network block and 128 each in the 12th; in both blocks their kernel size is 3 × 3 with stride 1 and padding 1, and the activation mode of the third activation layer is "ReLU".
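The 9th to 12th blocks are small enough to sketch as one-liners. In the sketch below, the three-channel input assumption for the 10th block (i.e. the depth map fed in three-channel form) is ours; the text does not state the depth map's channel count.

```python
import torch.nn as nn

def block_9_10(in_ch=3):
    """9th/10th blocks: one 7x7 convolution with 3 kernels (stride 1,
    padding 3) plus batch norm; in_ch = 3 is an assumption for the
    depth-map branch."""
    return nn.Sequential(nn.Conv2d(in_ch, 3, 7, stride=1, padding=3),
                         nn.BatchNorm2d(3))

def block_11_12(c_in, c):
    """11th/12th blocks: conv -> BN -> ReLU -> conv -> BN, both 3x3 convs
    with stride 1 and padding 1; c = 64 for the 11th block (input E_1,
    3 channels), c = 128 for the 12th (input Z_1, 64 channels)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c, 3, stride=1, padding=1), nn.BatchNorm2d(c), nn.ReLU(),
        nn.Conv2d(c, c, 3, stride=1, padding=1), nn.BatchNorm2d(c))
```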
The 13th to 19th neural network blocks share the same structure: a fifth convolutional layer, a seventh batch normalization layer, a fourth activation layer, a sixth convolutional layer, an eighth batch normalization layer, a fifth activation layer, a seventh convolutional layer and a ninth batch normalization layer, arranged in sequence. The input of the fifth convolutional layer is the input of the neural network block; each subsequent layer receives all feature maps output by the layer before it; and the output of the ninth batch normalization layer is the output of the neural network block. The numbers of convolution kernels of the fifth, sixth and seventh convolutional layers are 256, 256, 256 in the 13th neural network block; 512, 512, 512 in the 14th; 1024, 1024, 1024 in the 15th; 512, 512, 256 in the 16th; 256, 256, 128 in the 17th; 128, 128, 64 in the 18th; and 64, 64, 64 in the 19th. In all of the 13th to 19th neural network blocks, the three convolutional layers use 3 × 3 kernels with stride 1 and padding 1, and the activation mode of the fourth and fifth activation layers is "ReLU".
In this specific embodiment, in step 1_2, the 1st to 6th downsampling blocks share the same structure: each consists of a second residual block, whose input is the input of the downsampling block and whose output is the output of the downsampling block.
In this specific embodiment, the first residual block and the second residual block share the same structure, comprising 3 convolutional layers, 3 batch normalization layers and 3 activation layers. The input of the 1st convolutional layer is the input of the residual block; the input of the 1st batch normalization layer receives all feature maps output by the 1st convolutional layer; the input of the 1st activation layer receives all feature maps output by the 1st batch normalization layer; the input of the 2nd convolutional layer receives all feature maps output by the 1st activation layer; the input of the 2nd batch normalization layer receives all feature maps output by the 2nd convolutional layer; the input of the 2nd activation layer receives all feature maps output by the 2nd batch normalization layer; the input of the 3rd convolutional layer receives all feature maps output by the 2nd activation layer; and the input of the 3rd batch normalization layer receives all feature maps output by the 3rd convolutional layer. All feature maps received at the input of the 1st convolutional layer are added to all feature maps output by the 3rd batch normalization layer; after passing through the 3rd activation layer, all feature maps output by the 3rd activation layer serve as the output of the residual block. The number of convolution kernels of each convolutional layer in the first residual block is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th, 256 in each of the 3rd and 7th, and 512 in each of the 4th and 8th; in every first residual block, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 1 and padding 1. The number of convolution kernels of each convolutional layer in the second residual block is 64 in each of the 1st and 4th downsampling blocks, 128 in each of the 2nd and 5th, and 256 in each of the 3rd and 6th; in every second residual block, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 2 and padding 1. The activation mode of the 3 activation layers is "ReLU".
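A hedged sketch of this residual block follows. The text does not say how the skip path is resized when the middle convolution uses stride 2 (the downsampling variant), so the strided 1 × 1 projection below is an assumption made only to keep the addition well-defined.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """1x1 conv -> BN -> ReLU -> 3x3 conv -> BN -> ReLU -> 1x1 conv -> BN,
    with the block input added to the branch output, then a final ReLU.
    stride is 1 for the first residual block and 2 (on the middle 3x3
    conv) for the second, downsampling variant."""
    def __init__(self, c, stride=1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(c, c, 1, stride=1), nn.BatchNorm2d(c), nn.ReLU(),
            nn.Conv2d(c, c, 3, stride=stride, padding=1), nn.BatchNorm2d(c), nn.ReLU(),
            nn.Conv2d(c, c, 1, stride=1), nn.BatchNorm2d(c))
        # Identity skip when shapes match; assumed 1x1 strided projection otherwise.
        self.skip = nn.Identity() if stride == 1 else nn.Conv2d(c, c, 1, stride=stride)
        self.out_act = nn.ReLU()

    def forward(self, x):
        return self.out_act(self.branch(x) + self.skip(x))
```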
In this specific embodiment, in step 1_2, the pooling windows of the 1st to 4th max pooling layers are all of size 2 × 2 with stride 2.
In this specific embodiment, in step 1_2, the sampling mode of the 1st to 4th upsampling layers is bilinear interpolation with scale factor 2.
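As nn modules, these two resolution-changing layers are one-liners (illustrative):

```python
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)             # 1st-4th max pooling layers
upsample = nn.Upsample(scale_factor=2, mode='bilinear')  # 1st-4th upsampling layers
```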
To verify the feasibility and effectiveness of the method of the present invention, experiments were carried out.
Here, the 3D eye-tracking database (NCTU-3DFixation) provided by National Chiao Tung University (Taiwan, China) is used to analyze the stability and accuracy of the method of the present invention. Four objective parameters commonly used to assess visual saliency extraction methods serve as evaluation indicators: the linear correlation coefficient (Linear Correlation Coefficient, CC), the Kullback-Leibler divergence (Kullback-Leibler Divergence, KLD), the area under the receiver operating characteristic curve (Area Under the receiver operating characteristics Curve, AUC), and the normalized scanpath saliency (Normalized Scanpath Saliency, NSS).
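For reference, three of the four indicators have short closed forms; the sketch below gives common NumPy formulations of them. These are standard definitions rather than code from the patent, and the AUC (Borji) variant, which involves sampling non-fixated points, is normally taken from a saliency benchmark toolkit and is omitted here.

```python
import numpy as np

def cc(pred, gt):
    """Linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float((p * g).mean())

def kld(pred, gt, eps=1e-12):
    """KL divergence of the ground truth from the prediction, after
    normalizing both maps to probability distributions."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(eps + g / (p + eps))).sum())

def nss(pred, fixations):
    """Normalized scanpath saliency: mean of the standardized prediction
    at the binary fixation locations."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    return float(p[fixations > 0].mean())
```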
The method of the present invention is used to obtain the saliency prediction image of each stereo image in the 3D eye-tracking database, which is then compared with the subjective visual saliency map of that stereo image, i.e. the true human-eye fixation image (available in the database). Higher CC, AUC and NSS values and a lower KLD value indicate better consistency between the saliency prediction image obtained by the method of the present invention and the subjective visual saliency map. The CC, KLD, AUC and NSS indicators reflecting the saliency extraction performance of the method of the present invention are listed in Table 1.
Table 1. Stability and accuracy of the saliency prediction images obtained by the method of the present invention against the subjective visual saliency maps

Performance indicator | CC     | KLD    | AUC (Borji) | NSS
Performance value     | 0.7583 | 0.4868 | 0.8789      | 2.0692
From the data listed in Table 1, it can be seen that the stability and accuracy of the saliency prediction images obtained by the method of the present invention with respect to the subjective visual saliency maps are good, showing that the objective test results are highly consistent with human subjective perception, which is sufficient to demonstrate the feasibility and effectiveness of the method of the present invention.

Claims (6)

1. A stereo image visual saliency detection method based on a convolutional neural network, characterized by comprising two processes, a training stage and a testing stage;
The specific steps of the training stage are as follows:
Step 1_1: Select N original stereo images of width W and height H; then let all selected original stereo images, together with their respective left-viewpoint images, depth images and true human-eye fixation images, form the training set, and denote the n-th original stereo image in the training set as {I_n(x,y)}, its depth image as {D_n(x,y)}, and its left-viewpoint image and true human-eye fixation image correspondingly; wherein N is a positive integer, N ≥ 300, W and H are both divisible by 2, n is a positive integer with initial value 1, 1 ≤ n ≤ N, 1 ≤ x ≤ W, 1 ≤ y ≤ H, I_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {I_n(x,y)}, D_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {D_n(x,y)}, and the pixel values of the left-viewpoint image and the true human-eye fixation image at (x,y) are denoted in the same manner;
Step 1_2: Construct the convolutional neural network. The convolutional neural network comprises an input layer, a hidden layer and an output layer; the input layer comprises an RGB-image input layer and a depth-map input layer; the hidden layer comprises an encoding framework and a decoding framework, the encoding framework consisting of three parts, an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The RGB feature extraction module consists of the 1st to 4th neural network blocks and the 1st to 3rd downsampling blocks; the depth feature extraction module consists of the 5th to 8th neural network blocks and the 4th to 6th downsampling blocks; the feature fusion module consists of the 9th to 15th neural network blocks and the 1st to 4th max pooling layers; the decoding framework consists of the 16th to 19th neural network blocks and the 1st to 4th upsampling layers. The output layer consists of a first convolutional layer, a first batch normalization layer and a first activation layer; the first convolutional layer has 1 convolution kernel of size 3 × 3 with stride 1 and padding 1, and the activation mode of the first activation layer is "Sigmoid";
For the RGB-image input layer, its input receives a training left-viewpoint image and its output passes the training left-viewpoint image to the hidden layer; the training left-viewpoint image is required to have width W and height H;
For the depth-map input layer, its input receives the training depth image corresponding to the training left-viewpoint image received at the input of the RGB-image input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H;
For the RGB feature extraction module, the input of the 1st neural network block receives the training left-viewpoint image output by the RGB-image input layer; its output is 64 feature maps of width W and height H, and the set of all output feature maps is denoted P_1. The input of the 1st downsampling block receives all feature maps in P_1; its output is 64 feature maps of width W/2 and height H/2, denoted X_1. The input of the 2nd neural network block receives all feature maps in X_1; its output is 128 feature maps of width W/2 and height H/2, denoted P_2. The input of the 2nd downsampling block receives all feature maps in P_2; its output is 128 feature maps of width W/4 and height H/4, denoted X_2. The input of the 3rd neural network block receives all feature maps in X_2; its output is 256 feature maps of width W/4 and height H/4, denoted P_3. The input of the 3rd downsampling block receives all feature maps in P_3; its output is 256 feature maps of width W/8 and height H/8, denoted X_3. The input of the 4th neural network block receives all feature maps in X_3; its output is 512 feature maps of width W/8 and height H/8, denoted P_4;
For the depth feature extraction module, the input of the 5th neural network block receives the training depth image output by the depth-map input layer; its output is 64 feature maps of width W and height H, and the set of all output feature maps is denoted P_5. The input of the 4th downsampling block receives all feature maps in P_5; its output is 64 feature maps of width W/2 and height H/2, denoted X_4. The input of the 6th neural network block receives all feature maps in X_4; its output is 128 feature maps of width W/2 and height H/2, denoted P_6. The input of the 5th downsampling block receives all feature maps in P_6; its output is 128 feature maps of width W/4 and height H/4, denoted X_5. The input of the 7th neural network block receives all feature maps in X_5; its output is 256 feature maps of width W/4 and height H/4, denoted P_7. The input of the 6th downsampling block receives all feature maps in P_7; its output is 256 feature maps of width W/8 and height H/8, denoted X_6. The input of the 8th neural network block receives all feature maps in X_6; its output is 512 feature maps of width W/8 and height H/8, denoted P_8;
For the feature fusion module, the input of the 9th neural network block receives the training left-viewpoint image output by the RGB-image input layer; its output is 3 feature maps of width W and height H, and the set of all output feature maps is denoted P_9. The input of the 10th neural network block receives the training depth image output by the depth-map input layer; its output is 3 feature maps of width W and height H, denoted P_10. An element-wise summation is applied to all feature maps in P_9 and all feature maps in P_10, yielding 3 feature maps of width W and height H, denoted E_1. The input of the 11th neural network block receives all feature maps in E_1; its output is 64 feature maps of width W and height H, denoted P_11. An element-wise summation is applied to all feature maps in P_1, P_5 and P_11, yielding 64 feature maps of width W and height H, denoted E_2. The input of the 1st max pooling layer receives all feature maps in E_2; its output is 64 feature maps of width W/2 and height H/2, denoted Z_1. The input of the 12th neural network block receives all feature maps in Z_1; its output is 128 feature maps of width W/2 and height H/2, denoted P_12. An element-wise summation is applied to all feature maps in P_2, P_6 and P_12, yielding 128 feature maps of width W/2 and height H/2, denoted E_3. The input of the 2nd max pooling layer receives all feature maps in E_3; its output is 128 feature maps of width W/4 and height H/4, denoted Z_2. The input of the 13th neural network block receives all feature maps in Z_2; its output is 256 feature maps of width W/4 and height H/4, denoted P_13. An element-wise summation is applied to all feature maps in P_3, P_7 and P_13, yielding 256 feature maps of width W/4 and height H/4, denoted E_4. The input of the 3rd max pooling layer receives all feature maps in E_4; its output is 256 feature maps of width W/8 and height H/8, denoted Z_3. The input of the 14th neural network block receives all feature maps in Z_3; its output is 512 feature maps of width W/8 and height H/8, denoted P_14. An element-wise summation is applied to all feature maps in P_4, P_8 and P_14, yielding 512 feature maps of width W/8 and height H/8, denoted E_5. The input of the 4th max pooling layer receives all feature maps in E_5; its output is 512 feature maps of width W/16 and height H/16, denoted Z_4. The input of the 15th neural network block receives all feature maps in Z_4; its output is 1024 feature maps of width W/16 and height H/16, denoted P_15;
For the decoding framework, the input of the 1st upsampling layer receives all feature maps in P_15; its output is 1024 feature maps of width W/8 and height H/8, denoted S_1. The input of the 16th neural network block receives all feature maps in S_1; its output is 256 feature maps of width W/8 and height H/8, denoted P_16. The input of the 2nd upsampling layer receives all feature maps in P_16; its output is 256 feature maps of width W/4 and height H/4, denoted S_2. The input of the 17th neural network block receives all feature maps in S_2; its output is 128 feature maps of width W/4 and height H/4, denoted P_17. The input of the 3rd upsampling layer receives all feature maps in P_17; its output is 128 feature maps of width W/2 and height H/2, denoted S_3. The input of the 18th neural network block receives all feature maps in S_3; its output is 64 feature maps of width W/2 and height H/2, denoted P_18. The input of the 4th upsampling layer receives all feature maps in P_18; its output is 64 feature maps of width W and height H, denoted S_4. The input of the 19th neural network block receives all feature maps in S_4; its output is 64 feature maps of width W and height H, denoted P_19;
For the output layer, the input of the first convolutional layer receives all feature maps in P_19; its output is one feature map of width W and height H. The input of the first batch normalization layer receives the feature map output by the first convolutional layer; the input of the first activation layer receives the feature map output by the first batch normalization layer; and the output of the first activation layer is the saliency image of the stereo image corresponding to the training left-viewpoint image, the saliency image having width W and height H;
Step 1_3: Take the left-viewpoint image of each original stereo image in the training set as a training left-viewpoint image and the depth image of that original stereo image as a training depth image, input them into the convolutional neural network for training, and obtain the saliency image of each original stereo image in the training set, the pixel value of the saliency image of {I_n(x,y)} at coordinate (x,y) being the predicted saliency at that position;
Step 1_4: Compute the loss function value between the saliency image of each original stereo image in the training set and the corresponding true human-eye fixation image, the loss being obtained with the mean-squared-error loss function;
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times to obtain the convolutional neural network training model, yielding N × V loss function values; find the smallest of these N × V loss function values; then take the weight vector and bias term corresponding to the smallest loss function value as the optimal weight vector and optimal bias term of the convolutional neural network training model, denoted W_best and b_best; wherein V > 1;
The specific steps of the testing stage are as follows:
Step 2_1: Let the stereo image to be tested have width W' and height H', and denote its left-viewpoint image and its depth image correspondingly; wherein 1 ≤ x' ≤ W' and 1 ≤ y' ≤ H' index the pixel value of the pixel at coordinate (x', y') in the stereo image to be tested, its left-viewpoint image and its depth image;
Step 2_2: Input the left-viewpoint image and the depth image of the stereo image to be tested into the convolutional neural network training model, predict using W_best and b_best, and obtain the saliency prediction image of the stereo image to be tested, whose pixel value at coordinate (x', y') is the predicted saliency at that position.
2. The stereo image visual saliency detection method based on a convolutional neural network according to claim 1, characterized in that in step 1_2, the 1st to 8th neural network blocks share the same structure: a first dilated convolutional layer, a second batch normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch normalization layer arranged in sequence; the input of the first dilated convolutional layer is the input of the neural network block, each subsequent layer receives all feature maps output by the layer before it, and the output of the third batch normalization layer is the output of the neural network block; wherein the two dilated convolutional layers have 64 convolution kernels each in the 1st and 5th neural network blocks, 128 each in the 2nd and 6th, 256 each in the 3rd and 7th, and 512 each in the 4th and 8th; in all of the 1st to 8th neural network blocks, both dilated convolutional layers use 3 × 3 kernels with stride 1, dilation 2 and padding 2, and the activation mode of the second activation layer is "ReLU";
The 9th and 10th neural network blocks share the same structure: a second convolutional layer followed by a fourth batch normalization layer; the input of the second convolutional layer is the input of the neural network block, the input of the fourth batch normalization layer receives all feature maps output by the second convolutional layer, and the output of the fourth batch normalization layer is the output of the neural network block; wherein in both blocks the second convolutional layer has 3 convolution kernels of size 7 × 7 with stride 1 and padding 3;
The 11th and 12th neural network blocks share the same structure: a third convolutional layer, a fifth batch normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch normalization layer arranged in sequence; the input of the third convolutional layer is the input of the neural network block, each subsequent layer receives all feature maps output by the layer before it, and the output of the sixth batch normalization layer is the output of the neural network block; wherein the third and fourth convolutional layers have 64 convolution kernels each in the 11th neural network block and 128 each in the 12th, in both blocks their kernel size is 3 × 3 with stride 1 and padding 1, and the activation mode of the third activation layer is "ReLU";
The 13th to 19th neural network blocks share the same structure: a fifth convolutional layer, a seventh batch normalization layer, a fourth activation layer, a sixth convolutional layer, an eighth batch normalization layer, a fifth activation layer, a seventh convolutional layer and a ninth batch normalization layer arranged in sequence; the input of the fifth convolutional layer is the input of the neural network block, each subsequent layer receives all feature maps output by the layer before it, and the output of the ninth batch normalization layer is the output of the neural network block; wherein the numbers of convolution kernels of the fifth, sixth and seventh convolutional layers are 256, 256, 256 in the 13th neural network block, 512, 512, 512 in the 14th, 1024, 1024, 1024 in the 15th, 512, 512, 256 in the 16th, 256, 256, 128 in the 17th, 128, 128, 64 in the 18th, and 64, 64, 64 in the 19th; in all of the 13th to 19th neural network blocks, the three convolutional layers use 3 × 3 kernels with stride 1 and padding 1, and the activation mode of the fourth and fifth activation layers is "ReLU".
3. The stereo image visual saliency detection method based on a convolutional neural network according to claim 2, characterized in that in step 1_2, the 1st to 6th downsampling blocks share the same structure: each consists of a second residual block, whose input is the input of the downsampling block and whose output is the output of the downsampling block.
4. The stereo image visual saliency detection method based on a convolutional neural network according to claim 3, characterized in that the first residual block and the second residual block share the same structure, comprising 3 convolutional layers, 3 batch normalization layers and 3 activation layers; the input of the 1st convolutional layer is the input of the residual block, the input of the 1st batch normalization layer receives all feature maps output by the 1st convolutional layer, the input of the 1st activation layer receives all feature maps output by the 1st batch normalization layer, the input of the 2nd convolutional layer receives all feature maps output by the 1st activation layer, the input of the 2nd batch normalization layer receives all feature maps output by the 2nd convolutional layer, the input of the 2nd activation layer receives all feature maps output by the 2nd batch normalization layer, the input of the 3rd convolutional layer receives all feature maps output by the 2nd activation layer, and the input of the 3rd batch normalization layer receives all feature maps output by the 3rd convolutional layer; all feature maps received at the input of the 1st convolutional layer are added to all feature maps output by the 3rd batch normalization layer, and after passing through the 3rd activation layer, all feature maps output by the 3rd activation layer serve as the output of the residual block; wherein the number of convolution kernels of each convolutional layer in the first residual block is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th, 256 in each of the 3rd and 7th, and 512 in each of the 4th and 8th; in every first residual block, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 1 and padding 1; the number of convolution kernels of each convolutional layer in the second residual block is 64 in each of the 1st and 4th downsampling blocks, 128 in each of the 2nd and 5th, and 256 in each of the 3rd and 6th; in every second residual block, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 2 and padding 1; and the activation mode of the 3 activation layers is "ReLU".
5. The stereo image visual saliency detection method based on a convolutional neural network according to any one of claims 1 to 4, characterized in that in step 1_2, the pooling windows of the 1st to 4th max pooling layers are all of size 2 × 2 with stride 2.
6. The stereo image visual saliency detection method based on a convolutional neural network according to claim 5, characterized in that in step 1_2, the sampling mode of the 1st to 4th upsampling layers is bilinear interpolation with scale factor 2.
CN201910327556.4A 2019-04-23 2019-04-23 Stereo image visual saliency detection method based on convolutional neural network Active CN110175986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910327556.4A CN110175986B (en) 2019-04-23 2019-04-23 Stereo image visual saliency detection method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN110175986A 2019-08-27
CN110175986B 2021-01-08

Family

ID=67689881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910327556.4A Active CN110175986B (en) 2019-04-23 2019-04-23 Stereo image visual saliency detection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN110175986B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351941A1 (en) * 2016-06-03 2017-12-07 Miovision Technologies Incorporated System and Method for Performing Saliency Detection Using Deep Active Contours
CN106462771A (en) * 2016-08-05 2017-02-22 深圳大学 3D image significance detection method
CN106778687A (en) * 2017-01-16 2017-05-31 大连理工大学 Method for viewing points detecting based on local evaluation and global optimization
CN109376611A (en) * 2018-09-27 2019-02-22 方玉明 A kind of saliency detection method based on 3D convolutional neural networks
CN109146944A (en) * 2018-10-30 2019-01-04 浙江科技学院 A kind of space or depth perception estimation method based on the revoluble long-pending neural network of depth
CN109598268A (en) * 2018-11-23 2019-04-09 安徽大学 A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN109635822A (en) * 2018-12-07 2019-04-16 浙江科技学院 The significant extracting method of stereo-picture vision based on deep learning coding and decoding network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHEN, HAO et al.: "RGB-D Saliency Detection by Multi-stream Late Fusion Network", Computer Vision Systems *
XINGYU CAI et al.: "Saliency detection for stereoscopic 3D images in the quaternion frequency domain", 3D Research *
LI, RONG et al.: "Saliency region prediction method using convolutional neural networks", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555434A (en) * 2019-09-03 2019-12-10 浙江科技学院 method for detecting visual saliency of three-dimensional image through local contrast and global guidance
CN110555434B (en) * 2019-09-03 2022-03-29 浙江科技学院 Method for detecting visual saliency of three-dimensional image through local contrast and global guidance
CN110782458A (en) * 2019-10-23 2020-02-11 浙江科技学院 Object image 3D semantic prediction segmentation method of asymmetric coding network
CN110782458B (en) * 2019-10-23 2022-05-31 浙江科技学院 Object image 3D semantic prediction segmentation method of asymmetric coding network
US11681046B2 (en) 2019-11-14 2023-06-20 Zoox, Inc. Depth data model training with upsampling, losses and loss balancing
WO2021096806A1 (en) * 2019-11-14 2021-05-20 Zoox, Inc Depth data model training with upsampling, losses, and loss balancing
US11157774B2 (en) * 2019-11-14 2021-10-26 Zoox, Inc. Depth data model training with upsampling, losses, and loss balancing
CN111369506A (en) * 2020-02-26 2020-07-03 四川大学 Lens turbidity grading method based on eye B-ultrasonic image
CN111582316A (en) * 2020-04-10 2020-08-25 天津大学 RGB-D significance target detection method
CN111582316B (en) * 2020-04-10 2022-06-28 天津大学 RGB-D significance target detection method
CN111612832A (en) * 2020-04-29 2020-09-01 杭州电子科技大学 Method for improving depth estimation accuracy by utilizing multitask complementation
CN111612832B (en) * 2020-04-29 2023-04-18 杭州电子科技大学 Method for improving depth estimation accuracy by utilizing multitask complementation
CN112528899B (en) * 2020-12-17 2022-04-12 南开大学 Image salient object detection method and system based on implicit depth information recovery
CN112528900B (en) * 2020-12-17 2022-09-16 南开大学 Image salient object detection method and system based on extreme down-sampling
CN112528900A (en) * 2020-12-17 2021-03-19 南开大学 Image salient object detection method and system based on extreme down-sampling
CN112528899A (en) * 2020-12-17 2021-03-19 南开大学 Image salient object detection method and system based on implicit depth information recovery
CN113192073A (en) * 2021-04-06 2021-07-30 浙江科技学院 Clothing semantic segmentation method based on cross fusion network
CN113592795A (en) * 2021-07-19 2021-11-02 深圳大学 Visual saliency detection method of stereoscopic image, thumbnail generation method and device
CN113592795B (en) * 2021-07-19 2024-04-12 深圳大学 Visual saliency detection method for stereoscopic image, thumbnail generation method and device

Also Published As

Publication number Publication date
CN110175986B (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN110175986A (en) A kind of stereo-picture vision significance detection method based on convolutional neural networks
CN110555434B (en) Method for detecting visual saliency of three-dimensional image through local contrast and global guidance
CN107644415B (en) A kind of text image method for evaluating quality and equipment
CN109410261B (en) Monocular image depth estimation method based on pyramid pooling module
CN109558832A (en) A kind of human body attitude detection method, device, equipment and storage medium
CN110210492A (en) A kind of stereo-picture vision significance detection method based on deep learning
CN111275518A (en) Video virtual fitting method and device based on mixed optical flow
CN110263813A (en) A kind of conspicuousness detection method merged based on residual error network and depth information
CN110059728A (en) RGB-D image vision conspicuousness detection method based on attention model
CN103824272B (en) The face super-resolution reconstruction method heavily identified based on k nearest neighbor
CN110211061A (en) List depth camera depth map real time enhancing method and device neural network based
CN112784736B (en) Character interaction behavior recognition method based on multi-modal feature fusion
CN109584290A (en) A kind of three-dimensional image matching method based on convolutional neural networks
CN110246148A (en) The conspicuousness detection method of multi-modal depth information fusion and attention study
CN108416266A (en) A kind of video behavior method for quickly identifying extracting moving target using light stream
CN108389192A (en) Stereo-picture Comfort Evaluation method based on convolutional neural networks
CN109978786A (en) A kind of Kinect depth map restorative procedure based on convolutional neural networks
CN109461177B (en) Monocular image depth prediction method based on neural network
CN109146944A (en) A kind of space or depth perception estimation method based on the revoluble long-pending neural network of depth
CN110263768A (en) A kind of face identification method based on depth residual error network
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN113298736B (en) Face image restoration method based on face pattern
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN112991371B (en) Automatic image coloring method and system based on coloring overflow constraint
CN110852935A (en) Image processing method for human face image changing with age

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant