CN110175986A - Stereo image visual saliency detection method based on a convolutional neural network - Google Patents

Stereo image visual saliency detection method based on a convolutional neural network

Info

Publication number
CN110175986A
CN110175986A (application CN201910327556.4A; granted as CN110175986B)
Authority
CN
China
Prior art keywords
layer
output
feature map
input
neural network
Prior art date
Legal status (assumed; not a legal conclusion)
Granted
Application number
CN201910327556.4A
Other languages
Chinese (zh)
Other versions
CN110175986B (en)
Inventor
周武杰
吕营
雷景生
张伟
何成
王海江
Current Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Lover Health Science and Technology Development Co Ltd
Priority to CN201910327556.4A
Publication of CN110175986A
Application granted
Publication of CN110175986B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a stereo image visual saliency detection method based on a convolutional neural network. A convolutional neural network comprising an input layer, a hidden layer and an output layer is constructed: the input layer consists of an RGB input layer and a depth-map input layer; the hidden layer consists of an encoding framework and a decoding framework, the encoding framework being composed of an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The left viewpoint image and the depth image of every stereo image in the training set are input into the convolutional neural network for training, yielding a saliency image for every stereo image in the training set. The loss function value between the saliency image of every stereo image in the training set and the corresponding ground-truth human eye fixation image is computed, and after repeating this process several times a trained convolutional neural network model is obtained. The left viewpoint image and the depth image of the stereo image to be tested are then input into the trained model, which predicts a saliency prediction image. The advantage of the method is its high visual saliency detection accuracy.

Description

Stereo image visual saliency detection method based on a convolutional neural network
Technical field
The present invention relates to visual saliency detection techniques, and more particularly to a stereo image visual saliency detection method based on a convolutional neural network.
Background art
Visual saliency has in recent years been a popular research topic in fields such as neuroscience, robotics and computer vision. Research on visual saliency detection falls into two broad classes: eye-fixation prediction and salient object detection. The former predicts the fixation points of a person viewing a natural scene; the latter accurately extracts the objects of interest. In general, visual saliency detection algorithms can be divided into top-down and bottom-up approaches. Top-down methods are task-driven and require supervised learning, whereas bottom-up methods usually rely on low-level cues such as color features, distance features and heuristic saliency features. One of the most common heuristic saliency features is contrast, for example pixel-based or patch-based contrast. Past research on visual saliency detection has focused mainly on two-dimensional images. However, as visual scenes become ever more complex, three-dimensional data is better suited to practical applications, and extracting salient objects from 2D data alone is no longer sufficient. In recent years, progress in 3D data acquisition technologies such as Time-of-Flight sensors and the Microsoft Kinect has promoted the use of structural information and improved the ability to discriminate between different objects of similar appearance. Depth data is easy to capture, independent of illumination, and provides geometric cues that improve visual saliency prediction. Because RGB data and depth data are complementary, many methods that combine RGB images and depth images in pairs have been proposed for visual saliency detection. Earlier work concentrated on constructing low-level saliency features from domain-specific prior knowledge, such as the observation that humans tend to pay more attention to closer objects; such observations, however, are difficult to generalize to all scenes. In most previous work, the multi-modal fusion problem was solved either by directly concatenating the RGB-D channels or by processing each modality independently and then combining the two decisions. Although these strategies brought large improvements, they struggle to fully explore cross-modal complementarity. More recently, following the success of convolutional neural networks (CNNs) in learning discriminative features from RGB data, a growing body of work has used CNNs to explore more powerful and effective joint multi-modal RGB-D representations. Most of this work is based on a two-stream architecture, in which RGB data and depth data are learned in separate bottom-up streams whose features are combined for joint reasoning at an early or late stage. As the most popular solution, the two-stream architecture achieves marked improvements over work based on hand-crafted RGB-D features; the most critical open question, however, is how to exploit the multi-modal complementary information effectively during the bottom-up process. Further research on RGB-D image visual saliency detection is therefore necessary in order to improve the accuracy of visual saliency detection.
Summary of the invention
The technical problem to be solved by the invention is to provide a stereo image visual saliency detection method based on a convolutional neural network that achieves high visual saliency detection accuracy.
The technical scheme adopted by the invention to solve the above technical problem is a stereo image visual saliency detection method based on a convolutional neural network, characterized by comprising two processes: a training stage and a test stage.
The specific steps of the training stage are as follows:
Step 1_1: Select N original stereo images of width W and height H; then form a training set from all selected original stereo images together with their respective left viewpoint images, depth images and ground-truth human eye fixation images. Denote the n-th original stereo image in the training set as {I_n(x, y)}, and denote its left viewpoint image, depth image and ground-truth human eye fixation image as {L_n(x, y)}, {D_n(x, y)} and {G_n(x, y)} respectively. Here N is a positive integer with N ≥ 300; W and H are divisible by 2; n is a positive integer with initial value 1, 1 ≤ n ≤ N, 1 ≤ x ≤ W, 1 ≤ y ≤ H; and I_n(x, y), L_n(x, y), D_n(x, y) and G_n(x, y) denote the pixel values at coordinate position (x, y) in {I_n(x, y)}, {L_n(x, y)}, {D_n(x, y)} and {G_n(x, y)} respectively.
Step 1_2: Construct the convolutional neural network. The network comprises an input layer, a hidden layer and an output layer. The input layer consists of an RGB input layer and a depth-map input layer. The hidden layer consists of an encoding framework and a decoding framework; the encoding framework is composed of three parts, namely an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The RGB feature extraction module is composed of the 1st to 4th neural network blocks and the 1st to 3rd down-sampling blocks; the depth feature extraction module is composed of the 5th to 8th neural network blocks and the 4th to 6th down-sampling blocks; the feature fusion module is composed of the 9th to 15th neural network blocks and the 1st to 4th max-pooling layers; and the decoding framework is composed of the 16th to 19th neural network blocks and the 1st to 4th up-sampling layers. The output layer is composed of a first convolutional layer, a first batch-normalization layer and a first activation layer; the first convolutional layer has a 3 × 3 convolution kernel, stride 1, 1 convolution kernel and padding 1, and the activation function of the first activation layer is Sigmoid.
For the RGB input layer, its input receives a training left viewpoint image and its output passes the training left viewpoint image to the hidden layer; the training left viewpoint image is required to have width W and height H.
For the depth-map input layer, its input receives the training depth image corresponding to the training left viewpoint image received by the RGB input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H.
For the RGB feature extraction module: the input of the 1st neural network block receives the training left viewpoint image output by the RGB input layer, and its output is 64 feature maps of width W and height H; the set of all output feature maps is denoted P1. The input of the 1st down-sampling block receives all feature maps in P1, and its output is 64 feature maps of width W/2 and height H/2, denoted X1. The input of the 2nd neural network block receives all feature maps in X1, and its output is 128 feature maps of width W/2 and height H/2, denoted P2. The input of the 2nd down-sampling block receives all feature maps in P2, and its output is 128 feature maps of width W/4 and height H/4, denoted X2. The input of the 3rd neural network block receives all feature maps in X2, and its output is 256 feature maps of width W/4 and height H/4, denoted P3. The input of the 3rd down-sampling block receives all feature maps in P3, and its output is 256 feature maps of width W/8 and height H/8, denoted X3. The input of the 4th neural network block receives all feature maps in X3, and its output is 512 feature maps of width W/8 and height H/8, denoted P4.
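To make this data flow concrete, here is a minimal PyTorch sketch of the RGB stream, assuming `conv_block` and `down_block` are constructors for the neural network blocks and down-sampling blocks detailed later in the text (the class and parameter names are illustrative, not from the patent):

```python
import torch.nn as nn

class RGBStream(nn.Module):
    """Sketch of the RGB feature-extraction stream (blocks 1-4)."""
    def __init__(self, conv_block, down_block):
        super().__init__()
        self.b1, self.d1 = conv_block(3, 64), down_block(64)      # P1: W x H  -> X1: W/2 x H/2
        self.b2, self.d2 = conv_block(64, 128), down_block(128)   # P2         -> X2: W/4 x H/4
        self.b3, self.d3 = conv_block(128, 256), down_block(256)  # P3         -> X3: W/8 x H/8
        self.b4 = conv_block(256, 512)                            # P4: 512 x W/8 x H/8

    def forward(self, x):
        p1 = self.b1(x);  x1 = self.d1(p1)
        p2 = self.b2(x1); x2 = self.d2(p2)
        p3 = self.b3(x2); x3 = self.d3(p3)
        p4 = self.b4(x3)
        return p1, p2, p3, p4  # side outputs P1..P4 feed the fusion module
```

The depth stream (blocks 5 to 8 with down-sampling blocks 4 to 6, described next) has exactly the same shape; only its input is the depth image.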
For the depth feature extraction module: the input of the 5th neural network block receives the training depth image output by the depth-map input layer, and its output is 64 feature maps of width W and height H, denoted P5. The input of the 4th down-sampling block receives all feature maps in P5, and its output is 64 feature maps of width W/2 and height H/2, denoted X4. The input of the 6th neural network block receives all feature maps in X4, and its output is 128 feature maps of width W/2 and height H/2, denoted P6. The input of the 5th down-sampling block receives all feature maps in P6, and its output is 128 feature maps of width W/4 and height H/4, denoted X5. The input of the 7th neural network block receives all feature maps in X5, and its output is 256 feature maps of width W/4 and height H/4, denoted P7. The input of the 6th down-sampling block receives all feature maps in P7, and its output is 256 feature maps of width W/8 and height H/8, denoted X6. The input of the 8th neural network block receives all feature maps in X6, and its output is 512 feature maps of width W/8 and height H/8, denoted P8.
For the feature fusion module: the input of the 9th neural network block receives the training left viewpoint image output by the RGB input layer, and its output is 3 feature maps of width W and height H, denoted P9. The input of the 10th neural network block receives the training depth image output by the depth-map input layer, and its output is 3 feature maps of width W and height H, denoted P10. Element-wise summation is applied to all feature maps in P9 and all feature maps in P10, producing 3 feature maps of width W and height H, denoted E1. The input of the 11th neural network block receives all feature maps in E1, and its output is 64 feature maps of width W and height H, denoted P11. Element-wise summation is applied to all feature maps in P1, P5 and P11, producing 64 feature maps of width W and height H, denoted E2. The input of the 1st max-pooling layer receives all feature maps in E2, and its output is 64 feature maps of width W/2 and height H/2, denoted Z1. The input of the 12th neural network block receives all feature maps in Z1, and its output is 128 feature maps of width W/2 and height H/2, denoted P12. Element-wise summation is applied to all feature maps in P2, P6 and P12, producing 128 feature maps of width W/2 and height H/2, denoted E3. The input of the 2nd max-pooling layer receives all feature maps in E3, and its output is 128 feature maps of width W/4 and height H/4, denoted Z2. The input of the 13th neural network block receives all feature maps in Z2, and its output is 256 feature maps of width W/4 and height H/4, denoted P13. Element-wise summation is applied to all feature maps in P3, P7 and P13, producing 256 feature maps of width W/4 and height H/4, denoted E4. The input of the 3rd max-pooling layer receives all feature maps in E4, and its output is 256 feature maps of width W/8 and height H/8, denoted Z3. The input of the 14th neural network block receives all feature maps in Z3, and its output is 512 feature maps of width W/8 and height H/8, denoted P14. Element-wise summation is applied to all feature maps in P4, P8 and P14, producing 512 feature maps of width W/8 and height H/8, denoted E5. The input of the 4th max-pooling layer receives all feature maps in E5, and its output is 512 feature maps of width W/16 and height H/16, denoted Z4. The input of the 15th neural network block receives all feature maps in Z4, and its output is 1024 feature maps of width W/16 and height H/16, denoted P15.
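A minimal sketch of one fusion step follows, assuming batch tensors and W = H = 256 purely for illustration:

```python
import torch

# One fusion step: element-wise summation of same-shaped feature-map sets
# from the RGB stream (P1), the depth stream (P5) and the previous fusion
# block (P11), followed by 2x2 max pooling with stride 2.
pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)
p1  = torch.randn(1, 64, 256, 256)  # assumed W = H = 256 for illustration
p5  = torch.randn(1, 64, 256, 256)
p11 = torch.randn(1, 64, 256, 256)
e2 = p1 + p5 + p11                  # element-wise summation -> E2: 64 x W x H
z1 = pool(e2)                       # Z1: 64 x W/2 x H/2
print(z1.shape)                     # torch.Size([1, 64, 128, 128])
```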
For the decoding framework: the input of the 1st up-sampling layer receives all feature maps in P15, and its output is 1024 feature maps of width W/8 and height H/8, denoted S1. The input of the 16th neural network block receives all feature maps in S1, and its output is 256 feature maps of width W/8 and height H/8, denoted P16. The input of the 2nd up-sampling layer receives all feature maps in P16, and its output is 256 feature maps of width W/4 and height H/4, denoted S2. The input of the 17th neural network block receives all feature maps in S2, and its output is 128 feature maps of width W/4 and height H/4, denoted P17. The input of the 3rd up-sampling layer receives all feature maps in P17, and its output is 128 feature maps of width W/2 and height H/2, denoted S3. The input of the 18th neural network block receives all feature maps in S3, and its output is 64 feature maps of width W/2 and height H/2, denoted P18. The input of the 4th up-sampling layer receives all feature maps in P18, and its output is 64 feature maps of width W and height H, denoted S4. The input of the 19th neural network block receives all feature maps in S4, and its output is 64 feature maps of width W and height H, denoted P19.
For the output layer: the input of the first convolutional layer receives all feature maps in P19, and the output of the first convolutional layer is one feature map of width W and height H; the input of the first batch-normalization layer receives the feature map output by the first convolutional layer; the input of the first activation layer receives the feature map output by the first batch-normalization layer; and the output of the first activation layer is the saliency image of the stereo image corresponding to the training left viewpoint image. The saliency image has width W and height H.
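A direct PyTorch rendering of this stated configuration might look like the following sketch (not the patent's reference code):

```python
import torch.nn as nn

# Output layer as specified: 3x3 conv (1 kernel, stride 1, padding 1),
# batch normalization, then Sigmoid; P19 supplies 64 input channels.
output_layer = nn.Sequential(
    nn.Conv2d(64, 1, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(1),
    nn.Sigmoid(),  # saliency map in [0, 1], width W and height H
)
```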
Step 1_3: Use the left viewpoint image of every original stereo image in the training set as a training left viewpoint image and the depth image of every original stereo image in the training set as a training depth image, input them into the convolutional neural network for training, and obtain the saliency image of every original stereo image in the training set; the saliency image of {I_n(x, y)} is denoted {S_n(x, y)}, where S_n(x, y) denotes the pixel value at coordinate position (x, y) in {S_n(x, y)}.
Step 1_4: Compute the loss function value between the saliency image of every original stereo image in the training set and the corresponding ground-truth human eye fixation image; the loss function value between {S_n(x, y)} and {G_n(x, y)} is obtained using the mean squared error loss function.
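Since only the mean squared error loss is specified, the per-image loss reduces to a one-liner; a sketch:

```python
import torch.nn.functional as F

# MSE between the predicted saliency image S_n and the ground-truth
# human eye fixation image G_n, as stated in step 1_4.
def saliency_loss(s_pred, g_true):
    return F.mse_loss(s_pred, g_true)
```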
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times, obtaining the trained convolutional neural network model together with N × V loss function values; then find the smallest of the N × V loss function values, and take the weight vector and bias term corresponding to that smallest loss function value as the optimal weight vector and optimal bias term of the trained model, denoted W_best and b_best respectively, where V > 1.
The specific steps of the test stage are as follows:
Step 2_1: Let {I_test(x', y')} denote the stereo image to be tested, of width W' and height H', and denote its left viewpoint image and depth image as {L_test(x', y')} and {D_test(x', y')} respectively; here 1 ≤ x' ≤ W', 1 ≤ y' ≤ H', and I_test(x', y'), L_test(x', y') and D_test(x', y') denote the pixel values at coordinate position (x', y') in the respective images.
Step 2_2: Input {L_test(x', y')} and {D_test(x', y')} into the trained convolutional neural network model and predict using W_best and b_best, obtaining the saliency prediction image of {I_test(x', y')}, denoted {S_test(x', y')}, where S_test(x', y') denotes the pixel value at coordinate position (x', y') in {S_test(x', y')}.
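In code, the test stage amounts to a single forward pass with the saved best parameters; a sketch, assuming `model` is the trained network with W_best and b_best already loaded (tensor shapes are our assumptions):

```python
import torch

def predict_saliency(model: torch.nn.Module,
                     left_view: torch.Tensor,
                     depth: torch.Tensor) -> torch.Tensor:
    """Run the trained model on one left viewpoint image (1 x 3 x H' x W')
    and its depth image; returns a 1 x 1 x H' x W' saliency prediction
    in [0, 1]."""
    model.eval()
    with torch.no_grad():
        return model(left_view, depth)
```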
In step 1_2, the 1st to 8th neural network blocks share the same structure, consisting in sequence of a first dilated convolutional layer, a second batch-normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch-normalization layer. The input of the first dilated convolutional layer is the input of its neural network block; the input of the second batch-normalization layer receives all feature maps output by the first dilated convolutional layer; the input of the second activation layer receives all feature maps output by the second batch-normalization layer; the input of the first residual block receives all feature maps output by the second activation layer; the input of the second dilated convolutional layer receives all feature maps output by the first residual block; the input of the third batch-normalization layer receives all feature maps output by the second dilated convolutional layer; and the output of the third batch-normalization layer is the output of its neural network block. The number of convolution kernels of the first and second dilated convolutional layers is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th, 256 in each of the 3rd and 7th, and 512 in each of the 4th and 8th. In each of the 1st to 8th neural network blocks, the first and second dilated convolutional layers have 3 × 3 convolution kernels with stride 1, dilation 2 and padding 2, and the activation function of the second activation layer is ReLU.
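Under the stated hyper-parameters (3 × 3 kernels, stride 1, dilation 2, padding 2), one of these blocks can be sketched in PyTorch as follows, with `residual_block` standing for the first residual block described further below (the factory name is ours):

```python
import torch.nn as nn

# Sketch of one encoder neural network block: dilated conv -> BN -> ReLU
# -> residual block -> dilated conv -> BN. Padding 2 with dilation 2
# keeps the spatial size unchanged, as the text requires.
def make_encoder_block(channels_in, channels_out, residual_block):
    return nn.Sequential(
        nn.Conv2d(channels_in, channels_out, 3, stride=1, padding=2, dilation=2),
        nn.BatchNorm2d(channels_out),
        nn.ReLU(inplace=True),
        residual_block(channels_out),
        nn.Conv2d(channels_out, channels_out, 3, stride=1, padding=2, dilation=2),
        nn.BatchNorm2d(channels_out),
    )
```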
The 9th and 10th neural network blocks share the same structure, consisting in sequence of a second convolutional layer and a fourth batch-normalization layer. The input of the second convolutional layer is the input of its neural network block; the input of the fourth batch-normalization layer receives all feature maps output by the second convolutional layer; and the output of the fourth batch-normalization layer is the output of its neural network block. In each of the 9th and 10th neural network blocks, the second convolutional layer has 3 convolution kernels of size 7 × 7 with stride 1 and padding 3.
The 11th and 12th neural network blocks share the same structure, consisting in sequence of a third convolutional layer, a fifth batch-normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch-normalization layer. The input of the third convolutional layer is the input of its neural network block; the input of the fifth batch-normalization layer receives all feature maps output by the third convolutional layer; the input of the third activation layer receives all feature maps output by the fifth batch-normalization layer; the input of the fourth convolutional layer receives all feature maps output by the third activation layer; the input of the sixth batch-normalization layer receives all feature maps output by the fourth convolutional layer; and the output of the sixth batch-normalization layer is the output of its neural network block. The third and fourth convolutional layers have 64 convolution kernels in the 11th neural network block and 128 in the 12th; in both blocks these layers have 3 × 3 convolution kernels with stride 1 and padding 1, and the activation function of the third activation layer is ReLU.
The 13th to 19th neural network blocks share the same structure, consisting in sequence of a fifth convolutional layer, a seventh batch-normalization layer, a fourth activation layer, a sixth convolutional layer, an eighth batch-normalization layer, a fifth activation layer, a seventh convolutional layer and a ninth batch-normalization layer, each receiving all feature maps output by the preceding element; the input of the fifth convolutional layer is the input of its neural network block, and the output of the ninth batch-normalization layer is the output of its neural network block. The numbers of convolution kernels of the fifth, sixth and seventh convolutional layers are 256, 256 and 256 in the 13th neural network block; 512, 512 and 512 in the 14th; 1024, 1024 and 1024 in the 15th; 512, 512 and 256 in the 16th; 256, 256 and 128 in the 17th; 128, 128 and 64 in the 18th; and 64, 64 and 64 in the 19th. In each of the 13th to 19th neural network blocks, the fifth, sixth and seventh convolutional layers have 3 × 3 convolution kernels with stride 1 and padding 1, and the activation functions of the fourth and fifth activation layers are ReLU.
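A sketch of this conv-BN-ReLU pattern, parameterized so that, for example, the 16th block becomes `make_triple_conv_block(1024, 512, 256)` (the factory name and signature are ours, not the patent's):

```python
import torch.nn as nn

# Three conv -> BN stages with ReLU after the first two, matching the
# 13th-19th block layout; c_in is the incoming channel count, c_mid the
# kernel count of the fifth and sixth conv layers, c_out of the seventh.
def make_triple_conv_block(c_in, c_mid, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 3, stride=1, padding=1), nn.BatchNorm2d(c_mid),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_mid, c_mid, 3, stride=1, padding=1), nn.BatchNorm2d(c_mid),
        nn.ReLU(inplace=True),
        nn.Conv2d(c_mid, c_out, 3, stride=1, padding=1), nn.BatchNorm2d(c_out),
    )
```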
In step 1_2, the 1st to 6th down-sampling blocks share the same structure, each consisting of a second residual block whose input is the input of its down-sampling block and whose output is the output of its down-sampling block.
The first residual block and the second residual block share the same structure, comprising 3 convolutional layers, 3 batch-normalization layers and 3 activation layers. The input of the 1st convolutional layer is the input of its residual block; the input of the 1st batch-normalization layer receives all feature maps output by the 1st convolutional layer; the input of the 1st activation layer receives all feature maps output by the 1st batch-normalization layer; the input of the 2nd convolutional layer receives all feature maps output by the 1st activation layer; the input of the 2nd batch-normalization layer receives all feature maps output by the 2nd convolutional layer; the input of the 2nd activation layer receives all feature maps output by the 2nd batch-normalization layer; the input of the 3rd convolutional layer receives all feature maps output by the 2nd activation layer; and the input of the 3rd batch-normalization layer receives all feature maps output by the 3rd convolutional layer. All feature maps received at the input of the 1st convolutional layer are added to all feature maps output by the 3rd batch-normalization layer and passed through the 3rd activation layer, whose output serves as the output of the residual block. The number of convolution kernels of each convolutional layer in the first residual block is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th, 256 in each of the 3rd and 7th, and 512 in each of the 4th and 8th; in the first residual block of each of the 1st to 8th neural network blocks, the 1st and 3rd convolutional layers have 1 × 1 convolution kernels with stride 1, and the 2nd convolutional layer has 3 × 3 convolution kernels with stride 1 and padding 1. The number of convolution kernels of each convolutional layer in the second residual block is 64 in each of the 1st and 4th down-sampling blocks, 128 in each of the 2nd and 5th, and 256 in each of the 3rd and 6th; in the second residual block of each of the 1st to 6th down-sampling blocks, the 1st and 3rd convolutional layers have 1 × 1 convolution kernels with stride 1, and the 2nd convolutional layer has 3 × 3 convolution kernels with stride 2 and padding 1. The activation functions of the 3 activation layers are ReLU.
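Both residual blocks follow the classic 1 × 1 / 3 × 3 / 1 × 1 bottleneck pattern; a sketch is below. Note that for the stride-2 down-sampling variant the identity addition only type-checks if the skip path is also down-sampled, so the strided 1 × 1 shortcut here is our assumption rather than something the text spells out:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """1x1 conv -> BN -> ReLU -> 3x3 conv -> BN -> ReLU -> 1x1 conv -> BN,
    with the block input added back before a final ReLU. stride=2 gives
    the down-sampling variant used in the down-sampling blocks."""
    def __init__(self, channels, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 1, stride=1), nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=stride, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1, stride=1), nn.BatchNorm2d(channels),
        )
        # Assumed strided shortcut so shapes match when stride == 2.
        self.skip = (nn.Identity() if stride == 1
                     else nn.Conv2d(channels, channels, 1, stride=stride))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))
```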
In step 1_2, the pooling windows of the 1st to 4th max-pooling layers are all 2 × 2 with stride 2.
In step 1_2, the 1st to 4th up-sampling layers all use bilinear interpolation with scale factor 2.
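Both operations map directly onto standard layers; a sketch:

```python
import torch.nn as nn

# Pooling and up-sampling exactly as specified: 2x2 max pooling with
# stride 2 in the fusion module, bilinear up-sampling with scale factor 2
# in the decoding framework.
maxpool  = nn.MaxPool2d(kernel_size=2, stride=2)
upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
```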
Compared with the prior art, the advantages of the present invention are as follows:
1) In the encoding framework of the constructed convolutional neural network, the method trains separate modules for RGB images and depth images (the RGB feature extraction module and the depth feature extraction module) to learn RGB and depth features at different levels, and proposes a dedicated module for fusing RGB and depth features, namely the feature fusion module, which merges both kinds of features from low level to high level. This helps to make full use of cross-modal information to form new discriminative features and improves the accuracy of stereo visual saliency prediction.
2) The down-sampling blocks in the RGB feature extraction module and the depth feature extraction module of the constructed convolutional neural network replace the max-pooling layers used in previous work with residual blocks of stride 2, which helps the model select feature information adaptively and avoids losing important information through the max-pooling operation.
3) The RGB feature extraction module and the depth feature extraction module of the constructed convolutional neural network introduce residual blocks flanked by dilated convolutional layers, which enlarges the receptive field of the convolution kernels, helps the constructed network attend to global information, and captures richer content.
Description of the drawings
Fig. 1 is a schematic diagram of the composition of the convolutional neural network constructed by the method of the present invention.
Specific embodiment
The present invention will be described in further detail below with reference to the embodiments and the accompanying drawings.
The stereo image visual saliency detection method based on a convolutional neural network proposed by the present invention comprises two processes: a training stage and a test stage.
The specific steps of the training stage are as follows:
Step 1_1: Select N original stereo images of width W and height H; then form a training set from all selected original stereo images together with their respective left viewpoint images, depth images and ground-truth human eye fixation images. Denote the n-th original stereo image in the training set as {I_n(x, y)}, and denote its left viewpoint image, depth image and ground-truth human eye fixation image as {L_n(x, y)}, {D_n(x, y)} and {G_n(x, y)} respectively. Here N is a positive integer with N ≥ 300, for example N = 600; W and H are divisible by 2; n is a positive integer with initial value 1, 1 ≤ n ≤ N, 1 ≤ x ≤ W, 1 ≤ y ≤ H; and I_n(x, y), L_n(x, y), D_n(x, y) and G_n(x, y) denote the pixel values at coordinate position (x, y) in {I_n(x, y)}, {L_n(x, y)}, {D_n(x, y)} and {G_n(x, y)} respectively.
Step 1_2: Construct the convolutional neural network. As shown in Fig. 1, the network comprises an input layer, a hidden layer and an output layer. The input layer consists of an RGB input layer and a depth-map input layer. The hidden layer consists of an encoding framework and a decoding framework; the encoding framework is composed of three parts, namely an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The RGB feature extraction module is composed of the 1st to 4th neural network blocks and the 1st to 3rd down-sampling blocks; the depth feature extraction module is composed of the 5th to 8th neural network blocks and the 4th to 6th down-sampling blocks; the feature fusion module is composed of the 9th to 15th neural network blocks and the 1st to 4th max-pooling layers; and the decoding framework is composed of the 16th to 19th neural network blocks and the 1st to 4th up-sampling layers. The output layer is composed of a first convolutional layer, a first batch-normalization layer and a first activation layer; the first convolutional layer has a 3 × 3 convolution kernel, stride 1, 1 convolution kernel and padding 1, and the activation function of the first activation layer is Sigmoid.
For the RGB input layer, its input receives a training left viewpoint image and its output passes the training left viewpoint image to the hidden layer; the training left viewpoint image is required to have width W and height H.
For the depth-map input layer, its input receives the training depth image corresponding to the training left viewpoint image received by the RGB input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H.
For the RGB feature extraction module: the input of the 1st neural network block receives the training left viewpoint image output by the RGB input layer, and its output is 64 feature maps of width W and height H, denoted P1. The input of the 1st down-sampling block receives all feature maps in P1, and its output is 64 feature maps of width W/2 and height H/2, denoted X1. The input of the 2nd neural network block receives all feature maps in X1, and its output is 128 feature maps of width W/2 and height H/2, denoted P2. The input of the 2nd down-sampling block receives all feature maps in P2, and its output is 128 feature maps of width W/4 and height H/4, denoted X2. The input of the 3rd neural network block receives all feature maps in X2, and its output is 256 feature maps of width W/4 and height H/4, denoted P3. The input of the 3rd down-sampling block receives all feature maps in P3, and its output is 256 feature maps of width W/8 and height H/8, denoted X3. The input of the 4th neural network block receives all feature maps in X3, and its output is 512 feature maps of width W/8 and height H/8, denoted P4.
For the depth feature extraction module: the input of the 5th neural network block receives the training depth image output by the depth-map input layer, and its output is 64 feature maps of width W and height H, denoted P5. The input of the 4th down-sampling block receives all feature maps in P5, and its output is 64 feature maps of width W/2 and height H/2, denoted X4. The input of the 6th neural network block receives all feature maps in X4, and its output is 128 feature maps of width W/2 and height H/2, denoted P6. The input of the 5th down-sampling block receives all feature maps in P6, and its output is 128 feature maps of width W/4 and height H/4, denoted X5. The input of the 7th neural network block receives all feature maps in X5, and its output is 256 feature maps of width W/4 and height H/4, denoted P7. The input of the 6th down-sampling block receives all feature maps in P7, and its output is 256 feature maps of width W/8 and height H/8, denoted X6. The input of the 8th neural network block receives all feature maps in X6, and its output is 512 feature maps of width W/8 and height H/8, denoted P8.
For the feature fusion module: the input of the 9th neural network block receives the training left viewpoint image output by the RGB input layer, and its output is 3 feature maps of width W and height H, denoted P9. The input of the 10th neural network block receives the training depth image output by the depth-map input layer, and its output is 3 feature maps of width W and height H, denoted P10. Element-wise summation is applied to all feature maps in P9 and all feature maps in P10, producing 3 feature maps of width W and height H, denoted E1. The input of the 11th neural network block receives all feature maps in E1, and its output is 64 feature maps of width W and height H, denoted P11. Element-wise summation is applied to all feature maps in P1, P5 and P11, producing 64 feature maps of width W and height H, denoted E2. The input of the 1st max-pooling layer receives all feature maps in E2, and its output is 64 feature maps of width W/2 and height H/2, denoted Z1. The input of the 12th neural network block receives all feature maps in Z1, and its output is 128 feature maps of width W/2 and height H/2, denoted P12. Element-wise summation is applied to all feature maps in P2, P6 and P12, producing 128 feature maps of width W/2 and height H/2, denoted E3. The input of the 2nd max-pooling layer receives all feature maps in E3, and its output is 128 feature maps of width W/4 and height H/4, denoted Z2. The input of the 13th neural network block receives all feature maps in Z2, and its output is 256 feature maps of width W/4 and height H/4, denoted P13. Element-wise summation is applied to all feature maps in P3, P7 and P13, producing 256 feature maps of width W/4 and height H/4, denoted E4. The input of the 3rd max-pooling layer receives all feature maps in E4, and its output is 256 feature maps of width W/8 and height H/8, denoted Z3. The input of the 14th neural network block receives all feature maps in Z3, and its output is 512 feature maps of width W/8 and height H/8, denoted P14. Element-wise summation is applied to all feature maps in P4, P8 and P14, producing 512 feature maps of width W/8 and height H/8, denoted E5. The input of the 4th max-pooling layer receives all feature maps in E5, and its output is 512 feature maps of width W/16 and height H/16, denoted Z4. The input of the 15th neural network block receives all feature maps in Z4, and its output is 1024 feature maps of width W/16 and height H/16, denoted P15.
For the decoding framework: the input of the 1st up-sampling layer receives all feature maps in P15, and its output is 1024 feature maps of width W/8 and height H/8, denoted S1. The input of the 16th neural network block receives all feature maps in S1, and its output is 256 feature maps of width W/8 and height H/8, denoted P16. The input of the 2nd up-sampling layer receives all feature maps in P16, and its output is 256 feature maps of width W/4 and height H/4, denoted S2. The input of the 17th neural network block receives all feature maps in S2, and its output is 128 feature maps of width W/4 and height H/4, denoted P17. The input of the 3rd up-sampling layer receives all feature maps in P17, and its output is 128 feature maps of width W/2 and height H/2, denoted S3. The input of the 18th neural network block receives all feature maps in S3, and its output is 64 feature maps of width W/2 and height H/2, denoted P18. The input of the 4th up-sampling layer receives all feature maps in P18, and its output is 64 feature maps of width W and height H, denoted S4. The input of the 19th neural network block receives all feature maps in S4, and its output is 64 feature maps of width W and height H, denoted P19.
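Putting the decoder together, here is a sketch that alternates the scale-2 bilinear up-sampling with the triple-conv blocks, reusing the hypothetical `make_triple_conv_block` factory sketched earlier:

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Decoding framework sketch: P15 (1024 x W/16 x H/16) -> P19 (64 x W x H).
    `block` is a constructor such as make_triple_conv_block(c_in, c_mid, c_out)."""
    def __init__(self, block):
        super().__init__()
        self.up  = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.b16 = block(1024, 512, 256)   # 16th neural network block
        self.b17 = block(256, 256, 128)    # 17th
        self.b18 = block(128, 128, 64)     # 18th
        self.b19 = block(64, 64, 64)       # 19th

    def forward(self, p15):
        p16 = self.b16(self.up(p15))       # S1 -> P16: 256 x W/8 x H/8
        p17 = self.b17(self.up(p16))       # S2 -> P17: 128 x W/4 x H/4
        p18 = self.b18(self.up(p17))       # S3 -> P18: 64 x W/2 x H/2
        return self.b19(self.up(p18))      # S4 -> P19: 64 x W x H
```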
For the output layer: the input of the first convolutional layer receives all feature maps in P19, and the output of the first convolutional layer is one feature map of width W and height H; the input of the first batch-normalization layer receives the feature map output by the first convolutional layer; the input of the first activation layer receives the feature map output by the first batch-normalization layer; and the output of the first activation layer is the saliency image of the stereo image corresponding to the training left viewpoint image. The saliency image has width W and height H.
Step 1_3: Use the left viewpoint image of every original stereo image in the training set as a training left viewpoint image and the depth image of every original stereo image in the training set as a training depth image, input them into the convolutional neural network for training, and obtain the saliency image of every original stereo image in the training set; the saliency image of {I_n(x, y)} is denoted {S_n(x, y)}, where S_n(x, y) denotes the pixel value at coordinate position (x, y) in {S_n(x, y)}.
Step 1_4: Compute the loss function value between the saliency image of every original stereo image in the training set and the corresponding ground-truth human eye fixation image; the loss function value between {S_n(x, y)} and {G_n(x, y)} is obtained using the mean squared error loss function.
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times, obtaining the trained convolutional neural network model together with N × V loss function values; then find the smallest of the N × V loss function values, and take the weight vector and bias term corresponding to that smallest loss function value as the optimal weight vector and optimal bias term of the trained model, denoted W_best and b_best respectively, where V > 1, for example V = 50.
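As a combined sketch of steps 1_3 to 1_5, assuming a PyTorch DataLoader yielding (left viewpoint image, depth image, fixation map) triples; the Adam optimizer and learning rate are our assumptions, since the patent fixes only the MSE loss and the repeat count V:

```python
import torch

def train(model, loader, epochs=50, lr=1e-4):
    """V (= epochs) passes over the training set, keeping the parameters
    that achieved the smallest observed loss value (W_best, b_best),
    as step 1_5 prescribes. The file path is illustrative."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best = float("inf")
    for _ in range(epochs):
        for left, depth, fixation in loader:
            pred = model(left, depth)
            loss = torch.nn.functional.mse_loss(pred, fixation)
            opt.zero_grad(); loss.backward(); opt.step()
            if loss.item() < best:
                best = loss.item()
                torch.save(model.state_dict(), "best_weights.pth")
```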
The specific steps of the testing stage are as follows:
Step 2_1: Let the stereo image to be tested have width W' and height H', and denote its left-viewpoint image and its depth image correspondingly. Here 1 ≤ x' ≤ W' and 1 ≤ y' ≤ H' index the pixel value of the pixel at coordinate (x', y') in the stereo image to be tested, its left-viewpoint image and its depth image.
Step 2_2: Input the left-viewpoint image and the depth image of the stereo image to be tested into the convolutional neural network training model, predict using W_best and b_best, and obtain the saliency prediction image of the stereo image to be tested, whose pixel value at coordinate (x', y') is the predicted saliency at that position.
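The test stage then reduces to a single forward pass with the saved best weights, along these lines (names carried over from the training sketch; the input tensors are illustrative):

```python
import torch

# Restore the best weights (W_best / b_best) and predict the saliency map of
# one unseen stereo pair; test_rgb and test_depth are illustrative tensors of
# shapes (1, 3, H', W') and (1, 1, H', W') prepared by the caller.
model.load_state_dict(best_state)
model.eval()
with torch.no_grad():
    saliency_pred = model(test_rgb, test_depth)  # (1, 1, H', W'), values in [0, 1]
```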
In this specific embodiment, in step 1_2, the 1st to 8th neural network blocks share the same structure: a first dilated convolutional layer, a second batch normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch normalization layer, arranged in sequence. The input of the first dilated convolutional layer is the input of the neural network block; the input of the second batch normalization layer receives all feature maps output by the first dilated convolutional layer; the input of the second activation layer receives all feature maps output by the second batch normalization layer; the input of the first residual block receives all feature maps output by the second activation layer; the input of the second dilated convolutional layer receives all feature maps output by the first residual block; the input of the third batch normalization layer receives all feature maps output by the second dilated convolutional layer; and the output of the third batch normalization layer is the output of the neural network block. The two dilated convolutional layers have 64 convolution kernels each in the 1st and 5th neural network blocks, 128 each in the 2nd and 6th, 256 each in the 3rd and 7th, and 512 each in the 4th and 8th. In all of the 1st to 8th neural network blocks, both dilated convolutional layers use 3 × 3 kernels with stride 1, dilation 2 and padding 2, and the activation mode of the second activation layer is "ReLU".
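In PyTorch terms, one of these encoder blocks could be sketched as follows. This is illustrative rather than the patent's code; "dilation" is the standard name for what the text calls the hole (cavity) parameter, and ResidualBlock is the structure sketched further below.

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Illustrative form of the 1st-8th neural network blocks: dilated 3x3
    convolution (stride 1, dilation 2, padding 2) -> batch norm -> ReLU ->
    first residual block -> dilated 3x3 convolution -> batch norm. The
    channel count c is 64/128/256/512 depending on the block index."""
    def __init__(self, c_in, c):
        super().__init__()
        self.pre = nn.Sequential(
            nn.Conv2d(c_in, c, 3, stride=1, padding=2, dilation=2),
            nn.BatchNorm2d(c), nn.ReLU())
        self.res = ResidualBlock(c)            # see the residual-block sketch below
        self.post = nn.Sequential(
            nn.Conv2d(c, c, 3, stride=1, padding=2, dilation=2),
            nn.BatchNorm2d(c))

    def forward(self, x):
        return self.post(self.res(self.pre(x)))
```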
The 9th and 10th neural network blocks share the same structure: a second convolutional layer followed by a fourth batch normalization layer. The input of the second convolutional layer is the input of the neural network block; the input of the fourth batch normalization layer receives all feature maps output by the second convolutional layer; and the output of the fourth batch normalization layer is the output of the neural network block. In both blocks, the second convolutional layer has 3 convolution kernels of size 7 × 7 with stride 1 and padding 3.
The 11th and 12th neural network blocks share the same structure: a third convolutional layer, a fifth batch normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch normalization layer, arranged in sequence. The input of the third convolutional layer is the input of the neural network block; the input of the fifth batch normalization layer receives all feature maps output by the third convolutional layer; the input of the third activation layer receives all feature maps output by the fifth batch normalization layer; the input of the fourth convolutional layer receives all feature maps output by the third activation layer; the input of the sixth batch normalization layer receives all feature maps output by the fourth convolutional layer; and the output of the sixth batch normalization layer is the output of the neural network block. The third and fourth convolutional layers have 64 convolution kernels each in the 11th neural network block and 128 each in the 12th; in both blocks their kernel size is 3 × 3 with stride 1 and padding 1, and the activation mode of the third activation layer is "ReLU".
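The 9th to 12th blocks are small enough to sketch as one-liners. In the sketch below, the three-channel input assumption for the 10th block (i.e. the depth map fed in three-channel form) is ours; the text does not state the depth map's channel count.

```python
import torch.nn as nn

def block_9_10(in_ch=3):
    """9th/10th blocks: one 7x7 convolution with 3 kernels (stride 1,
    padding 3) plus batch norm; in_ch = 3 is an assumption for the
    depth-map branch."""
    return nn.Sequential(nn.Conv2d(in_ch, 3, 7, stride=1, padding=3),
                         nn.BatchNorm2d(3))

def block_11_12(c_in, c):
    """11th/12th blocks: conv -> BN -> ReLU -> conv -> BN, both 3x3 convs
    with stride 1 and padding 1; c = 64 for the 11th block (input E_1,
    3 channels), c = 128 for the 12th (input Z_1, 64 channels)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c, 3, stride=1, padding=1), nn.BatchNorm2d(c), nn.ReLU(),
        nn.Conv2d(c, c, 3, stride=1, padding=1), nn.BatchNorm2d(c))
```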
The 13th to 19th neural network blocks share the same structure: a fifth convolutional layer, a seventh batch normalization layer, a fourth activation layer, a sixth convolutional layer, an eighth batch normalization layer, a fifth activation layer, a seventh convolutional layer and a ninth batch normalization layer, arranged in sequence. The input of the fifth convolutional layer is the input of the neural network block; each subsequent layer receives all feature maps output by the layer before it; and the output of the ninth batch normalization layer is the output of the neural network block. The numbers of convolution kernels of the fifth, sixth and seventh convolutional layers are 256, 256, 256 in the 13th neural network block; 512, 512, 512 in the 14th; 1024, 1024, 1024 in the 15th; 512, 512, 256 in the 16th; 256, 256, 128 in the 17th; 128, 128, 64 in the 18th; and 64, 64, 64 in the 19th. In all of the 13th to 19th neural network blocks, the three convolutional layers use 3 × 3 kernels with stride 1 and padding 1, and the activation mode of the fourth and fifth activation layers is "ReLU".
In this specific embodiment, in step 1_2, the 1st to 6th downsampling blocks share the same structure: each consists of a second residual block, whose input is the input of the downsampling block and whose output is the output of the downsampling block.
In this specific embodiment, the first residual block and the second residual block share the same structure, comprising 3 convolutional layers, 3 batch normalization layers and 3 activation layers. The input of the 1st convolutional layer is the input of the residual block; the input of the 1st batch normalization layer receives all feature maps output by the 1st convolutional layer; the input of the 1st activation layer receives all feature maps output by the 1st batch normalization layer; the input of the 2nd convolutional layer receives all feature maps output by the 1st activation layer; the input of the 2nd batch normalization layer receives all feature maps output by the 2nd convolutional layer; the input of the 2nd activation layer receives all feature maps output by the 2nd batch normalization layer; the input of the 3rd convolutional layer receives all feature maps output by the 2nd activation layer; and the input of the 3rd batch normalization layer receives all feature maps output by the 3rd convolutional layer. All feature maps received at the input of the 1st convolutional layer are added to all feature maps output by the 3rd batch normalization layer; after passing through the 3rd activation layer, all feature maps output by the 3rd activation layer serve as the output of the residual block. The number of convolution kernels of each convolutional layer in the first residual block is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th, 256 in each of the 3rd and 7th, and 512 in each of the 4th and 8th; in every first residual block, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 1 and padding 1. The number of convolution kernels of each convolutional layer in the second residual block is 64 in each of the 1st and 4th downsampling blocks, 128 in each of the 2nd and 5th, and 256 in each of the 3rd and 6th; in every second residual block, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 2 and padding 1. The activation mode of the 3 activation layers is "ReLU".
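A hedged sketch of this residual block follows. The text does not say how the skip path is resized when the middle convolution uses stride 2 (the downsampling variant), so the strided 1 × 1 projection below is an assumption made only to keep the addition well-defined.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """1x1 conv -> BN -> ReLU -> 3x3 conv -> BN -> ReLU -> 1x1 conv -> BN,
    with the block input added to the branch output, then a final ReLU.
    stride is 1 for the first residual block and 2 (on the middle 3x3
    conv) for the second, downsampling variant."""
    def __init__(self, c, stride=1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(c, c, 1, stride=1), nn.BatchNorm2d(c), nn.ReLU(),
            nn.Conv2d(c, c, 3, stride=stride, padding=1), nn.BatchNorm2d(c), nn.ReLU(),
            nn.Conv2d(c, c, 1, stride=1), nn.BatchNorm2d(c))
        # Identity skip when shapes match; assumed 1x1 strided projection otherwise.
        self.skip = nn.Identity() if stride == 1 else nn.Conv2d(c, c, 1, stride=stride)
        self.out_act = nn.ReLU()

    def forward(self, x):
        return self.out_act(self.branch(x) + self.skip(x))
```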
In this specific embodiment, in step 1_2, the pooling windows of the 1st to 4th max pooling layers are all of size 2 × 2 with stride 2.
In this specific embodiment, in step 1_2, the sampling mode of the 1st to 4th upsampling layers is bilinear interpolation with scale factor 2.
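As nn modules, these two resolution-changing layers are one-liners (illustrative):

```python
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)             # 1st-4th max pooling layers
upsample = nn.Upsample(scale_factor=2, mode='bilinear')  # 1st-4th upsampling layers
```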
To verify the feasibility and effectiveness of the method of the present invention, experiments were carried out.
Here, the 3D eye-tracking database (NCTU-3DFixation) provided by National Chiao Tung University (Taiwan, China) is used to analyze the stability and accuracy of the method of the present invention. Four objective parameters commonly used to assess visual saliency extraction methods serve as evaluation indicators: the linear correlation coefficient (Linear Correlation Coefficient, CC), the Kullback-Leibler divergence (Kullback-Leibler Divergence, KLD), the area under the receiver operating characteristic curve (Area Under the receiver operating characteristics Curve, AUC), and the normalized scanpath saliency (Normalized Scanpath Saliency, NSS).
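For reference, three of the four indicators have short closed forms; the sketch below gives common NumPy formulations of them. These are standard definitions rather than code from the patent, and the AUC (Borji) variant, which involves sampling non-fixated points, is normally taken from a saliency benchmark toolkit and is omitted here.

```python
import numpy as np

def cc(pred, gt):
    """Linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float((p * g).mean())

def kld(pred, gt, eps=1e-12):
    """KL divergence of the ground truth from the prediction, after
    normalizing both maps to probability distributions."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(eps + g / (p + eps))).sum())

def nss(pred, fixations):
    """Normalized scanpath saliency: mean of the standardized prediction
    at the binary fixation locations."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    return float(p[fixations > 0].mean())
```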
The method of the present invention is used to obtain the saliency prediction image of each stereo image in the 3D eye-tracking database, which is then compared with the subjective visual saliency map of that stereo image, i.e. the true human-eye fixation image (available in the database). Higher CC, AUC and NSS values and a lower KLD value indicate better consistency between the saliency prediction image obtained by the method of the present invention and the subjective visual saliency map. The CC, KLD, AUC and NSS indicators reflecting the saliency extraction performance of the method of the present invention are listed in Table 1.
Table 1. Stability and accuracy of the saliency prediction images obtained by the method of the present invention against the subjective visual saliency maps

Performance indicator | CC     | KLD    | AUC (Borji) | NSS
Performance value     | 0.7583 | 0.4868 | 0.8789      | 2.0692
From the data listed in Table 1, it can be seen that the stability and accuracy of the saliency prediction images obtained by the method of the present invention with respect to the subjective visual saliency maps are good, showing that the objective test results are highly consistent with human subjective perception, which is sufficient to demonstrate the feasibility and effectiveness of the method of the present invention.

Claims (6)

1. A stereo image visual saliency detection method based on a convolutional neural network, characterized by comprising two processes, a training stage and a testing stage;
The specific steps of the training stage are as follows:
Step 1_1: Select N original stereo images of width W and height H; then let all selected original stereo images, together with their respective left-viewpoint images, depth images and true human-eye fixation images, form the training set, and denote the n-th original stereo image in the training set as {I_n(x,y)}, its depth image as {D_n(x,y)}, and its left-viewpoint image and true human-eye fixation image correspondingly; wherein N is a positive integer, N ≥ 300, W and H are both divisible by 2, n is a positive integer with initial value 1, 1 ≤ n ≤ N, 1 ≤ x ≤ W, 1 ≤ y ≤ H, I_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {I_n(x,y)}, D_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {D_n(x,y)}, and the pixel values of the left-viewpoint image and the true human-eye fixation image at (x,y) are denoted in the same manner;
Step 1_2: Construct the convolutional neural network. The convolutional neural network comprises an input layer, a hidden layer and an output layer; the input layer comprises an RGB-image input layer and a depth-map input layer; the hidden layer comprises an encoding framework and a decoding framework, the encoding framework consisting of three parts, an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The RGB feature extraction module consists of the 1st to 4th neural network blocks and the 1st to 3rd downsampling blocks; the depth feature extraction module consists of the 5th to 8th neural network blocks and the 4th to 6th downsampling blocks; the feature fusion module consists of the 9th to 15th neural network blocks and the 1st to 4th max pooling layers; the decoding framework consists of the 16th to 19th neural network blocks and the 1st to 4th upsampling layers. The output layer consists of a first convolutional layer, a first batch normalization layer and a first activation layer; the first convolutional layer has 1 convolution kernel of size 3 × 3 with stride 1 and padding 1, and the activation mode of the first activation layer is "Sigmoid";
For the RGB-image input layer, its input receives a training left-viewpoint image and its output passes the training left-viewpoint image to the hidden layer; the training left-viewpoint image is required to have width W and height H;
For the depth-map input layer, its input receives the training depth image corresponding to the training left-viewpoint image received at the input of the RGB-image input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H;
For the RGB feature extraction module, the input of the 1st neural network block receives the training left-viewpoint image output by the RGB-image input layer; its output is 64 feature maps of width W and height H, and the set of all output feature maps is denoted P_1. The input of the 1st downsampling block receives all feature maps in P_1; its output is 64 feature maps of width W/2 and height H/2, denoted X_1. The input of the 2nd neural network block receives all feature maps in X_1; its output is 128 feature maps of width W/2 and height H/2, denoted P_2. The input of the 2nd downsampling block receives all feature maps in P_2; its output is 128 feature maps of width W/4 and height H/4, denoted X_2. The input of the 3rd neural network block receives all feature maps in X_2; its output is 256 feature maps of width W/4 and height H/4, denoted P_3. The input of the 3rd downsampling block receives all feature maps in P_3; its output is 256 feature maps of width W/8 and height H/8, denoted X_3. The input of the 4th neural network block receives all feature maps in X_3; its output is 512 feature maps of width W/8 and height H/8, denoted P_4;
For the depth feature extraction module, the input of the 5th neural network block receives the training depth image output by the depth-map input layer; its output is 64 feature maps of width W and height H, and the set of all output feature maps is denoted P_5. The input of the 4th downsampling block receives all feature maps in P_5; its output is 64 feature maps of width W/2 and height H/2, denoted X_4. The input of the 6th neural network block receives all feature maps in X_4; its output is 128 feature maps of width W/2 and height H/2, denoted P_6. The input of the 5th downsampling block receives all feature maps in P_6; its output is 128 feature maps of width W/4 and height H/4, denoted X_5. The input of the 7th neural network block receives all feature maps in X_5; its output is 256 feature maps of width W/4 and height H/4, denoted P_7. The input of the 6th downsampling block receives all feature maps in P_7; its output is 256 feature maps of width W/8 and height H/8, denoted X_6. The input of the 8th neural network block receives all feature maps in X_6; its output is 512 feature maps of width W/8 and height H/8, denoted P_8;
For the feature fusion module, the input of the 9th neural network block receives the training left-viewpoint image output by the RGB-image input layer; its output is 3 feature maps of width W and height H, and the set of all output feature maps is denoted P_9. The input of the 10th neural network block receives the training depth image output by the depth-map input layer; its output is 3 feature maps of width W and height H, denoted P_10. An element-wise summation is applied to all feature maps in P_9 and all feature maps in P_10, yielding 3 feature maps of width W and height H, denoted E_1. The input of the 11th neural network block receives all feature maps in E_1; its output is 64 feature maps of width W and height H, denoted P_11. An element-wise summation is applied to all feature maps in P_1, P_5 and P_11, yielding 64 feature maps of width W and height H, denoted E_2. The input of the 1st max pooling layer receives all feature maps in E_2; its output is 64 feature maps of width W/2 and height H/2, denoted Z_1. The input of the 12th neural network block receives all feature maps in Z_1; its output is 128 feature maps of width W/2 and height H/2, denoted P_12. An element-wise summation is applied to all feature maps in P_2, P_6 and P_12, yielding 128 feature maps of width W/2 and height H/2, denoted E_3. The input of the 2nd max pooling layer receives all feature maps in E_3; its output is 128 feature maps of width W/4 and height H/4, denoted Z_2. The input of the 13th neural network block receives all feature maps in Z_2; its output is 256 feature maps of width W/4 and height H/4, denoted P_13. An element-wise summation is applied to all feature maps in P_3, P_7 and P_13, yielding 256 feature maps of width W/4 and height H/4, denoted E_4. The input of the 3rd max pooling layer receives all feature maps in E_4; its output is 256 feature maps of width W/8 and height H/8, denoted Z_3. The input of the 14th neural network block receives all feature maps in Z_3; its output is 512 feature maps of width W/8 and height H/8, denoted P_14. An element-wise summation is applied to all feature maps in P_4, P_8 and P_14, yielding 512 feature maps of width W/8 and height H/8, denoted E_5. The input of the 4th max pooling layer receives all feature maps in E_5; its output is 512 feature maps of width W/16 and height H/16, denoted Z_4. The input of the 15th neural network block receives all feature maps in Z_4; its output is 1024 feature maps of width W/16 and height H/16, denoted P_15;
For the decoding framework, the input of the 1st upsampling layer receives all feature maps in P_15; its output is 1024 feature maps of width W/8 and height H/8, denoted S_1. The input of the 16th neural network block receives all feature maps in S_1; its output is 256 feature maps of width W/8 and height H/8, denoted P_16. The input of the 2nd upsampling layer receives all feature maps in P_16; its output is 256 feature maps of width W/4 and height H/4, denoted S_2. The input of the 17th neural network block receives all feature maps in S_2; its output is 128 feature maps of width W/4 and height H/4, denoted P_17. The input of the 3rd upsampling layer receives all feature maps in P_17; its output is 128 feature maps of width W/2 and height H/2, denoted S_3. The input of the 18th neural network block receives all feature maps in S_3; its output is 64 feature maps of width W/2 and height H/2, denoted P_18. The input of the 4th upsampling layer receives all feature maps in P_18; its output is 64 feature maps of width W and height H, denoted S_4. The input of the 19th neural network block receives all feature maps in S_4; its output is 64 feature maps of width W and height H, denoted P_19;
For the output layer, the input of the first convolutional layer receives all feature maps in P_19; its output is one feature map of width W and height H. The input of the first batch normalization layer receives the feature map output by the first convolutional layer; the input of the first activation layer receives the feature map output by the first batch normalization layer; and the output of the first activation layer is the saliency image of the stereo image corresponding to the training left-viewpoint image, the saliency image having width W and height H;
Step 1_3: Take the left-viewpoint image of each original stereo image in the training set as a training left-viewpoint image and the depth image of that original stereo image as a training depth image, input them into the convolutional neural network for training, and obtain the saliency image of each original stereo image in the training set, the pixel value of the saliency image of {I_n(x,y)} at coordinate (x,y) being the predicted saliency at that position;
Step 1_4: Compute the loss function value between the saliency image of each original stereo image in the training set and the corresponding true human-eye fixation image, the loss being obtained with the mean-squared-error loss function;
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times to obtain the convolutional neural network training model, yielding N × V loss function values; find the smallest of these N × V loss function values; then take the weight vector and bias term corresponding to the smallest loss function value as the optimal weight vector and optimal bias term of the convolutional neural network training model, denoted W_best and b_best; wherein V > 1;
The specific steps of the testing stage are as follows:
Step 2_1: Let the stereo image to be tested have width W' and height H', and denote its left-viewpoint image and its depth image correspondingly; wherein 1 ≤ x' ≤ W' and 1 ≤ y' ≤ H' index the pixel value of the pixel at coordinate (x', y') in the stereo image to be tested, its left-viewpoint image and its depth image;
Step 2_2: Input the left-viewpoint image and the depth image of the stereo image to be tested into the convolutional neural network training model, predict using W_best and b_best, and obtain the saliency prediction image of the stereo image to be tested, whose pixel value at coordinate (x', y') is the predicted saliency at that position.
2. The stereo image visual saliency detection method based on a convolutional neural network according to claim 1, characterized in that in step 1_2, the 1st to 8th neural network blocks share the same structure: a first dilated convolutional layer, a second batch normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch normalization layer arranged in sequence; the input of the first dilated convolutional layer is the input of the neural network block, each subsequent layer receives all feature maps output by the layer before it, and the output of the third batch normalization layer is the output of the neural network block; wherein the two dilated convolutional layers have 64 convolution kernels each in the 1st and 5th neural network blocks, 128 each in the 2nd and 6th, 256 each in the 3rd and 7th, and 512 each in the 4th and 8th; in all of the 1st to 8th neural network blocks, both dilated convolutional layers use 3 × 3 kernels with stride 1, dilation 2 and padding 2, and the activation mode of the second activation layer is "ReLU";
The 9th and 10th neural network blocks share the same structure: a second convolutional layer followed by a fourth batch normalization layer; the input of the second convolutional layer is the input of the neural network block, the input of the fourth batch normalization layer receives all feature maps output by the second convolutional layer, and the output of the fourth batch normalization layer is the output of the neural network block; wherein in both blocks the second convolutional layer has 3 convolution kernels of size 7 × 7 with stride 1 and padding 3;
The 11th and 12th neural network blocks share the same structure: a third convolutional layer, a fifth batch normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch normalization layer arranged in sequence; the input of the third convolutional layer is the input of the neural network block, each subsequent layer receives all feature maps output by the layer before it, and the output of the sixth batch normalization layer is the output of the neural network block; wherein the third and fourth convolutional layers have 64 convolution kernels each in the 11th neural network block and 128 each in the 12th, in both blocks their kernel size is 3 × 3 with stride 1 and padding 1, and the activation mode of the third activation layer is "ReLU";
The 13th to 19th neural network blocks share the same structure: a fifth convolutional layer, a seventh batch normalization layer, a fourth activation layer, a sixth convolutional layer, an eighth batch normalization layer, a fifth activation layer, a seventh convolutional layer and a ninth batch normalization layer arranged in sequence; the input of the fifth convolutional layer is the input of the neural network block, each subsequent layer receives all feature maps output by the layer before it, and the output of the ninth batch normalization layer is the output of the neural network block; wherein the numbers of convolution kernels of the fifth, sixth and seventh convolutional layers are 256, 256, 256 in the 13th neural network block, 512, 512, 512 in the 14th, 1024, 1024, 1024 in the 15th, 512, 512, 256 in the 16th, 256, 256, 128 in the 17th, 128, 128, 64 in the 18th, and 64, 64, 64 in the 19th; in all of the 13th to 19th neural network blocks, the three convolutional layers use 3 × 3 kernels with stride 1 and padding 1, and the activation mode of the fourth and fifth activation layers is "ReLU".
3. The stereo image visual saliency detection method based on a convolutional neural network according to claim 2, characterized in that in step 1_2, the 1st to 6th downsampling blocks share the same structure: each consists of a second residual block, whose input is the input of the downsampling block and whose output is the output of the downsampling block.
4. The stereo image visual saliency detection method based on a convolutional neural network according to claim 3, characterized in that the first residual block and the second residual block share the same structure, comprising 3 convolutional layers, 3 batch normalization layers and 3 activation layers; the input of the 1st convolutional layer is the input of the residual block, the input of the 1st batch normalization layer receives all feature maps output by the 1st convolutional layer, the input of the 1st activation layer receives all feature maps output by the 1st batch normalization layer, the input of the 2nd convolutional layer receives all feature maps output by the 1st activation layer, the input of the 2nd batch normalization layer receives all feature maps output by the 2nd convolutional layer, the input of the 2nd activation layer receives all feature maps output by the 2nd batch normalization layer, the input of the 3rd convolutional layer receives all feature maps output by the 2nd activation layer, and the input of the 3rd batch normalization layer receives all feature maps output by the 3rd convolutional layer; all feature maps received at the input of the 1st convolutional layer are added to all feature maps output by the 3rd batch normalization layer, and after passing through the 3rd activation layer, all feature maps output by the 3rd activation layer serve as the output of the residual block; wherein the number of convolution kernels of each convolutional layer in the first residual block is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th, 256 in each of the 3rd and 7th, and 512 in each of the 4th and 8th; in every first residual block, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 1 and padding 1; the number of convolution kernels of each convolutional layer in the second residual block is 64 in each of the 1st and 4th downsampling blocks, 128 in each of the 2nd and 5th, and 256 in each of the 3rd and 6th; in every second residual block, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 2 and padding 1; and the activation mode of the 3 activation layers is "ReLU".
5. The stereo image visual saliency detection method based on a convolutional neural network according to any one of claims 1 to 4, characterized in that in step 1_2, the pooling windows of the 1st to 4th max pooling layers are all of size 2 × 2 with stride 2.
6. The stereo image visual saliency detection method based on a convolutional neural network according to claim 5, characterized in that in step 1_2, the sampling mode of the 1st to 4th upsampling layers is bilinear interpolation with scale factor 2.
CN201910327556.4A 2019-04-23 2019-04-23 Stereo image visual saliency detection method based on convolutional neural network Active CN110175986B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910327556.4A CN110175986B (en) 2019-04-23 2019-04-23 Stereo image visual saliency detection method based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN110175986A 2019-08-27
CN110175986B 2021-01-08

Family

ID=67689881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910327556.4A Active CN110175986B (en) 2019-04-23 2019-04-23 Stereo image visual saliency detection method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN110175986B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351941A1 (en) * 2016-06-03 2017-12-07 Miovision Technologies Incorporated System and Method for Performing Saliency Detection Using Deep Active Contours
CN106462771A (en) * 2016-08-05 2017-02-22 深圳大学 3D image significance detection method
CN106778687A (en) * 2017-01-16 2017-05-31 大连理工大学 Method for viewing points detecting based on local evaluation and global optimization
CN109376611A (en) * 2018-09-27 2019-02-22 方玉明 A kind of saliency detection method based on 3D convolutional neural networks
CN109146944A (en) * 2018-10-30 2019-01-04 浙江科技学院 A kind of space or depth perception estimation method based on the revoluble long-pending neural network of depth
CN109598268A (en) * 2018-11-23 2019-04-09 安徽大学 A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN109635822A (en) * 2018-12-07 2019-04-16 浙江科技学院 The significant extracting method of stereo-picture vision based on deep learning coding and decoding network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHEN, HAO et al.: "RGB-D Saliency Detection by Multi-stream Late Fusion Network", Computer Vision Systems *
XINGYU CAI et al.: "Saliency detection for stereoscopic 3D images in the quaternion frequency domain", 3D Research *
LI, RONG et al.: "Saliency region prediction method using convolutional neural networks", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555434A (en) * 2019-09-03 2019-12-10 浙江科技学院 method for detecting visual saliency of three-dimensional image through local contrast and global guidance
CN110555434B (en) * 2019-09-03 2022-03-29 浙江科技学院 Method for detecting visual saliency of three-dimensional image through local contrast and global guidance
CN110782458A (en) * 2019-10-23 2020-02-11 浙江科技学院 Object image 3D semantic prediction segmentation method of asymmetric coding network
CN110782458B (en) * 2019-10-23 2022-05-31 浙江科技学院 Object image 3D semantic prediction segmentation method of asymmetric coding network
US11681046B2 (en) 2019-11-14 2023-06-20 Zoox, Inc. Depth data model training with upsampling, losses and loss balancing
WO2021096806A1 (en) * 2019-11-14 2021-05-20 Zoox, Inc Depth data model training with upsampling, losses, and loss balancing
US11157774B2 (en) * 2019-11-14 2021-10-26 Zoox, Inc. Depth data model training with upsampling, losses, and loss balancing
CN111369506A (en) * 2020-02-26 2020-07-03 四川大学 Lens turbidity grading method based on eye B-ultrasonic image
CN111582316A (en) * 2020-04-10 2020-08-25 天津大学 RGB-D significance target detection method
CN111582316B (en) * 2020-04-10 2022-06-28 天津大学 RGB-D significance target detection method
CN111612832A (en) * 2020-04-29 2020-09-01 杭州电子科技大学 Method for improving depth estimation accuracy by utilizing multitask complementation
CN111612832B (en) * 2020-04-29 2023-04-18 杭州电子科技大学 Method for improving depth estimation accuracy by utilizing multitask complementation
CN112528899B (en) * 2020-12-17 2022-04-12 南开大学 Image salient object detection method and system based on implicit depth information recovery
CN112528900B (en) * 2020-12-17 2022-09-16 南开大学 Image salient object detection method and system based on extreme down-sampling
CN112528900A (en) * 2020-12-17 2021-03-19 南开大学 Image salient object detection method and system based on extreme down-sampling
CN112528899A (en) * 2020-12-17 2021-03-19 南开大学 Image salient object detection method and system based on implicit depth information recovery
CN113192073A (en) * 2021-04-06 2021-07-30 浙江科技学院 Clothing semantic segmentation method based on cross fusion network
CN113592795A (en) * 2021-07-19 2021-11-02 深圳大学 Visual saliency detection method of stereoscopic image, thumbnail generation method and device
CN113592795B (en) * 2021-07-19 2024-04-12 深圳大学 Visual saliency detection method for stereoscopic image, thumbnail generation method and device

Also Published As

Publication number Publication date
CN110175986B (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN110175986A (en) A kind of stereo-picture vision significance detection method based on convolutional neural networks
CN110555434B (en) Method for detecting visual saliency of three-dimensional image through local contrast and global guidance
CN107644415B (en) A kind of text image method for evaluating quality and equipment
CN109410261B (en) Monocular image depth estimation method based on pyramid pooling module
CN109558832A (en) A kind of human body attitude detection method, device, equipment and storage medium
CN110210492A (en) A kind of stereo-picture vision significance detection method based on deep learning
CN111275518A (en) Video virtual fitting method and device based on mixed optical flow
CN110263813A (en) A kind of conspicuousness detection method merged based on residual error network and depth information
CN110059728A (en) RGB-D image vision conspicuousness detection method based on attention model
CN103824272B (en) The face super-resolution reconstruction method heavily identified based on k nearest neighbor
CN110211061A (en) List depth camera depth map real time enhancing method and device neural network based
CN112784736B (en) Character interaction behavior recognition method based on multi-modal feature fusion
CN109584290A (en) A kind of three-dimensional image matching method based on convolutional neural networks
CN110246148A (en) The conspicuousness detection method of multi-modal depth information fusion and attention study
CN108416266A (en) A kind of video behavior method for quickly identifying extracting moving target using light stream
CN108389192A (en) Stereo-picture Comfort Evaluation method based on convolutional neural networks
CN109978786A (en) A kind of Kinect depth map restorative procedure based on convolutional neural networks
CN109461177B (en) Monocular image depth prediction method based on neural network
CN109146944A (en) A kind of space or depth perception estimation method based on the revoluble long-pending neural network of depth
CN110263768A (en) A kind of face identification method based on depth residual error network
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN113298736B (en) Face image restoration method based on face pattern
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN112991371B (en) Automatic image coloring method and system based on coloring overflow constraint
CN110852935A (en) Image processing method for human face image changing with age

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant