CN110175986A - A stereoscopic image visual saliency detection method based on convolutional neural networks - Google Patents
- Publication number
- CN110175986A, CN201910327556.4A, CN201910327556A
- Authority
- CN
- China
- Prior art keywords
- layer
- output
- feature map
- input end
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a stereoscopic image visual saliency detection method based on a convolutional neural network. A convolutional neural network is constructed comprising an input layer, a hidden layer and an output layer; the input layer comprises an RGB-image input layer and a depth-map input layer; the hidden layer comprises an encoding framework and a decoding framework, the encoding framework being composed of an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The left-viewpoint image and depth image of every stereoscopic image in a training set are input into the convolutional neural network for training, yielding a saliency image for every stereoscopic image in the training set. The loss function value between the saliency image of every stereoscopic image in the training set and the corresponding ground-truth eye-fixation image is computed; after repeating this process multiple times, a trained convolutional neural network model is obtained. The left-viewpoint image and depth image of a stereoscopic image to be tested are input into the trained model, which predicts a saliency prediction image. The advantage of the method is its high visual saliency detection accuracy.
Description
Technical field
The present invention relates to visual saliency detection techniques, and in particular to a stereoscopic image visual saliency detection method based on convolutional neural networks.
Background technique
Visual saliency has in recent years been a popular research topic in fields such as neuroscience, robotics and computer vision. Research on visual saliency detection can be divided into two broad classes: eye-fixation prediction and salient object detection. The former predicts the points a person fixates on when viewing a natural scene; the latter accurately extracts the objects of interest. In general, visual saliency detection algorithms can be divided into two classes, top-down and bottom-up. Top-down methods are task-driven and require supervised learning, whereas bottom-up methods usually exploit low-level cues such as color features, distance features and heuristic saliency features. One of the most commonly used heuristic saliency features is contrast, for example pixel-based or patch-based contrast. Past research on visual saliency detection has focused largely on two-dimensional images. However, it has been found that, first, three-dimensional data is better suited to practical applications than two-dimensional data, and second, as visual scenes become increasingly complex, extracting salient objects from two-dimensional data alone is no longer sufficient. In recent years, progress in three-dimensional data acquisition technologies such as Time-of-Flight sensors and the Microsoft Kinect has promoted the use of structural information and improved the ability to discriminate between different objects of similar appearance. Depth data is easy to capture, is insensitive to illumination, and provides geometric cues that improve visual saliency prediction. Owing to the complementarity of RGB data and depth data, many methods that combine RGB images and depth images in pairs have been proposed for visual saliency detection. Earlier work concentrated mainly on using domain-specific prior knowledge to construct low-level saliency features, for example the observation that humans tend to pay more attention to closer objects; such observations, however, are difficult to generalize to all scenes. In most previous work, the multi-modal fusion problem is solved either by directly concatenating the RGB-D channels or by processing each modality independently and then combining the decisions of the two modalities. Although these strategies have achieved considerable improvement, they struggle to fully explore cross-modal complementarity. In recent years, following the success of convolutional neural networks (CNNs) in learning discriminative features from RGB data, more and more work has used CNNs to explore more powerful and effective multi-modal joint RGB-D representations. Most of these works are based on a two-stream architecture, in which RGB data and depth data are learned in independent bottom-up streams and the features are combined for joint reasoning at an early or late stage. As the most popular solution, two-stream frameworks achieve significant improvement over work based on hand-crafted RGB-D features. However, the most critical issue remains: how to effectively exploit multi-modal complementary information during the bottom-up process. It is therefore necessary to study RGB-D image visual saliency detection techniques further in order to improve the accuracy of visual saliency detection.
Summary of the invention
The technical problem to be solved by the invention is to provide a stereoscopic image visual saliency detection method based on convolutional neural networks that has high visual saliency detection accuracy.
The technical scheme adopted by the invention to solve the above technical problem is a stereoscopic image visual saliency detection method based on convolutional neural networks, characterized by comprising two processes, a training stage and a test stage.
The specific steps of the training stage are as follows:
Step 1_1: select N original stereoscopic images of width W and height H; then let all selected original stereoscopic images, together with their respective left-viewpoint images, depth images and ground-truth eye-fixation images, constitute a training set. Denote the n-th original stereoscopic image in the training set as {I_n(x,y)}, and denote the left-viewpoint image, depth image and ground-truth eye-fixation image of {I_n(x,y)} as {L_n(x,y)}, {D_n(x,y)} and {G_n(x,y)} respectively. Here N is a positive integer with N ≥ 300; W and H are both exactly divisible by 2; n is a positive integer with initial value 1 and 1 ≤ n ≤ N; 1 ≤ x ≤ W, 1 ≤ y ≤ H; I_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {I_n(x,y)}, L_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {L_n(x,y)}, D_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {D_n(x,y)}, and G_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {G_n(x,y)}.
Step 1_2: build the convolutional neural network. The convolutional neural network comprises an input layer, a hidden layer and an output layer. The input layer comprises an RGB-image input layer and a depth-map input layer. The hidden layer comprises an encoding framework and a decoding framework; the encoding framework is composed of three parts: an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The RGB feature extraction module is composed of the 1st to 4th neural network blocks and the 1st to 3rd down-sampling blocks; the depth feature extraction module is composed of the 5th to 8th neural network blocks and the 4th to 6th down-sampling blocks; the feature fusion module is composed of the 9th to 15th neural network blocks and the 1st to 4th max-pooling layers; the decoding framework is composed of the 16th to 19th neural network blocks and the 1st to 4th up-sampling layers. The output layer is composed of a first convolutional layer, a first batch-normalization layer and a first activation layer; the first convolutional layer has a kernel size of 3 × 3, a stride of 1, 1 convolution kernel and a padding of 1, and the activation mode of the first activation layer is "Sigmoid".
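The kernel/stride/padding combinations above are all size-preserving. As an illustrative sketch (the helper name is ours, not the patent's), the standard convolution output-size formula confirms this for both the 3 × 3 output-layer convolution and the 7 × 7 convolutions used later:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Standard output-size formula for a convolutional layer."""
    return (size + 2 * pad - kernel) // stride + 1

# 3x3 conv, stride 1, padding 1: spatial size is preserved
assert conv_out(224) == 224
# 7x7 conv, stride 1, padding 3 (9th/10th blocks): also size-preserving
assert conv_out(224, kernel=7, pad=3) == 224
```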
For the RGB-image input layer, its input end receives a training left-viewpoint image, and its output end outputs the training left-viewpoint image to the hidden layer; the training left-viewpoint image is required to have width W and height H.
For the depth-map input layer, its input end receives the training depth image corresponding to the training left-viewpoint image received by the input end of the RGB-image input layer, and its output end outputs the training depth image to the hidden layer; the training depth image has width W and height H.
For the RGB feature extraction module: the input end of the 1st neural network block receives the training left-viewpoint image output by the output end of the RGB-image input layer, and its output end outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted P1. The input end of the 1st down-sampling block receives all feature maps in P1, and its output end outputs 64 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted X1. The input end of the 2nd neural network block receives all feature maps in X1, and its output end outputs 128 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted P2. The input end of the 2nd down-sampling block receives all feature maps in P2, and its output end outputs 128 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted X2. The input end of the 3rd neural network block receives all feature maps in X2, and its output end outputs 256 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted P3. The input end of the 3rd down-sampling block receives all feature maps in P3, and its output end outputs 256 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted X3. The input end of the 4th neural network block receives all feature maps in X3, and its output end outputs 512 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted P4.
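The channel and resolution progression of the RGB branch can be traced with a small bookkeeping sketch (the function name is ours; it assumes each down-sampling block halves width and height, as in the shapes above):

```python
def rgb_branch_shapes(W, H):
    """Trace (channels, width, height) through P1..P4 and X1..X3
    of the RGB feature extraction module."""
    shapes = {}
    shapes["P1"] = (64, W, H)
    shapes["X1"] = (64, W // 2, H // 2)    # 1st down-sampling block
    shapes["P2"] = (128, W // 2, H // 2)
    shapes["X2"] = (128, W // 4, H // 4)   # 2nd down-sampling block
    shapes["P3"] = (256, W // 4, H // 4)
    shapes["X3"] = (256, W // 8, H // 8)   # 3rd down-sampling block
    shapes["P4"] = (512, W // 8, H // 8)
    return shapes

print(rgb_branch_shapes(224, 224)["P4"])  # (512, 28, 28)
```

The depth branch (P5..P8, X4..X6) follows the same progression.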
For the depth feature extraction module: the input end of the 5th neural network block receives the training depth image output by the output end of the depth-map input layer, and its output end outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted P5. The input end of the 4th down-sampling block receives all feature maps in P5, and its output end outputs 64 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted X4. The input end of the 6th neural network block receives all feature maps in X4, and its output end outputs 128 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted P6. The input end of the 5th down-sampling block receives all feature maps in P6, and its output end outputs 128 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted X5. The input end of the 7th neural network block receives all feature maps in X5, and its output end outputs 256 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted P7. The input end of the 6th down-sampling block receives all feature maps in P7, and its output end outputs 256 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted X6. The input end of the 8th neural network block receives all feature maps in X6, and its output end outputs 512 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted P8.
For the feature fusion module: the input end of the 9th neural network block receives the training left-viewpoint image output by the output end of the RGB-image input layer, and its output end outputs 3 feature maps of width W and height H; the set of all output feature maps is denoted P9. The input end of the 10th neural network block receives the training depth image output by the output end of the depth-map input layer, and its output end outputs 3 feature maps of width W and height H; the set of all output feature maps is denoted P10. Element-wise summation is performed on all feature maps in P9 and all feature maps in P10, outputting 3 feature maps of width W and height H; the set of all output feature maps is denoted E1. The input end of the 11th neural network block receives all feature maps in E1, and its output end outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted P11. Element-wise summation is performed on all feature maps in P1, P5 and P11, outputting 64 feature maps of width W and height H; the set of all output feature maps is denoted E2. The input end of the 1st max-pooling layer receives all feature maps in E2, and its output end outputs 64 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted Z1. The input end of the 12th neural network block receives all feature maps in Z1, and its output end outputs 128 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted P12. Element-wise summation is performed on all feature maps in P2, P6 and P12, outputting 128 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted E3. The input end of the 2nd max-pooling layer receives all feature maps in E3, and its output end outputs 128 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted Z2. The input end of the 13th neural network block receives all feature maps in Z2, and its output end outputs 256 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted P13. Element-wise summation is performed on all feature maps in P3, P7 and P13, outputting 256 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted E4. The input end of the 3rd max-pooling layer receives all feature maps in E4, and its output end outputs 256 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted Z3. The input end of the 14th neural network block receives all feature maps in Z3, and its output end outputs 512 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted P14. Element-wise summation is performed on all feature maps in P4, P8 and P14, outputting 512 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted E5. The input end of the 4th max-pooling layer receives all feature maps in E5, and its output end outputs 512 feature maps of width W/16 and height H/16; the set of all output feature maps is denoted Z4. The input end of the 15th neural network block receives all feature maps in Z4, and its output end outputs 1024 feature maps of width W/16 and height H/16; the set of all output feature maps is denoted P15.
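The fusion steps above combine same-shaped feature-map sets by element-wise summation. A minimal NumPy sketch (array names are illustrative, not from the patent):

```python
import numpy as np

def elementwise_fuse(*feature_sets):
    """Element-wise summation of same-shaped feature-map sets,
    as used to form E1..E5 from the RGB, depth and fused branches."""
    out = feature_sets[0].copy()
    for f in feature_sets[1:]:
        assert f.shape == out.shape, "fusion requires identical shapes"
        out += f
    return out

# e.g. forming E2 from P1 (RGB), P5 (depth) and P11 (fused): 64 maps of W x H
W, H = 32, 32
P1  = np.ones((64, H, W))
P5  = np.ones((64, H, W)) * 2
P11 = np.ones((64, H, W)) * 3
E2 = elementwise_fuse(P1, P5, P11)
print(E2.shape, E2[0, 0, 0])  # (64, 32, 32) 6.0
```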
For the decoding framework: the input end of the 1st up-sampling layer receives all feature maps in P15, and its output end outputs 1024 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted S1. The input end of the 16th neural network block receives all feature maps in S1, and its output end outputs 256 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted P16. The input end of the 2nd up-sampling layer receives all feature maps in P16, and its output end outputs 256 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted S2. The input end of the 17th neural network block receives all feature maps in S2, and its output end outputs 128 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted P17. The input end of the 3rd up-sampling layer receives all feature maps in P17, and its output end outputs 128 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted S3. The input end of the 18th neural network block receives all feature maps in S3, and its output end outputs 64 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted P18. The input end of the 4th up-sampling layer receives all feature maps in P18, and its output end outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted S4. The input end of the 19th neural network block receives all feature maps in S4, and its output end outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted P19.
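Each up-sampling layer in the decoder doubles the spatial size (W/16 → W/8 → … → W). A nearest-neighbour up-sampling sketch in NumPy; the patent does not specify the interpolation mode, so nearest-neighbour is assumed here purely for illustration:

```python
import numpy as np

def upsample2x(maps):
    """Double width and height of (C, H, W) feature maps by
    nearest-neighbour repetition (assumed interpolation mode)."""
    return maps.repeat(2, axis=1).repeat(2, axis=2)

P15 = np.random.rand(1024, 14, 14)   # W/16 x H/16 for W = H = 224
S1 = upsample2x(P15)
print(S1.shape)  # (1024, 28, 28)
```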
For the output layer: the input end of the first convolutional layer receives all feature maps in P19, and its output end outputs one feature map of width W and height H; the input end of the first batch-normalization layer receives the feature map output by the output end of the first convolutional layer; the input end of the first activation layer receives the feature map output by the output end of the first batch-normalization layer; and the output end of the first activation layer outputs the saliency image of the stereoscopic image corresponding to the training left-viewpoint image. The saliency image has width W and height H.
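The Sigmoid activation of the output layer squashes the single W × H feature map so that every saliency pixel lies in (0, 1). A minimal sketch (the logit values are illustrative):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation used by the output layer."""
    return 1.0 / (1.0 + np.exp(-x))

# one W x H map from the first convolutional layer (values illustrative)
logits = np.array([[-4.0, 0.0], [2.0, 6.0]])
saliency = sigmoid(logits)
assert ((saliency > 0.0) & (saliency < 1.0)).all()
print(saliency.round(3))
```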
Step 1_3: take the left-viewpoint image of every original stereoscopic image in the training set as a training left-viewpoint image, and take the depth image of every original stereoscopic image in the training set as a training depth image; input them into the convolutional neural network for training, obtaining the saliency image of every original stereoscopic image in the training set. The saliency image of {I_n(x,y)} is denoted {S_n(x,y)}, where S_n(x,y) denotes the pixel value of the pixel at coordinate (x,y) in {S_n(x,y)}.
Step 1_4: compute the loss function value between the saliency image of every original stereoscopic image in the training set and the corresponding ground-truth eye-fixation image; the loss function value between {S_n(x,y)} and {G_n(x,y)} is obtained using the mean squared error loss function.
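The mean squared error loss of Step 1_4, between a predicted saliency image and the ground-truth fixation image, can be sketched as:

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between a saliency image and
    the ground-truth eye-fixation image."""
    return float(np.mean((pred - target) ** 2))

pred   = np.array([[0.2, 0.8], [0.5, 0.1]])
target = np.array([[0.0, 1.0], [0.5, 0.0]])
print(mse_loss(pred, target))  # ≈ 0.0225
```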
Step 1_5: repeat Step 1_3 and Step 1_4 a total of V times to obtain the trained convolutional neural network model, with N × V loss function values obtained in the process. Then find the smallest loss function value among the N × V loss function values; the weight vector and bias term corresponding to the smallest loss function value are taken as the optimal weight vector and optimal bias term of the trained convolutional neural network model, denoted W_best and b_best respectively. Here V > 1.
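Step 1_5 keeps the weights from whichever of the N × V evaluations gave the smallest loss. A bookkeeping sketch of that selection (variable and checkpoint names are ours, for illustration):

```python
def select_best(loss_values, checkpoints):
    """Return the checkpoint with the smallest loss, mirroring the
    selection of W_best and b_best over the N x V loss values."""
    best_idx = min(range(len(loss_values)), key=loss_values.__getitem__)
    return checkpoints[best_idx]

losses = [0.31, 0.12, 0.27, 0.09, 0.15]          # N x V values, illustrative
ckpts  = [f"ckpt_{i}" for i in range(len(losses))]
print(select_best(losses, ckpts))  # ckpt_3
```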
The specific steps of the test stage are as follows:
Step 2_1: let {I_test(x',y')} denote the stereoscopic image of width W' and height H' to be tested, and denote its left-viewpoint image and depth image as {L_test(x',y')} and {D_test(x',y')} respectively. Here 1 ≤ x' ≤ W', 1 ≤ y' ≤ H'; I_test(x',y') denotes the pixel value of the pixel at coordinate (x',y') in {I_test(x',y')}, L_test(x',y') denotes the pixel value of the pixel at coordinate (x',y') in {L_test(x',y')}, and D_test(x',y') denotes the pixel value of the pixel at coordinate (x',y') in {D_test(x',y')}.
Step 2_2: input {L_test(x',y')} and {D_test(x',y')} into the trained convolutional neural network model, and predict using W_best and b_best, obtaining the saliency prediction image of {I_test(x',y')}, denoted {S_test(x',y')}, where S_test(x',y') denotes the pixel value of the pixel at coordinate (x',y') in {S_test(x',y')}.
In Step 1_2, the 1st to 8th neural network blocks have the same structure, each composed, in order, of a first dilated convolutional layer, a second batch-normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch-normalization layer. The input end of the first dilated convolutional layer is the input end of the neural network block it belongs to; the input end of the second batch-normalization layer receives all feature maps output by the output end of the first dilated convolutional layer; the input end of the second activation layer receives all feature maps output by the output end of the second batch-normalization layer; the input end of the first residual block receives all feature maps output by the output end of the second activation layer; the input end of the second dilated convolutional layer receives all feature maps output by the output end of the first residual block; the input end of the third batch-normalization layer receives all feature maps output by the output end of the second dilated convolutional layer; and the output end of the third batch-normalization layer is the output end of the neural network block it belongs to. In the 1st and 5th neural network blocks, the first and second dilated convolutional layers each have 64 convolution kernels; in the 2nd and 6th, 128; in the 3rd and 7th, 256; in the 4th and 8th, 512. In each of the 1st to 8th neural network blocks, the first and second dilated convolutional layers have a kernel size of 3 × 3, a stride of 1, a dilation rate of 2 and a padding of 2, and the activation mode of the second activation layer is "ReLU".
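The dilated convolutions above (3 × 3 kernel, stride 1, dilation 2, padding 2) preserve spatial size while enlarging the receptive field to an effective 5 × 5 extent. A sketch of that arithmetic (helper name is ours):

```python
def dilated_conv_out(size, kernel=3, stride=1, pad=2, dilation=2):
    """Output size of a dilated convolution; the effective kernel
    extent is dilation * (kernel - 1) + 1, i.e. 5 for a 3x3 kernel
    with dilation 2, so padding 2 keeps the spatial size unchanged."""
    effective = dilation * (kernel - 1) + 1
    return (size + 2 * pad - effective) // stride + 1

assert dilated_conv_out(64) == 64   # size-preserving, as in blocks 1-8
print(dilated_conv_out(64))  # 64
```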
The 9th and 10th neural network blocks have the same structure, each composed, in order, of a second convolutional layer and a fourth batch-normalization layer. The input end of the second convolutional layer is the input end of the neural network block it belongs to; the input end of the fourth batch-normalization layer receives all feature maps output by the output end of the second convolutional layer; and the output end of the fourth batch-normalization layer is the output end of the neural network block it belongs to. In each of the 9th and 10th neural network blocks, the second convolutional layer has 3 convolution kernels, a kernel size of 7 × 7, a stride of 1 and a padding of 3.
The 11th and 12th neural network blocks have the same structure, each composed, in order, of a third convolutional layer, a fifth batch-normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch-normalization layer. The input end of the third convolutional layer is the input end of the neural network block it belongs to; the input end of the fifth batch-normalization layer receives all feature maps output by the output end of the third convolutional layer; the input end of the third activation layer receives all feature maps output by the output end of the fifth batch-normalization layer; the input end of the fourth convolutional layer receives all feature maps output by the output end of the third activation layer; the input end of the sixth batch-normalization layer receives all feature maps output by the output end of the fourth convolutional layer; and the output end of the sixth batch-normalization layer is the output end of the neural network block it belongs to. The third and fourth convolutional layers each have 64 convolution kernels in the 11th neural network block and 128 in the 12th; in both blocks, the third and fourth convolutional layers have a kernel size of 3 × 3, a stride of 1 and a padding of 1, and the activation mode of the third activation layer is "ReLU".
The 13th to 19th neural network blocks have the same structure, each composed, in order, of a fifth convolutional layer, a seventh batch-normalization layer, a fourth activation layer, a sixth convolutional layer, an eighth batch-normalization layer, a fifth activation layer, a seventh convolutional layer and a ninth batch-normalization layer. The input end of the fifth convolutional layer is the input end of the neural network block it belongs to; the input end of the seventh batch-normalization layer receives all feature maps output by the output end of the fifth convolutional layer; the input end of the fourth activation layer receives all feature maps output by the output end of the seventh batch-normalization layer; the input end of the sixth convolutional layer receives all feature maps output by the output end of the fourth activation layer; the input end of the eighth batch-normalization layer receives all feature maps output by the output end of the sixth convolutional layer; the input end of the fifth activation layer receives all feature maps output by the output end of the eighth batch-normalization layer; the input end of the seventh convolutional layer receives all feature maps output by the output end of the fifth activation layer; the input end of the ninth batch-normalization layer receives all feature maps output by the output end of the seventh convolutional layer; and the output end of the ninth batch-normalization layer is the output end of the neural network block it belongs to. The fifth, sixth and seventh convolutional layers have, respectively, 256 kernels each in the 13th neural network block; 512 each in the 14th; 1024 each in the 15th; 512, 512 and 256 in the 16th; 256, 256 and 128 in the 17th; 128, 128 and 64 in the 18th; and 64 each in the 19th. In each of the 13th to 19th neural network blocks, the fifth, sixth and seventh convolutional layers have a kernel size of 3 × 3, a stride of 1 and a padding of 1, and the activation mode of the fourth and fifth activation layers is "ReLU".
In step 1_2, the structures of the 1st to 6th down-sampling blocks are identical: each consists of a second residual block, whose input is the input of the down-sampling block it belongs to and whose output is the output of that down-sampling block.
The first residual block and the second residual block have the same structure: each comprises 3 convolutional layers, 3 batch normalization layers and 3 activation layers. The input of the 1st convolutional layer is the input of the residual block it belongs to; the 1st batch normalization layer receives all feature maps output by the 1st convolutional layer; the 1st activation layer receives all feature maps output by the 1st batch normalization layer; the 2nd convolutional layer receives all feature maps output by the 1st activation layer; the 2nd batch normalization layer receives all feature maps output by the 2nd convolutional layer; the 2nd activation layer receives all feature maps output by the 2nd batch normalization layer; the 3rd convolutional layer receives all feature maps output by the 2nd activation layer; and the 3rd batch normalization layer receives all feature maps output by the 3rd convolutional layer. All feature maps received at the input of the 1st convolutional layer are added element-wise to all feature maps output by the 3rd batch normalization layer, the sum passes through the 3rd activation layer, and all feature maps output by the 3rd activation layer serve as the output of the residual block. Here, each convolutional layer in the first residual block has 64 convolution kernels in the 1st and 5th neural network blocks, 128 in the 2nd and 6th, 256 in the 3rd and 7th, and 512 in the 4th and 8th. In the first residual block of each of the 1st to 8th neural network blocks, the 1st and 3rd convolutional layers use 1 × 1 convolution kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 convolution kernels with stride 1 and padding 1. Each convolutional layer in the second residual block has 64 convolution kernels in the 1st and 4th down-sampling blocks, 128 in the 2nd and 5th, and 256 in the 3rd and 6th. In the second residual block of each of the 1st to 6th down-sampling blocks, the 1st and 3rd convolutional layers use 1 × 1 convolution kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 convolution kernels with stride 2 and padding 1. The activation function of all 3 activation layers is "ReLU".
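The spatial arithmetic of the residual blocks above (1 × 1 convolutions with stride 1 keep the size, the 3 × 3 stride-2 convolution of the second residual block halves it) can be checked with the standard convolution output-size formula. This is a small illustrative sketch; the helper name is not from the patent.

```python
def conv_out_size(n, k, s, p):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# 1x1 kernel, stride 1, no padding: size preserved (1st/3rd convolutional layers)
same = conv_out_size(64, 1, 1, 0)
# 3x3 kernel, stride 1, padding 1: size preserved (2nd conv of the first residual block)
kept = conv_out_size(64, 3, 1, 1)
# 3x3 kernel, stride 2, padding 1: size halved (2nd conv of the second residual block)
halved = conv_out_size(64, 3, 2, 1)
```

This is why a down-sampling block built from the second residual block replaces a pooling layer: the stride-2 convolution performs the 2× spatial reduction itself.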
In step 1_2, the pooling windows of the 1st to 4th max pooling layers are all of size 2 × 2 with stride 2.
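The 2 × 2, stride-2 max pooling used by these layers can be sketched in a few lines of NumPy (a single-channel toy example, assuming even width and height):

```python
import numpy as np

def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2, halving width and height."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 2, 5, 6],
               [3, 4, 7, 8],
               [9, 1, 2, 3],
               [0, 5, 4, 1]], dtype=float)
pooled = max_pool_2x2(fm)  # 2 x 2 map of per-window maxima
```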
In step 1_2, the sampling mode of the 1st to 4th up-sampling layers is bilinear interpolation with scale factor 2.
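A bilinear up-sampling with scale factor 2 can be sketched as below. The patent fixes only the mode and the scale factor; the corner-aligned sampling grid used here is an assumption for illustration.

```python
import numpy as np

def upsample_bilinear_2x(x):
    """Bilinear up-sampling by 2x (corner-aligned grid assumed)."""
    h, w = x.shape
    ys = np.linspace(0, h - 1, 2 * h)
    xs = np.linspace(0, w - 1, 2 * w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

y = upsample_bilinear_2x(np.array([[0., 1.], [2., 3.]]))  # 4 x 4 output
```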
Compared with the prior art, the advantages of the present invention are as follows:
1) Through the encoding framework of the constructed convolutional neural network, the method of the present invention trains separate modules for the RGB image and the depth image (i.e. the RGB feature extraction module and the depth feature extraction module) to learn RGB and depth features at different levels, and further proposes a dedicated module for fusing RGB and depth features, i.e. the feature fusion module, which merges the two kinds of features from low level to high level. This makes full use of cross-modal information to form new discriminative features and improves the accuracy of stereoscopic visual saliency prediction.
2) The down-sampling blocks in the RGB feature extraction module and the depth feature extraction module of the constructed convolutional neural network replace the max pooling layers used in previous work with residual blocks of stride 2, which helps the model select feature information adaptively and avoids losing important information through the max pooling operation.
3) The RGB feature extraction module and the depth feature extraction module of the constructed convolutional neural network introduce residual blocks preceded and followed by dilated convolutional layers, which enlarges the receptive field of the convolution kernels, helps the constructed convolutional neural network attend more to global information, and captures richer content.
Detailed description of the invention
Fig. 1 is a schematic diagram of the composition of the convolutional neural network constructed by the method of the present invention.
Specific embodiment
The present invention will be described in further detail below with reference to the drawings and embodiments.
The stereo image visual saliency detection method based on a convolutional neural network proposed by the present invention comprises two processes: a training stage and a test phase.
The specific steps of the training stage are as follows:
Step 1_1: Select N original stereo images of width W and height H; then let all selected original stereo images, together with their respective left viewpoint images, depth images and ground-truth human eye fixation maps, constitute a training set. Denote the n-th original stereo image in the training set as {In(x, y)}, and denote the corresponding left viewpoint image, depth image and ground-truth human eye fixation map of {In(x, y)} accordingly, the depth image being written {Dn(x, y)}. Here, N is a positive integer with N ≥ 300 (e.g. N = 600); W and H are divisible by 2; n is a positive integer with initial value 1, 1 ≤ n ≤ N, 1 ≤ x ≤ W, 1 ≤ y ≤ H; In(x, y) denotes the pixel value of the pixel at coordinate position (x, y) in {In(x, y)}, Dn(x, y) denotes the pixel value of the pixel at coordinate position (x, y) in {Dn(x, y)}, and the corresponding notations denote the pixel values at (x, y) in the left viewpoint image and the ground-truth human eye fixation map.
Step 1_2: Construct the convolutional neural network. As shown in Fig. 1, the convolutional neural network comprises an input layer, a hidden layer and an output layer. The input layer comprises an RGB image input layer and a depth image input layer. The hidden layer comprises an encoding framework and a decoding framework; the encoding framework consists of three parts: an RGB feature extraction module, a depth feature extraction module and a feature fusion module. The RGB feature extraction module consists of the 1st to 4th neural network blocks and the 1st to 3rd down-sampling blocks; the depth feature extraction module consists of the 5th to 8th neural network blocks and the 4th to 6th down-sampling blocks; the feature fusion module consists of the 9th to 15th neural network blocks and the 1st to 4th max pooling layers; the decoding framework consists of the 16th to 19th neural network blocks and the 1st to 4th up-sampling layers. The output layer consists of a first convolutional layer, a first batch normalization layer and a first activation layer; the first convolutional layer has 1 convolution kernel of size 3 × 3 with stride 1 and padding 1, and the activation function of the first activation layer is "Sigmoid".
For the RGB image input layer: its input receives one training left viewpoint image, and its output passes the training left viewpoint image to the hidden layer; the training left viewpoint image is required to have width W and height H.
For the depth image input layer: its input receives the training depth image corresponding to the training left viewpoint image received by the RGB image input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H.
For the RGB feature extraction module: the 1st neural network block receives the training left viewpoint image output by the RGB image input layer and outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted P1. The 1st down-sampling block receives all feature maps in P1 and outputs 64 feature maps of width W/2 and height H/2, denoted X1. The 2nd neural network block receives X1 and outputs 128 feature maps of width W/2 and height H/2, denoted P2. The 2nd down-sampling block receives P2 and outputs 128 feature maps of width W/4 and height H/4, denoted X2. The 3rd neural network block receives X2 and outputs 256 feature maps of width W/4 and height H/4, denoted P3. The 3rd down-sampling block receives P3 and outputs 256 feature maps of width W/8 and height H/8, denoted X3. The 4th neural network block receives X3 and outputs 512 feature maps of width W/8 and height H/8, denoted P4.
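The shape bookkeeping along this extraction path can be traced with a tiny helper (an illustrative sketch, assuming each down-sampling block halves width and height, as its stride-2 second residual block implies):

```python
def rgb_encoder_shapes(W, H):
    """Trace (num_maps, width, height) along the RGB feature extraction path."""
    return {
        'P1': (64, W, H),            'X1': (64, W // 2, H // 2),
        'P2': (128, W // 2, H // 2), 'X2': (128, W // 4, H // 4),
        'P3': (256, W // 4, H // 4), 'X3': (256, W // 8, H // 8),
        'P4': (512, W // 8, H // 8),
    }

shapes = rgb_encoder_shapes(224, 224)
```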
For the depth feature extraction module: the 5th neural network block receives the training depth image output by the depth image input layer and outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted P5. The 4th down-sampling block receives all feature maps in P5 and outputs 64 feature maps of width W/2 and height H/2, denoted X4. The 6th neural network block receives X4 and outputs 128 feature maps of width W/2 and height H/2, denoted P6. The 5th down-sampling block receives P6 and outputs 128 feature maps of width W/4 and height H/4, denoted X5. The 7th neural network block receives X5 and outputs 256 feature maps of width W/4 and height H/4, denoted P7. The 6th down-sampling block receives P7 and outputs 256 feature maps of width W/8 and height H/8, denoted X6. The 8th neural network block receives X6 and outputs 512 feature maps of width W/8 and height H/8, denoted P8.
For the feature fusion module: the 9th neural network block receives the training left viewpoint image output by the RGB image input layer and outputs 3 feature maps of width W and height H, denoted P9. The 10th neural network block receives the training depth image output by the depth image input layer and outputs 3 feature maps of width W and height H, denoted P10. Element-wise summation is performed on all feature maps in P9 and P10, yielding 3 feature maps of width W and height H, denoted E1. The 11th neural network block receives E1 and outputs 64 feature maps of width W and height H, denoted P11. Element-wise summation on all feature maps in P1, P5 and P11 yields 64 feature maps of width W and height H, denoted E2. The 1st max pooling layer receives E2 and outputs 64 feature maps of width W/2 and height H/2, denoted Z1. The 12th neural network block receives Z1 and outputs 128 feature maps of width W/2 and height H/2, denoted P12. Element-wise summation on all feature maps in P2, P6 and P12 yields 128 feature maps of width W/2 and height H/2, denoted E3. The 2nd max pooling layer receives E3 and outputs 128 feature maps of width W/4 and height H/4, denoted Z2. The 13th neural network block receives Z2 and outputs 256 feature maps of width W/4 and height H/4, denoted P13. Element-wise summation on all feature maps in P3, P7 and P13 yields 256 feature maps of width W/4 and height H/4, denoted E4. The 3rd max pooling layer receives E4 and outputs 256 feature maps of width W/8 and height H/8, denoted Z3. The 14th neural network block receives Z3 and outputs 512 feature maps of width W/8 and height H/8, denoted P14. Element-wise summation on all feature maps in P4, P8 and P14 yields 512 feature maps of width W/8 and height H/8, denoted E5. The 4th max pooling layer receives E5 and outputs 512 feature maps of width W/16 and height H/16, denoted Z4. The 15th neural network block receives Z4 and outputs 1024 feature maps of width W/16 and height H/16, denoted P15.
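The cross-modal fusion step is a plain element-wise summation of same-shaped sets of feature maps. A minimal NumPy sketch (array layout (num_maps, height, width) is an assumption):

```python
import numpy as np

def elementwise_sum(*feature_sets):
    """Element-wise summation of same-shaped feature-map sets, e.g. P1 + P5 + P11 -> E2."""
    out = feature_sets[0].copy()
    for f in feature_sets[1:]:
        assert f.shape == out.shape, "fused sets must have identical shapes"
        out = out + f
    return out

p1 = np.ones((64, 8, 8))       # stand-ins for P1, P5, P11
p5 = 2 * np.ones((64, 8, 8))
p11 = 3 * np.ones((64, 8, 8))
e2 = elementwise_sum(p1, p5, p11)  # 64 fused maps
```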
For the decoding framework: the 1st up-sampling layer receives all feature maps in P15 and outputs 1024 feature maps of width W/8 and height H/8, denoted S1. The 16th neural network block receives S1 and outputs 256 feature maps of width W/8 and height H/8, denoted P16. The 2nd up-sampling layer receives P16 and outputs 256 feature maps of width W/4 and height H/4, denoted S2. The 17th neural network block receives S2 and outputs 128 feature maps of width W/4 and height H/4, denoted P17. The 3rd up-sampling layer receives P17 and outputs 128 feature maps of width W/2 and height H/2, denoted S3. The 18th neural network block receives S3 and outputs 64 feature maps of width W/2 and height H/2, denoted P18. The 4th up-sampling layer receives P18 and outputs 64 feature maps of width W and height H, denoted S4. The 19th neural network block receives S4 and outputs 64 feature maps of width W and height H, denoted P19.
For the output layer: the first convolutional layer receives all feature maps in P19 and outputs one feature map of width W and height H; the first batch normalization layer receives the feature map output by the first convolutional layer; the first activation layer receives the feature map output by the first batch normalization layer; and the output of the first activation layer is the saliency map of the stereo image corresponding to the training left viewpoint image, with width W and height H.
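The output layer reduces the 64 decoded maps to a single W × H saliency map via a 3 × 3 convolution followed by a sigmoid. A single-channel toy sketch (batch normalization omitted; the identity kernel and dummy input are illustrative only):

```python
import numpy as np

def conv2d_3x3(x, k, pad=1):
    """Naive single-channel 3x3 convolution, stride 1, padding 1: keeps the W x H size."""
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

k = np.zeros((3, 3)); k[1, 1] = 1.0   # identity kernel for illustration
x = np.zeros((4, 4))                  # dummy fused feature map
sal = sigmoid(conv2d_3x3(x, k))       # saliency values in (0, 1), same spatial size
```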
Step 1_3: Take the left viewpoint image of each original stereo image in the training set as a training left viewpoint image, take the depth image of each original stereo image in the training set as a training depth image, and input them into the convolutional neural network for training to obtain the saliency map of each original stereo image in the training set; the saliency map of {In(x, y)} is denoted accordingly, its value at coordinate position (x, y) being the pixel value of the pixel at (x, y).
Step 1_4: Calculate the loss function value between the saliency map of each original stereo image in the training set and the corresponding ground-truth human eye fixation map; the loss function value between the saliency map of {In(x, y)} and its fixation map is obtained with the mean squared error (MSE) loss function.
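The mean squared error between a predicted saliency map and a fixation map is straightforward:

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error between predicted saliency map and fixation map."""
    return float(np.mean((pred - target) ** 2))

loss = mse_loss(np.array([[0.5, 0.0]]), np.array([[1.0, 0.0]]))  # (0.25 + 0) / 2 = 0.125
```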
Step 1_5: Repeat step 1_3 and step 1_4 a total of V times to obtain the trained convolutional neural network model together with N × V loss function values; then find the smallest of the N × V loss function values; the weight vector and bias term corresponding to the smallest loss function value serve as the optimal weight vector and optimal bias term of the trained convolutional neural network model, denoted Wbest and bbest respectively; here V > 1, e.g. V = 50.
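The Wbest/bbest selection is an argmin over the recorded loss values; a minimal sketch (the helper name is illustrative):

```python
def select_best(losses):
    """Return the index of the smallest of the N x V loss values; the weights
    and bias recorded at that point would serve as Wbest and bbest."""
    return min(range(len(losses)), key=losses.__getitem__)

idx = select_best([0.9, 0.4, 0.7, 0.4])  # ties resolve to the first minimum
```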
The specific steps of the test phase are as follows:
Step 2_1: Let the stereo image to be tested have width W' and height H', and denote it together with its left viewpoint image and depth image accordingly; here 1 ≤ x' ≤ W' and 1 ≤ y' ≤ H', and each notation denotes the pixel value of the pixel at coordinate position (x', y') in the corresponding image.
Step 2_2: Input the left viewpoint image and the depth image of the stereo image to be tested into the trained convolutional neural network model and predict using Wbest and bbest, obtaining the saliency prediction image of the stereo image to be tested; its value at coordinate position (x', y') is the pixel value of the pixel at (x', y').
In this specific embodiment, in step 1_2, the structures of the 1st to 8th neural network blocks are identical: each consists, arranged in sequence, of a first dilated convolutional layer, a second batch normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch normalization layer. The input of the first dilated convolutional layer is the input of the neural network block it belongs to; the second batch normalization layer receives all feature maps output by the first dilated convolutional layer; the second activation layer receives all feature maps output by the second batch normalization layer; the first residual block receives all feature maps output by the second activation layer; the second dilated convolutional layer receives all feature maps output by the first residual block; the third batch normalization layer receives all feature maps output by the second dilated convolutional layer; and the output of the third batch normalization layer is the output of the neural network block it belongs to. Here, the first and second dilated convolutional layers each have 64 convolution kernels in the 1st and 5th neural network blocks, 128 in the 2nd and 6th, 256 in the 3rd and 7th, and 512 in the 4th and 8th. In each of the 1st to 8th neural network blocks, the first and second dilated convolutional layers use 3 × 3 convolution kernels with stride 1, dilation rate 2 and padding 2, and the activation function of the second activation layer is "ReLU".
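With dilation rate 2 and padding 2, a 3 × 3 kernel keeps the spatial size while each output pixel effectively sees a 5 × 5 neighbourhood — the enlarged receptive field mentioned in advantage 3). A naive single-channel sketch:

```python
import numpy as np

def dilated_conv2d_3x3(x, k, d=2):
    """Naive 3x3 convolution with dilation d, stride 1, padding d:
    output keeps the input size; kernel taps are spaced d pixels apart."""
    xp = np.pad(x, d)
    h, w = x.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # 3x3 taps spanning a (2d+1) x (2d+1) window
            out[i, j] = np.sum(xp[i:i + 2 * d + 1:d, j:j + 2 * d + 1:d] * k)
    return out

k = np.zeros((3, 3)); k[1, 1] = 1.0          # identity kernel: output equals input
x = np.arange(16, dtype=float).reshape(4, 4)
y = dilated_conv2d_3x3(x, k)
```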
The 9th and 10th neural network blocks have the same structure: each consists of a second convolutional layer and a fourth batch normalization layer arranged in sequence. The input of the second convolutional layer is the input of the neural network block it belongs to; the fourth batch normalization layer receives all feature maps output by the second convolutional layer; and the output of the fourth batch normalization layer is the output of the neural network block it belongs to. In both the 9th and 10th neural network blocks, the second convolutional layer has 3 convolution kernels of size 7 × 7 with stride 1 and padding 3.
The 11th and 12th neural network blocks have the same structure: each consists, arranged in sequence, of a third convolutional layer, a fifth batch normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch normalization layer. The input of the third convolutional layer is the input of the neural network block it belongs to; the fifth batch normalization layer receives all feature maps output by the third convolutional layer; the third activation layer receives all feature maps output by the fifth batch normalization layer; the fourth convolutional layer receives all feature maps output by the third activation layer; the sixth batch normalization layer receives all feature maps output by the fourth convolutional layer; and the output of the sixth batch normalization layer is the output of the neural network block it belongs to. Here, the third and fourth convolutional layers each have 64 convolution kernels in the 11th neural network block and 128 each in the 12th; in both blocks they use 3 × 3 convolution kernels with stride 1 and padding 1, and the activation function of the third activation layer is "ReLU".
The structure of 13rd to the 19th neural network block is identical, by the 5th convolutional layer, the 7th batch of mark set gradually
Standardization layer, the 4th active coating, the 6th convolutional layer, the 8th batch of normalization layer, the 5th active coating, the 7th convolutional layer, the 9th batch of standard
zation layer, arranged in this order; the input of the 5th convolutional layer is the input of the neural network block in which it is located; the 7th batch normalization layer receives all feature maps output by the 5th convolutional layer; the 4th activation layer receives all feature maps output by the 7th batch normalization layer; the 6th convolutional layer receives all feature maps output by the 4th activation layer; the 8th batch normalization layer receives all feature maps output by the 6th convolutional layer; the 5th activation layer receives all feature maps output by the 8th batch normalization layer; the 7th convolutional layer receives all feature maps output by the 5th activation layer; the 9th batch normalization layer receives all feature maps output by the 7th convolutional layer; and the output of the 9th batch normalization layer is the output of the neural network block in which it is located. The numbers of convolution kernels of the 5th, 6th and 7th convolutional layers are 256, 256, 256 in the 13th neural network block; 512, 512, 512 in the 14th; 1024, 1024, 1024 in the 15th; 512, 512, 256 correspondingly in the 16th; 256, 256, 128 correspondingly in the 17th; 128, 128, 64 correspondingly in the 18th; and 64, 64, 64 in the 19th. In each of the 13th to 19th neural network blocks, the 5th, 6th and 7th convolutional layers all use 3 × 3 convolution kernels with stride 1 and padding 1, and the activation function of the 4th and 5th activation layers is "ReLU".
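As an illustration only (not the patentee's code), the conv-BN-ReLU, conv-BN-ReLU, conv-BN structure of the 13th to 19th neural network blocks could be sketched in PyTorch as follows; the channel arguments correspond to the per-block kernel counts listed above:

```python
import torch
import torch.nn as nn

class TripleConvBlock(nn.Module):
    """Conv-BN-ReLU twice, then Conv-BN (no activation after the last
    batch normalization layer), with 3x3 kernels, stride 1, padding 1."""
    def __init__(self, in_ch, c1, c2, c3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, c1, 3, stride=1, padding=1),
            nn.BatchNorm2d(c1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c1, c2, 3, stride=1, padding=1),
            nn.BatchNorm2d(c2),
            nn.ReLU(inplace=True),
            nn.Conv2d(c2, c3, 3, stride=1, padding=1),
            nn.BatchNorm2d(c3),
        )

    def forward(self, x):
        return self.body(x)

# e.g. the 16th neural network block: kernel numbers 512, 512, 256
block16 = TripleConvBlock(1024, 512, 512, 256)
```

Because every convolution uses stride 1 with padding 1, these blocks change only the channel count, never the spatial size.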
In this particular embodiment, in step 1_2, the 1st to 6th down-sampling blocks have an identical structure, each consisting of a second residual block: the input of the second residual block is the input of the down-sampling block in which it is located, and the output of the second residual block is the output of that down-sampling block.
In this particular embodiment, the first residual block and the second residual block have the same structure, each comprising 3 convolutional layers, 3 batch normalization layers and 3 activation layers. The input of the 1st convolutional layer is the input of the residual block in which it is located; the 1st batch normalization layer receives all feature maps output by the 1st convolutional layer; the 1st activation layer receives all feature maps output by the 1st batch normalization layer; the 2nd convolutional layer receives all feature maps output by the 1st activation layer; the 2nd batch normalization layer receives all feature maps output by the 2nd convolutional layer; the 2nd activation layer receives all feature maps output by the 2nd batch normalization layer; the 3rd convolutional layer receives all feature maps output by the 2nd activation layer; and the 3rd batch normalization layer receives all feature maps output by the 3rd convolutional layer. The feature maps received at the input of the 1st convolutional layer are added to the feature maps output by the 3rd batch normalization layer, the sum passes through the 3rd activation layer, and the feature maps output by the 3rd activation layer are the output of the residual block. In the first residual block, the number of convolution kernels of each convolutional layer is 64 in the 1st and 5th neural network blocks, 128 in the 2nd and 6th, 256 in the 3rd and 7th, and 512 in the 4th and 8th; in the first residual block of each of the 1st to 8th neural network blocks, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 1 and padding 1. In the second residual block, the number of convolution kernels of each convolutional layer is 64 in the 1st and 4th down-sampling blocks, 128 in the 2nd and 5th, and 256 in the 3rd and 6th; in the second residual block of each of the 1st to 6th down-sampling blocks, the 1st and 3rd convolutional layers use 1 × 1 kernels with stride 1, and the 2nd convolutional layer uses 3 × 3 kernels with stride 2 and padding 1. The activation function of all 3 activation layers is "ReLU".
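A minimal PyTorch sketch of the stride-1 bottleneck residual block described above (1×1, 3×3, 1×1 convolutions, each with batch normalization, the block input added back before the final ReLU). The single `channels` width is an illustrative simplification; the stride-2 variant used in the down-sampling blocks would also need a matching projection on the skip path, which the text does not detail:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """1x1 -> 3x3 -> 1x1 bottleneck with an additive skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 1, stride=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv3 = nn.Conv2d(channels, channels, 1, stride=1)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        # add the block input, then apply the 3rd activation layer
        return self.relu(out + x)

block = ResidualBlock(64)
```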
In this particular embodiment, in step 1_2, the pooling windows of the 1st to 4th max-pooling layers are all of size 2 × 2 with stride 2, and the sampling mode of the 1st to 4th up-sampling layers is bilinear interpolation with a scale factor of 2.
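In PyTorch terms (a sketch, not the patentee's code), the pooling and up-sampling layers of this embodiment could be instantiated as follows; the pooling halves width and height and the bilinear up-sampling doubles them back:

```python
import torch
import torch.nn as nn

# 2x2 max pooling with stride 2 halves width and height
pool = nn.MaxPool2d(kernel_size=2, stride=2)
# bilinear up-sampling with scale factor 2 doubles them again
up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

x = torch.randn(1, 64, 32, 32)
pooled = pool(x)      # (1, 64, 16, 16)
restored = up(pooled)  # (1, 64, 32, 32)
```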
Experiments were conducted to verify the feasibility and validity of the method of the present invention.
Here, the 3D eye-tracking database (NCTU-3DFixation) provided by National Chiao Tung University, Taiwan, China, is used to analyze the stability and accuracy of the method of the present invention. Four objective parameters commonly used to assess visual saliency extraction methods serve as evaluation indices: the linear correlation coefficient (Linear Correlation Coefficient, CC), the Kullback-Leibler divergence (Kullback-Leibler Divergence, KLD), the area under the receiver operating characteristic curve (Area Under the receiver operating characteristics Curve, AUC), and the normalized scanpath saliency (Normalized Scanpath Saliency, NSS).
The method of the present invention is used to obtain the saliency prediction image of every stereo image in the 3D eye-tracking database, which is then compared with the subjective visual saliency map of that stereo image, i.e. the true human-eye fixation map supplied with the database. Higher CC, AUC and NSS values and a lower KLD value indicate better consistency between the saliency prediction image obtained by the method of the present invention and the subjective visual saliency map. The CC, KLD, AUC and NSS indices reflecting the saliency extraction performance of the method of the present invention are listed in Table 1.
Table 1 Stability and accuracy of the saliency prediction images obtained by the method of the present invention with respect to the subjective visual saliency maps
Performance indicator | CC | KLD | AUC (Borji) | NSS |
Performance index value | 0.7583 | 0.4868 | 0.8789 | 2.0692 |
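For reference, the CC, KLD and NSS indices used above can be computed from a predicted saliency map and a ground-truth fixation map roughly as follows. This is an illustrative sketch, not the evaluation code used in the experiments; the epsilon terms are numerical-stability assumptions of this sketch:

```python
import numpy as np

def cc(pred, gt):
    """Linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    g = (gt - gt.mean()) / (gt.std() + 1e-12)
    return float((p * g).mean())

def kld(pred, gt, eps=1e-12):
    """Kullback-Leibler divergence, treating both maps as distributions."""
    p = pred / (pred.sum() + eps)
    g = gt / (gt.sum() + eps)
    return float((g * np.log(g / (p + eps) + eps)).sum())

def nss(pred, fixations):
    """Normalized scanpath saliency: mean normalized value at fixated pixels."""
    p = (pred - pred.mean()) / (pred.std() + 1e-12)
    return float(p[fixations > 0].mean())
```

By these definitions a prediction identical to the ground truth gives CC close to 1 and KLD close to 0, matching the "higher CC / lower KLD is better" reading of Table 1.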
From the data listed in Table 1, it can be seen that the saliency prediction images obtained by the method of the present invention agree well in stability and accuracy with the subjective visual saliency maps, showing that the objective test results are highly consistent with human subjective perception, which is sufficient to demonstrate the feasibility and validity of the method of the present invention.
Claims (6)
1. A stereo image visual saliency detection method based on convolutional neural networks, characterized by comprising two processes, a training stage and a test stage;
the specific steps of the training stage process are as follows:
Step 1_1: select N original stereo images of width W and height H; then form a training set from all selected original stereo images together with their respective left viewpoint images, depth images and true human-eye fixation maps; denote the n-th original stereo image in the training set as {In(x, y)}, denote its depth image as {Dn(x, y)}, and denote its left viewpoint image and true human-eye fixation map correspondingly; wherein N is a positive integer, N ≥ 300, W and H are divisible by 2, n is a positive integer with initial value 1, 1 ≤ n ≤ N, 1 ≤ x ≤ W, 1 ≤ y ≤ H, In(x, y) denotes the pixel value of the pixel at coordinate position (x, y) in {In(x, y)}, Dn(x, y) denotes the pixel value of the pixel at coordinate position (x, y) in {Dn(x, y)}, and the pixel values of the left viewpoint image and the true human-eye fixation map at coordinate position (x, y) are denoted analogously;
Step 1_2: construct the convolutional neural network: the convolutional neural network comprises an input layer, a hidden layer and an output layer; the input layer comprises an RGB-image input layer and a depth-map input layer; the hidden layer comprises an encoding framework and a decoding framework; the encoding framework consists of three parts, an RGB feature extraction module, a depth feature extraction module and a feature fusion module; the RGB feature extraction module consists of the 1st to 4th neural network blocks and the 1st to 3rd down-sampling blocks; the depth feature extraction module consists of the 5th to 8th neural network blocks and the 4th to 6th down-sampling blocks; the feature fusion module consists of the 9th to 15th neural network blocks and the 1st to 4th max-pooling layers; the decoding framework consists of the 16th to 19th neural network blocks and the 1st to 4th up-sampling layers; the output layer consists of a first convolutional layer, a first batch normalization layer and a first activation layer, wherein the first convolutional layer has 1 convolution kernel of size 3 × 3 with stride 1 and padding 1, and the activation function of the first activation layer is "Sigmoid";
for the RGB-image input layer, its input receives a training left viewpoint image and its output passes the training left viewpoint image to the hidden layer; the training left viewpoint image is required to have width W and height H;
for the depth-map input layer, its input receives the training depth image corresponding to the training left viewpoint image received by the RGB-image input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H;
for the RGB feature extraction module, the 1st neural network block receives the training left viewpoint image output by the RGB-image input layer and outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted P1; the 1st down-sampling block receives all feature maps in P1 and outputs 64 feature maps of width W/2 and height H/2, denoted X1; the 2nd neural network block receives all feature maps in X1 and outputs 128 feature maps of width W/2 and height H/2, denoted P2; the 2nd down-sampling block receives all feature maps in P2 and outputs 128 feature maps of width W/4 and height H/4, denoted X2; the 3rd neural network block receives all feature maps in X2 and outputs 256 feature maps of width W/4 and height H/4, denoted P3; the 3rd down-sampling block receives all feature maps in P3 and outputs 256 feature maps of width W/8 and height H/8, denoted X3; the 4th neural network block receives all feature maps in X3 and outputs 512 feature maps of width W/8 and height H/8, denoted P4;
for the depth feature extraction module, the 5th neural network block receives the training depth image output by the depth-map input layer and outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted P5; the 4th down-sampling block receives all feature maps in P5 and outputs 64 feature maps of width W/2 and height H/2, denoted X4; the 6th neural network block receives all feature maps in X4 and outputs 128 feature maps of width W/2 and height H/2, denoted P6; the 5th down-sampling block receives all feature maps in P6 and outputs 128 feature maps of width W/4 and height H/4, denoted X5; the 7th neural network block receives all feature maps in X5 and outputs 256 feature maps of width W/4 and height H/4, denoted P7; the 6th down-sampling block receives all feature maps in P7 and outputs 256 feature maps of width W/8 and height H/8, denoted X6; the 8th neural network block receives all feature maps in X6 and outputs 512 feature maps of width W/8 and height H/8, denoted P8;
for the feature fusion module, the 9th neural network block receives the training left viewpoint image output by the RGB-image input layer and outputs 3 feature maps of width W and height H, denoted P9; the 10th neural network block receives the training depth image output by the depth-map input layer and outputs 3 feature maps of width W and height H, denoted P10; element-wise summation is applied to all feature maps in P9 and P10, yielding 3 feature maps of width W and height H, denoted E1; the 11th neural network block receives all feature maps in E1 and outputs 64 feature maps of width W and height H, denoted P11; element-wise summation is applied to all feature maps in P1, P5 and P11, yielding 64 feature maps of width W and height H, denoted E2; the 1st max-pooling layer receives all feature maps in E2 and outputs 64 feature maps of width W/2 and height H/2, denoted Z1; the 12th neural network block receives all feature maps in Z1 and outputs 128 feature maps of width W/2 and height H/2, denoted P12; element-wise summation is applied to all feature maps in P2, P6 and P12, yielding 128 feature maps of width W/2 and height H/2, denoted E3; the 2nd max-pooling layer receives all feature maps in E3 and outputs 128 feature maps of width W/4 and height H/4, denoted Z2; the 13th neural network block receives all feature maps in Z2 and outputs 256 feature maps of width W/4 and height H/4, denoted P13; element-wise summation is applied to all feature maps in P3, P7 and P13, yielding 256 feature maps of width W/4 and height H/4, denoted E4; the 3rd max-pooling layer receives all feature maps in E4 and outputs 256 feature maps of width W/8 and height H/8, denoted Z3; the 14th neural network block receives all feature maps in Z3 and outputs 512 feature maps of width W/8 and height H/8, denoted P14; element-wise summation is applied to all feature maps in P4, P8 and P14, yielding 512 feature maps of width W/8 and height H/8, denoted E5; the 4th max-pooling layer receives all feature maps in E5 and outputs 512 feature maps of width W/16 and height H/16, denoted Z4; the 15th neural network block receives all feature maps in Z4 and outputs 1024 feature maps of width W/16 and height H/16, denoted P15;
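The fusion step above combines same-shaped feature-map sets from the RGB branch, the depth branch and the fusion branch by element-wise summation; for example, E2 = P1 + P5 + P11. A minimal sketch (the tensors here are random stand-ins for the real feature maps):

```python
import torch

def fuse_level(rgb_feat, depth_feat, fused_feat):
    """Element-wise summation of same-shaped feature-map sets,
    as used to form E2..E5 from (P1,P5,P11), (P2,P6,P12), etc."""
    assert rgb_feat.shape == depth_feat.shape == fused_feat.shape
    return rgb_feat + depth_feat + fused_feat

# e.g. E2 = P1 + P5 + P11, all of shape (batch, 64, H, W)
p1 = torch.randn(1, 64, 32, 32)
p5 = torch.randn(1, 64, 32, 32)
p11 = torch.randn(1, 64, 32, 32)
e2 = fuse_level(p1, p5, p11)
```

Because the summation is element-wise, it requires matching channel counts and spatial sizes at every fusion level, which is exactly what the per-block kernel numbers and the halving pooling schedule guarantee.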
for the decoding framework, the 1st up-sampling layer receives all feature maps in P15 and outputs 1024 feature maps of width W/8 and height H/8, denoted S1; the 16th neural network block receives all feature maps in S1 and outputs 256 feature maps of width W/8 and height H/8, denoted P16; the 2nd up-sampling layer receives all feature maps in P16 and outputs 256 feature maps of width W/4 and height H/4, denoted S2; the 17th neural network block receives all feature maps in S2 and outputs 128 feature maps of width W/4 and height H/4, denoted P17; the 3rd up-sampling layer receives all feature maps in P17 and outputs 128 feature maps of width W/2 and height H/2, denoted S3; the 18th neural network block receives all feature maps in S3 and outputs 64 feature maps of width W/2 and height H/2, denoted P18; the 4th up-sampling layer receives all feature maps in P18 and outputs 64 feature maps of width W and height H, denoted S4; the 19th neural network block receives all feature maps in S4 and outputs 64 feature maps of width W and height H, denoted P19;
for the output layer, the first convolutional layer receives all feature maps in P19 and outputs one feature map of width W and height H; the first batch normalization layer receives the feature map output by the first convolutional layer; the first activation layer receives the feature map output by the first batch normalization layer; the output of the first activation layer is the saliency image of the stereo image corresponding to the training left viewpoint image, of width W and height H;
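The single-channel output head described here (one 3×3 kernel with stride 1 and padding 1, batch normalization, then a sigmoid) could be sketched in PyTorch as follows; `in_ch=64` matches the 64 feature maps of P19, and the 224 × 224 size is an arbitrary example:

```python
import torch
import torch.nn as nn

# output layer: one 3x3 kernel, stride 1, padding 1, then BN and Sigmoid
output_layer = nn.Sequential(
    nn.Conv2d(64, 1, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(1),
    nn.Sigmoid(),  # squashes the saliency map into (0, 1)
)

p19 = torch.randn(2, 64, 224, 224)
saliency = output_layer(p19)  # one W x H saliency map per input
```

The sigmoid keeps every output pixel in (0, 1), which is what lets the saliency image be compared directly against a fixation map under the mean squared error loss of step 1_4.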
Step 1_3: use the left viewpoint image of each original stereo image in the training set as the training left viewpoint image and the depth image of each original stereo image as the training depth image, input them into the convolutional neural network for training, and obtain the saliency image of each original stereo image in the training set; the saliency image of {In(x, y)} is denoted correspondingly, its value at coordinate position (x, y) being the pixel value of the pixel at that position;
Step 1_4: calculate the loss function value between the saliency image of each original stereo image in the training set and the corresponding true human-eye fixation map; this loss function value is obtained using the mean squared error loss function;
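The mean squared error loss of step 1_4 between a predicted saliency image and the true human-eye fixation map is simply the mean over all pixels of the squared difference (a sketch with toy 2 × 2 maps; the claim's own symbols are rendered as images and so are left generic here):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()  # mean over all pixels of (prediction - target)^2

pred = torch.tensor([[0.2, 0.8], [0.5, 0.1]])    # predicted saliency
target = torch.tensor([[0.0, 1.0], [0.5, 0.0]])  # true fixation map
loss = mse(pred, target)
# (0.2^2 + 0.2^2 + 0.0^2 + 0.1^2) / 4 = 0.0225
```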
Step 1_5: repeat step 1_3 and step 1_4 a total of V times to obtain the convolutional neural network training model together with N × V loss function values; find the smallest of the N × V loss function values; the weight vector and bias term corresponding to the smallest loss function value are taken as the best weight vector and best bias term of the convolutional neural network training model, denoted correspondingly as Wbest and bbest; wherein V > 1;
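Step 1_5's selection of the weights with the smallest recorded loss could be sketched as follows; this is a simplified illustration in which the `(loss, weights)` pairs stand in for the values collected over the V repetitions of steps 1_3 and 1_4:

```python
import copy

def select_best_weights(losses_per_pass):
    """Return the smallest loss and a copy of the weights that produced
    it, mirroring step 1_5's choice of Wbest and bbest."""
    best_loss, best_state = min(losses_per_pass, key=lambda p: p[0])
    return best_loss, copy.deepcopy(best_state)

# (loss value, weights) pairs collected over the training repetitions
passes = [(0.31, {"w": 1}), (0.18, {"w": 2}), (0.25, {"w": 3})]
best_loss, w_best = select_best_weights(passes)
```

The deep copy matters in practice: a live model's weights keep changing during training, so the best state must be snapshotted when it is observed.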
the specific steps of the test stage process are as follows:
Step 2_1: let the stereo image to be tested have width W' and height H', and denote its left viewpoint image and depth image correspondingly; wherein 1 ≤ x' ≤ W', 1 ≤ y' ≤ H', and the value of each of these images at coordinate position (x', y') is the pixel value of the pixel at that position;
Step 2_2: input the left viewpoint image and the depth image of the stereo image to be tested into the convolutional neural network training model and predict using Wbest and bbest, obtaining the saliency prediction image of the stereo image to be tested, whose value at coordinate position (x', y') is the pixel value of the pixel at that position.
2. The stereo image visual saliency detection method based on convolutional neural networks according to claim 1, characterized in that in step 1_2 the 1st to 8th neural network blocks have an identical structure, each consisting of, arranged in sequence, a first dilated convolutional layer, a second batch normalization layer, a second activation layer, a first residual block, a second dilated convolutional layer and a third batch normalization layer; the input of the first dilated convolutional layer is the input of the neural network block in which it is located; the second batch normalization layer receives all feature maps output by the first dilated convolutional layer; the second activation layer receives all feature maps output by the second batch normalization layer; the first residual block receives all feature maps output by the second activation layer; the second dilated convolutional layer receives all feature maps output by the first residual block; the third batch normalization layer receives all feature maps output by the second dilated convolutional layer; and the output of the third batch normalization layer is the output of the neural network block in which it is located; wherein the number of convolution kernels of the first and second dilated convolutional layers is 64 in the 1st and 5th neural network blocks, 128 in the 2nd and 6th, 256 in the 3rd and 7th, and 512 in the 4th and 8th; in each of the 1st to 8th neural network blocks, the first and second dilated convolutional layers use 3 × 3 kernels with stride 1, dilation rate 2 and padding 2, and the activation function of the second activation layer is "ReLU";
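A dilated 3×3 convolution with dilation rate 2 and padding 2 enlarges the receptive field while preserving the spatial size, which is why these blocks leave width and height unchanged; a minimal check (the 3-channel input and 224 × 224 size are example values):

```python
import torch
import torch.nn as nn

# a dilated convolutional layer as specified: 3x3 kernels,
# stride 1, dilation 2, padding 2 (here 64 kernels on an RGB input)
dilated = nn.Conv2d(3, 64, kernel_size=3, stride=1, dilation=2, padding=2)

x = torch.randn(1, 3, 224, 224)
y = dilated(x)  # spatial size is preserved
```

With kernel size 3 and dilation 2 the effective kernel spans 5 pixels, so padding 2 on each side is exactly what keeps the output the same size as the input.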
the 9th and 10th neural network blocks have an identical structure, each consisting of, arranged in sequence, a second convolutional layer and a fourth batch normalization layer; the input of the second convolutional layer is the input of the neural network block in which it is located; the fourth batch normalization layer receives all feature maps output by the second convolutional layer; and the output of the fourth batch normalization layer is the output of the neural network block in which it is located; wherein in each of the 9th and 10th neural network blocks the second convolutional layer has 3 convolution kernels of size 7 × 7 with stride 1 and padding 3;
the 11th and 12th neural network blocks have an identical structure, each consisting of, arranged in sequence, a third convolutional layer, a fifth batch normalization layer, a third activation layer, a fourth convolutional layer and a sixth batch normalization layer; the input of the third convolutional layer is the input of the neural network block in which it is located; the fifth batch normalization layer receives all feature maps output by the third convolutional layer; the third activation layer receives all feature maps output by the fifth batch normalization layer; the fourth convolutional layer receives all feature maps output by the third activation layer; the sixth batch normalization layer receives all feature maps output by the fourth convolutional layer; and the output of the sixth batch normalization layer is the output of the neural network block in which it is located; wherein the number of convolution kernels of the third and fourth convolutional layers is 64 in the 11th neural network block and 128 in the 12th; in both the 11th and 12th neural network blocks, the third and fourth convolutional layers use 3 × 3 kernels with stride 1 and padding 1, and the activation function of the third activation layer is "ReLU";
The structure of 13rd to the 19th neural network block is identical, by set gradually the 5th convolutional layer, the 7th batch standardization
layer, the 4th activation layer, the 6th convolutional layer, the 8th batch normalization layer, the 5th activation layer, the 7th convolutional layer and the 9th batch normalization layer. The input of the 5th convolutional layer is the input of the neural network block it belongs to; the input of the 7th batch normalization layer receives all feature maps output by the 5th convolutional layer; the input of the 4th activation layer receives all feature maps output by the 7th batch normalization layer; the input of the 6th convolutional layer receives all feature maps output by the 4th activation layer; the input of the 8th batch normalization layer receives all feature maps output by the 6th convolutional layer; the input of the 5th activation layer receives all feature maps output by the 8th batch normalization layer; the input of the 7th convolutional layer receives all feature maps output by the 5th activation layer; the input of the 9th batch normalization layer receives all feature maps output by the 7th convolutional layer; and the output of the 9th batch normalization layer is the output of the neural network block it belongs to. The numbers of convolution kernels of the 5th, 6th and 7th convolutional layers are 256 each in the 13th neural network block, 512 each in the 14th neural network block, 1024 each in the 15th neural network block, 512, 512 and 256 respectively in the 16th neural network block, 256, 256 and 128 respectively in the 17th neural network block, 128, 128 and 64 respectively in the 18th neural network block, and 64 each in the 19th neural network block. In each of the 13th to 19th neural network blocks, the convolution kernels of the 5th, 6th and 7th convolutional layers are all 3 × 3 with stride 1 and padding 1, and the activation function of the 4th and 5th activation layers is ReLU.
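For orientation, the layer ordering and per-block channel widths specified above can be encoded in a small lookup table. The following Python sketch is ours, not from the patent (the identifier names are hypothetical); it simply records the claimed configuration of the 13th to 19th neural network blocks:

```python
# Per-block output-channel counts for the 5th/6th/7th convolutional layers,
# exactly as listed in claim 2 (keys are the neural network block indices).
BLOCK_CHANNELS = {
    13: (256, 256, 256),
    14: (512, 512, 512),
    15: (1024, 1024, 1024),
    16: (512, 512, 256),
    17: (256, 256, 128),
    18: (128, 128, 64),
    19: (64, 64, 64),
}

# Fixed layer ordering of each block: every conv is 3x3, stride 1, padding 1,
# and the block output is taken after the 9th batch normalization layer.
LAYER_ORDER = [
    "conv5", "bn7", "relu4",
    "conv6", "bn8", "relu5",
    "conv7", "bn9",
]

def block_spec(index):
    """Return (layer order, per-conv output channels) for one block."""
    c5, c6, c7 = BLOCK_CHANNELS[index]
    return LAYER_ORDER, {"conv5": c5, "conv6": c6, "conv7": c7}
```

Note that blocks 16 to 18 narrow the channel count inside the block (e.g. 512 → 512 → 256), consistent with a decoder that gradually reduces feature width.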
3. The stereoscopic image visual saliency detection method based on a convolutional neural network according to claim 2, characterized in that in step 1_2, the 1st to 6th down-sampling blocks are identical in structure, each consisting of a second residual block; the input of the second residual block is the input of the down-sampling block it belongs to, and the output of the second residual block is the output of the down-sampling block it belongs to.
4. The stereoscopic image visual saliency detection method based on a convolutional neural network according to claim 3, characterized in that the first residual block and the second residual block are identical in structure, each comprising 3 convolutional layers, 3 batch normalization layers and 3 activation layers. The input of the 1st convolutional layer is the input of the residual block it belongs to; the input of the 1st batch normalization layer receives all feature maps output by the 1st convolutional layer; the input of the 1st activation layer receives all feature maps output by the 1st batch normalization layer; the input of the 2nd convolutional layer receives all feature maps output by the 1st activation layer; the input of the 2nd batch normalization layer receives all feature maps output by the 2nd convolutional layer; the input of the 2nd activation layer receives all feature maps output by the 2nd batch normalization layer; the input of the 3rd convolutional layer receives all feature maps output by the 2nd activation layer; the input of the 3rd batch normalization layer receives all feature maps output by the 3rd convolutional layer; the feature maps received at the input of the 1st convolutional layer are added to the feature maps output by the 3rd batch normalization layer, the sum is passed through the 3rd activation layer, and all feature maps output by the 3rd activation layer serve as the output of the residual block it belongs to. The number of convolution kernels of every convolutional layer in the first residual block is 64 in each of the 1st and 5th neural network blocks, 128 in each of the 2nd and 6th neural network blocks, 256 in each of the 3rd and 7th neural network blocks, and 512 in each of the 4th and 8th neural network blocks. In the first residual block of each of the 1st to 8th neural network blocks, the convolution kernels of the 1st and 3rd convolutional layers are 1 × 1 with stride 1, and those of the 2nd convolutional layer are 3 × 3 with stride 1 and padding 1. The number of convolution kernels of every convolutional layer in the second residual block is 64 in each of the 1st and 4th down-sampling blocks, 128 in each of the 2nd and 5th down-sampling blocks, and 256 in each of the 3rd and 6th down-sampling blocks. In the second residual block of each of the 1st to 6th down-sampling blocks, the convolution kernels of the 1st and 3rd convolutional layers are 1 × 1 with stride 1, and those of the 2nd convolutional layer are 3 × 3 with stride 2 and padding 1. The activation function of all 3 activation layers is ReLU.
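The bottleneck residual structure of claim 4 can be illustrated numerically. The NumPy sketch below is our own and covers only the first residual block (the stride-1 variant); the batch normalization layers of the claim are omitted for brevity, so this shows the 1×1 → 3×3 → 1×1 data flow with the identity skip connection rather than the full patented block:

```python
import numpy as np

def conv2d(x, w, stride=1, pad=0):
    """Naive 2-D convolution: x is (C_in, H, W), w is (C_out, C_in, kH, kW)."""
    c_in, h, wd = x.shape
    c_out, _, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    oh = (h + 2 * pad - kh) // stride + 1
    ow = (wd + 2 * pad - kw) // stride + 1
    out = np.zeros((c_out, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = xp[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            # Contract (C_in, kH, kW) of each output filter against the patch.
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def first_residual_block(x, w1, w2, w3):
    """1x1 -> 3x3 (padding 1) -> 1x1 bottleneck with identity skip.
    Batch norm layers are omitted here; the claim places one after each conv."""
    y = relu(conv2d(x, w1))                   # 1x1, stride 1
    y = relu(conv2d(y, w2, stride=1, pad=1))  # 3x3, stride 1, padding 1
    y = conv2d(y, w3)                         # 1x1, stride 1
    return relu(x + y)                        # skip connection, then ReLU
```

Because the 1×1 layers have stride 1 and the 3×3 layer uses padding 1, the spatial size is preserved, which is what lets the input be added to the branch output; in the second residual block the 3×3 layer has stride 2, so the block also performs the down-sampling.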
5. The stereoscopic image visual saliency detection method based on a convolutional neural network according to any one of claims 1 to 4, characterized in that in step 1_2, the pooling windows of the 1st to 4th maximum pooling layers are all 2 × 2 with stride 2.
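A 2 × 2 maximum pooling with stride 2, as in claim 5, halves each spatial dimension by keeping the maximum of every non-overlapping 2 × 2 window. A minimal NumPy sketch (our own illustration, assuming even height and width):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (C, H, W) array (H and W even)."""
    c, h, w = x.shape
    # Group pixels into non-overlapping 2x2 windows, then take the max
    # over the two window axes.
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))
```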
6. The stereoscopic image visual saliency detection method based on a convolutional neural network according to claim 5, characterized in that in step 1_2, the 1st to 4th up-sampling layers all use bilinear interpolation with a scale factor of 2.
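Bilinear up-sampling with a scale factor of 2 doubles each spatial dimension by linearly interpolating between the four nearest source pixels. The sketch below is our own illustration using the align-corners convention (the claim does not fix this detail, and deep learning frameworks differ on it):

```python
import numpy as np

def upsample_bilinear_x2(x):
    """Bilinear x2 up-sampling of a (H, W) map, align-corners convention."""
    h, w = x.shape
    oh, ow = 2 * h, 2 * w
    # Target sample positions in source coordinates.
    rows = np.linspace(0, h - 1, oh)
    cols = np.linspace(0, w - 1, ow)
    r0 = np.floor(rows).astype(int); r1 = np.minimum(r0 + 1, h - 1)
    c0 = np.floor(cols).astype(int); c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None]  # fractional row offsets
    fc = (cols - c0)[None, :]  # fractional column offsets
    # Interpolate horizontally on the two bracketing rows, then vertically.
    top = x[np.ix_(r0, c0)] * (1 - fc) + x[np.ix_(r0, c1)] * fc
    bot = x[np.ix_(r1, c0)] * (1 - fc) + x[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr
```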
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910327556.4A CN110175986B (en) | 2019-04-23 | 2019-04-23 | Stereo image visual saliency detection method based on convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110175986A true CN110175986A (en) | 2019-08-27 |
CN110175986B CN110175986B (en) | 2021-01-08 |
Family
ID=67689881
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910327556.4A Active CN110175986B (en) | 2019-04-23 | 2019-04-23 | Stereo image visual saliency detection method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110175986B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170351941A1 (en) * | 2016-06-03 | 2017-12-07 | Miovision Technologies Incorporated | System and Method for Performing Saliency Detection Using Deep Active Contours |
CN106462771A (en) * | 2016-08-05 | 2017-02-22 | Shenzhen University | 3D image saliency detection method |
CN106778687A (en) * | 2017-01-16 | 2017-05-31 | Dalian University of Technology | Fixation point detection method based on local evaluation and global optimization |
CN109376611A (en) * | 2018-09-27 | 2019-02-22 | Fang Yuming | Image saliency detection method based on a 3D convolutional neural network |
CN109146944A (en) * | 2018-10-30 | 2019-01-04 | Zhejiang University of Science and Technology | Spatial depth perception estimation method based on a depthwise separable convolutional neural network |
CN109598268A (en) * | 2018-11-23 | 2019-04-09 | Anhui University | RGB-D salient object detection method based on a single-stream deep network |
CN109635822A (en) * | 2018-12-07 | 2019-04-16 | Zhejiang University of Science and Technology | Stereoscopic image visual saliency extraction method based on a deep learning encoder-decoder network |
Non-Patent Citations (3)
Title |
---|
CHEN, HAO et al.: "RGB-D Saliency Detection by Multi-stream Late Fusion Network", Computer Vision Systems * |
XINGYU CAI et al.: "Saliency detection for stereoscopic 3D images in the quaternion frequency domain", 3D Research * |
LI Rong et al.: "Saliency region prediction method using convolutional neural networks", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555434A (en) * | 2019-09-03 | 2019-12-10 | Zhejiang University of Science and Technology | Method for detecting visual saliency of three-dimensional image through local contrast and global guidance |
CN110555434B (en) * | 2019-09-03 | 2022-03-29 | Zhejiang University of Science and Technology | Method for detecting visual saliency of three-dimensional image through local contrast and global guidance |
CN110782458A (en) * | 2019-10-23 | 2020-02-11 | Zhejiang University of Science and Technology | Object image 3D semantic prediction segmentation method of asymmetric coding network |
CN110782458B (en) * | 2019-10-23 | 2022-05-31 | Zhejiang University of Science and Technology | Object image 3D semantic prediction segmentation method of asymmetric coding network |
US11681046B2 (en) | 2019-11-14 | 2023-06-20 | Zoox, Inc. | Depth data model training with upsampling, losses and loss balancing |
WO2021096806A1 (en) * | 2019-11-14 | 2021-05-20 | Zoox, Inc | Depth data model training with upsampling, losses, and loss balancing |
US11157774B2 (en) * | 2019-11-14 | 2021-10-26 | Zoox, Inc. | Depth data model training with upsampling, losses, and loss balancing |
CN111369506A (en) * | 2020-02-26 | 2020-07-03 | Sichuan University | Lens opacity grading method based on ocular B-scan ultrasound images |
CN111582316A (en) * | 2020-04-10 | 2020-08-25 | Tianjin University | RGB-D salient object detection method |
CN111582316B (en) * | 2020-04-10 | 2022-06-28 | Tianjin University | RGB-D salient object detection method |
CN111612832A (en) * | 2020-04-29 | 2020-09-01 | Hangzhou Dianzi University | Method for improving depth estimation accuracy by exploiting multi-task complementarity |
CN111612832B (en) * | 2020-04-29 | 2023-04-18 | Hangzhou Dianzi University | Method for improving depth estimation accuracy by exploiting multi-task complementarity |
CN112528899B (en) * | 2020-12-17 | 2022-04-12 | Nankai University | Image salient object detection method and system based on implicit depth information recovery |
CN112528900B (en) * | 2020-12-17 | 2022-09-16 | Nankai University | Image salient object detection method and system based on extreme down-sampling |
CN112528900A (en) * | 2020-12-17 | 2021-03-19 | Nankai University | Image salient object detection method and system based on extreme down-sampling |
CN112528899A (en) * | 2020-12-17 | 2021-03-19 | Nankai University | Image salient object detection method and system based on implicit depth information recovery |
CN113192073A (en) * | 2021-04-06 | 2021-07-30 | Zhejiang University of Science and Technology | Clothing semantic segmentation method based on cross fusion network |
CN113592795A (en) * | 2021-07-19 | 2021-11-02 | Shenzhen University | Visual saliency detection method for stereoscopic images, thumbnail generation method, and device |
CN113592795B (en) * | 2021-07-19 | 2024-04-12 | Shenzhen University | Visual saliency detection method for stereoscopic images, thumbnail generation method, and device |
Also Published As
Publication number | Publication date |
---|---|
CN110175986B (en) | 2021-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110175986A (en) | Stereoscopic image visual saliency detection method based on a convolutional neural network | |
CN110555434B (en) | Method for detecting visual saliency of three-dimensional image through local contrast and global guidance | |
CN107644415B (en) | Text image quality evaluation method and device | |
CN109410261B (en) | Monocular image depth estimation method based on pyramid pooling module | |
CN109558832A (en) | Human body posture detection method, apparatus, device and storage medium | |
CN110210492A (en) | Stereoscopic image visual saliency detection method based on deep learning | |
CN111275518A (en) | Video virtual fitting method and device based on mixed optical flow | |
CN110263813A (en) | Saliency detection method based on fusion of residual network and depth information | |
CN110059728A (en) | RGB-D image visual saliency detection method based on an attention model | |
CN103824272B (en) | Face super-resolution reconstruction method based on k-nearest-neighbor re-identification | |
CN110211061A (en) | Neural-network-based real-time depth map enhancement method and device for a single depth camera | |
CN112784736B (en) | Human-object interaction behavior recognition method based on multi-modal feature fusion | |
CN109584290A (en) | Stereo image matching method based on a convolutional neural network | |
CN110246148A (en) | Saliency detection method with multi-modal depth information fusion and attention learning | |
CN108416266A (en) | Fast video behavior recognition method that extracts moving targets using optical flow | |
CN108389192A (en) | Stereoscopic image comfort evaluation method based on a convolutional neural network | |
CN109978786A (en) | Kinect depth map restoration method based on a convolutional neural network | |
CN109461177B (en) | Monocular image depth prediction method based on a neural network | |
CN109146944A (en) | Spatial depth perception estimation method based on a depthwise separable convolutional neural network | |
CN110263768A (en) | Face recognition method based on a deep residual network | |
CN110705566B (en) | Multi-modal fusion saliency detection method based on spatial pyramid pooling | |
CN113298736B (en) | Face image restoration method based on face pattern | |
CN113449691A (en) | Human shape recognition system and method based on a non-local attention mechanism | |
CN112991371B (en) | Automatic image colorization method and system based on color-overflow constraint | |
CN110852935A (en) | Image processing method for age-progressive face images | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||