CN110070574B - Binocular vision stereo matching method based on improved PSMNet - Google Patents

Binocular vision stereo matching method based on improved PSMNet

Info

Publication number
CN110070574B
CN110070574B · CN201910354039.6A · CN201910354039A
Authority
CN
China
Prior art keywords
feature
image
spp
module
parallax
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910354039.6A
Other languages
Chinese (zh)
Other versions
CN110070574A (en)
Inventor
秦岭
黄庆
雷波
程遥
张�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Maitewei (Wuhan) Technology Co., Ltd.
Original Assignee
Maitewei (Wuhan) Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Maitewei (Wuhan) Technology Co., Ltd.
Priority to CN201910354039.6A
Publication of CN110070574A
Application granted
Publication of CN110070574B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20228 Disparity calculation for image-based rendering
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a binocular vision stereo matching method based on an improved PSMNet, which comprises the following steps: a dimension-reduction starting module extracts features from the left and right images to obtain respective feature maps; the feature maps are input into an SPP module, which compresses them and then up-samples them, fusing the feature maps of different levels into a final SPP feature map; for each disparity value of the left and right images, the feature map corresponding to that disparity value is combined with the SPP feature map to form a four-dimensional matching cost volume; a three-dimensional convolution module aggregates context information, and the final disparity map is obtained through up-sampling and disparity regression; the likelihood of each disparity is calculated from the predicted cost C_d obtained via a normalized exponential function, an identity mapping is used for optimization, and the predicted disparity value is obtained by summing each disparity value weighted by its corresponding likelihood.

Description

Binocular vision stereo matching method based on improved PSMNet
Technical Field
The invention relates to the technical field of visual stereo algorithms, and in particular to a binocular vision stereo matching method based on an improved PSMNet.
Background
Through years of development, binocular stereo vision has come to play an important role in fields such as three-dimensional reconstruction, industrial measurement, and unmanned driving. Stereo matching is the core research content of binocular vision and also its main research difficulty. To date, conventional binocular vision stereo matching methods fall mainly into three types: global matching, local matching, and semi-global matching. Global matching generally comprises matching cost calculation, disparity calculation, and disparity optimization; the core of the global matching method is to construct and minimize a global energy function, thereby obtaining an optimal disparity map. The global matching method achieves good results, but its running time is generally long, so it is not suitable for real-time operation.
In recent years, more and more stereo matching methods have been built on convolutional neural networks. CNNs were first used to solve the matching consistency problem through similarity calculation: the network computes the similarity of a pair of image patches to determine whether they match. Although CNN-based stereo matching methods improve on conventional binocular vision stereo matching methods in both speed and accuracy, their performance in ill-posed regions (e.g., occluded regions, disparity discontinuities, weak-texture regions, reflective surfaces, etc.) is still not ideal.
Disclosure of Invention
The invention provides a binocular vision stereo matching method based on an improved PSMNet to solve the problems noted in the background above.
The invention provides a binocular vision stereo matching method based on an improved PSMNet, which comprises the following steps:
S1: extracting features from the left and right images using a dimension-reduction starting module to obtain respective feature maps;
S2: inputting the obtained feature maps into an SPP module, the SPP module compressing the feature maps and then up-sampling them, and fusing the feature maps of different levels into a final SPP feature map, wherein the SPP feature map is generated by the following steps:
A. selecting one piece of feature information as the basis and extracting a feature connection value from it;
B. searching the acquired feature information, according to the feature connection value on the basic feature information, for feature information that matches the feature connection value, and connecting the two to generate a larger piece of basic feature information;
C. extracting a new feature connection value from the newly generated basic feature information, again searching the acquired feature information for a match to connect, and repeating the search-and-match process in turn;
D. finally forming the final SPP feature map;
S3: for each disparity value of the left and right images, combining the feature map corresponding to that disparity value with the SPP feature map to form a four-dimensional matching cost volume (a sketch of this construction follows step S4);
S4: a three-dimensional convolution module aggregates context information, and the final disparity map is obtained through up-sampling and disparity regression; the likelihood of each disparity is calculated from the predicted cost C_d obtained via the normalized exponential function, an identity mapping is used for optimization, and the predicted disparity value is obtained by summing each disparity value weighted by its corresponding likelihood.
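For illustration, the following is a minimal sketch of the four-dimensional matching cost volume of step S3, assuming a PSMNet-style construction in which the left feature map is concatenated with the right feature map shifted by each candidate disparity; tensor shapes and function names are illustrative, not taken from the patent.

```python
import torch

def build_cost_volume(left_feat: torch.Tensor,
                      right_feat: torch.Tensor,
                      max_disp: int) -> torch.Tensor:
    """left_feat, right_feat: (N, C, H, W). Returns (N, 2C, max_disp, H, W):
    for each candidate disparity d, left features are paired with right
    features shifted d pixels to the left."""
    n, c, h, w = left_feat.shape
    volume = left_feat.new_zeros(n, 2 * c, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, :c, d] = left_feat
            volume[:, c:, d] = right_feat
        else:
            volume[:, :c, d, :, d:] = left_feat[..., d:]
            volume[:, c:, d, :, d:] = right_feat[..., :-d]
    return volume
```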
Preferably, in step S1, the starting module acquires the image by first scanning it to obtain a data image, and extracts feature data according to the characteristics of the feature data in the image data.
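The patent does not spell out the internal layout of the dimension-reduction starting module; one plausible reading is an Inception-style block in which 1×1 convolutions shrink the channel depth before the more expensive parallel branches. The sketch below follows that reading; the branch structure and channel split are assumptions.

```python
import torch
import torch.nn as nn

class DimReduceStartModule(nn.Module):
    """Hypothetical inception-style starting module: 1x1 convolutions reduce
    channel depth before the 3x3 and 5x5 branches (assumes out_ch % 4 == 0)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        b = out_ch // 4  # channels per branch
        self.branch1 = nn.Conv2d(in_ch, b, kernel_size=1)
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, b, kernel_size=1),
                                     nn.Conv2d(b, b, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, b, kernel_size=1),
                                     nn.Conv2d(b, b, kernel_size=5, padding=2))
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, b, kernel_size=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenating the four reduced branches keeps the output at out_ch.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.pool(x)], dim=1)
```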
Preferably, in step S2, the up-sampling uses an edge-guided method based on the original low-resolution image: the edges of the low-resolution image are detected first, the pixels are then classified according to the detected edges, the low-resolution image is interpolated with a conventional method, the edges of the resulting high-resolution image are detected, and finally the edges and nearby pixels are specially processed to remove blur and strengthen the image edges.
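A simplified sketch of the edge-guided up-sampling just described, written for a single-channel 8-bit image; the Canny detector, bicubic interpolation, and unsharp masking stand in for the unspecified "traditional method" and "special processing" and are assumptions.

```python
import cv2
import numpy as np

def edge_guided_upsample(img: np.ndarray, scale: int = 2) -> np.ndarray:
    """img: single-channel uint8 image. Returns an edge-sharpened upscale."""
    # 1. Detect edges on the low-resolution image and lift them to the
    #    high-resolution grid so pixels near edges can be classified.
    lo_edges = cv2.Canny(img, 100, 200)
    lo_edges_up = cv2.resize(lo_edges, None, fx=scale, fy=scale,
                             interpolation=cv2.INTER_NEAREST)
    # 2. Interpolate the image itself with a conventional method (bicubic).
    up = cv2.resize(img, None, fx=scale, fy=scale,
                    interpolation=cv2.INTER_CUBIC)
    # 3. Detect edges again on the high-resolution result.
    hi_edges = cv2.Canny(up, 100, 200)
    # 4. Specially process edge pixels and their neighbours: unsharp masking
    #    removes interpolation blur and strengthens the edges.
    near_edges = cv2.dilate(cv2.bitwise_or(lo_edges_up, hi_edges),
                            np.ones((3, 3), np.uint8))
    blurred = cv2.GaussianBlur(up, (5, 5), 0)
    sharpened = cv2.addWeighted(up, 1.5, blurred, -0.5, 0)
    out = up.copy()
    out[near_edges > 0] = sharpened[near_edges > 0]
    return out
```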
Preferably, in step S2, the SPP module samples the feature map; the SPP module is connected to each computing module, and after the SPP module has collected the feature map, each computing module can extract the feature information of the feature map from the SPP module.
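The SPP (spatial pyramid pooling) module can be sketched as follows, in the PSMNet style: the feature map is compressed by average pooling at several scales, each pooled map is channel-reduced and up-sampled back to the input size, and all levels are concatenated into the final SPP feature map. The pool sizes and compression factor here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPModule(nn.Module):
    """Pool at several scales, compress channels, upsample back, concatenate."""
    def __init__(self, in_ch: int, pool_sizes=(64, 32, 16, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(nn.AvgPool2d(s, stride=s),
                          nn.Conv2d(in_ch, in_ch // 4, 1, bias=False),
                          nn.BatchNorm2d(in_ch // 4),
                          nn.ReLU(inplace=True))
            for s in pool_sizes])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[2:]
        feats = [x] + [F.interpolate(b(x), size=(h, w), mode='bilinear',
                                     align_corners=False)
                       for b in self.branches]
        return torch.cat(feats, dim=1)  # fused multi-level SPP feature map
```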
Preferably, in step S3, the feature map of each disparity value is matched with the SPP feature map through the feature data on the feature values; direction feature data are set on the SPP feature map, and the direction features are matched with each other to form the multi-dimensional cost volume.
Preferably, in step S4, the convolution module adopts a 1×1 convolution, which effectively reduces the channel thickness of the feature map, so that the width of the network and its adaptability to multiple scales can be increased, and the matching accuracy improved, without increasing the network parameters.
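As a concrete illustration of the parameter argument (the channel counts below are examples, not from the patent), a 1×1 convolution over the cost volume reduces its channel thickness at a small fraction of the cost of a larger kernel:

```python
import torch.nn as nn

# 1x1x1 convolution over the 4D cost volume: shrinks 64 channels to 32
# (64*32 = 2,048 weights) without touching disparity, height, or width.
reduce_cost = nn.Conv3d(64, 32, kernel_size=1, bias=False)
# The 3x3x3 equivalent would need 64*32*27 = 55,296 weights, so the 1x1x1
# reduction lets the network grow wider under the same parameter budget.
```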
Preferably, in step S4, the normalized exponential function accelerates the convergence of network training; meanwhile, normalization allows training with a higher learning rate and without many initialization operations, and, combined with other network optimization operations, reduces the test time when an image is tested.
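A minimal sketch of the normalization placement implied here, attaching a batch-normalization layer after each convolution so training tolerates the higher learning rate; the exact arrangement of layers is an assumption.

```python
import torch.nn as nn

def conv_bn_relu(in_ch: int, out_ch: int) -> nn.Sequential:
    """Convolution followed by batch normalization: normalized activations
    let training use a larger learning rate and converge faster."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```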
Preferably, in S4, the summation formula is:

$$\hat{d} = \sum_{d=0}^{D_{\max}} d \times \operatorname{softmax}(-C_d)$$

Training uses the Focal loss function, and the loss function is defined as follows:

$$L(d, \hat{d}) = \frac{1}{N} \sum_{i=1}^{N} FL(x_i)$$

wherein $x_i$ is an error term computed from $d$ and $\hat{d}$, and

$$FL(x) = -\alpha x^{\gamma} \log(1 - x),$$

where $d$ is the ground-truth disparity value and $\hat{d}$ is the predicted disparity value.
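The two formulas above translate directly into code. The sketch below assumes a cost tensor of shape (N, D, H, W); because the patent does not define how x is derived from d and the predicted disparity, the absolute disparity error is normalized into [0, 1) here as an illustrative choice.

```python
import torch
import torch.nn.functional as F

def disparity_regression(cost: torch.Tensor) -> torch.Tensor:
    """cost: (N, D, H, W) predicted cost C_d. softmax(-C_d) converts costs
    into per-disparity likelihoods; the prediction is the likelihood-weighted
    sum of all candidate disparity values."""
    prob = F.softmax(-cost, dim=1)
    disps = torch.arange(cost.size(1), device=cost.device,
                         dtype=cost.dtype).view(1, -1, 1, 1)
    return (prob * disps).sum(dim=1)

def focal_style_loss(pred: torch.Tensor, gt: torch.Tensor, max_disp: int,
                     alpha: float = 1.0, gamma: float = 2.0) -> torch.Tensor:
    """FL(x) = -alpha * x**gamma * log(1 - x). Dividing the clamped error
    by max_disp to map it into [0, 1) is an assumption, not the patent's."""
    x = (pred - gt).abs().clamp(0, max_disp - 1) / max_disp
    return (-alpha * x.pow(gamma) * torch.log1p(-x)).mean()
```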
The binocular vision stereo matching method based on the improved PSMNet has the following beneficial effects:
1. The dimension-reduction starting module enables better feature extraction.
2. Adding a corresponding normalization layer after each layer allows training with a larger learning rate and accelerates the convergence of network training.
3. The improved loss function guarantees matching accuracy while improving matching speed.
Detailed Description
The invention will be further illustrated with reference to specific examples.
The invention provides a binocular vision stereo matching method based on an improved PSMNet, which comprises the following steps:
S1: a dimension-reduction starting module extracts features from the left and right images to obtain respective feature maps; the starting module acquires the image by first scanning it to obtain a data image, and extracts feature data according to the characteristics of the feature data in the image data;
S2: the obtained feature maps are input into the SPP module, which compresses them and then up-samples them. The up-sampling uses an edge-guided method based on the original low-resolution image: the edges of the low-resolution image are detected first, the pixels are then classified according to the detected edges, the low-resolution image is interpolated with a conventional method, the edges of the resulting high-resolution image are detected, and finally the edges and nearby pixels are specially processed to remove blur and strengthen the image edges. The feature maps of different levels are fused into a final SPP feature map. The SPP module samples the feature map and is connected to each computing module; after the SPP module has collected the feature map, each computing module can extract the feature information of the feature map from it. The SPP feature map is generated by the following steps:
A. selecting one piece of feature information as the basis and extracting a feature connection value from it;
B. searching the acquired feature information, according to the feature connection value on the basic feature information, for feature information that matches the feature connection value, and connecting the two to generate a larger piece of basic feature information;
C. extracting a new feature connection value from the newly generated basic feature information, again searching the acquired feature information for a match to connect, and repeating the search-and-match process in turn;
D. finally forming the final SPP feature map;
S3: for each disparity value of the left and right images, the feature map corresponding to that disparity value is combined with the SPP feature map to form a four-dimensional matching cost volume; the feature map of each disparity value is matched with the SPP feature map through the feature data on the feature values, direction feature data are set on the SPP feature map, and the direction features are matched with each other to form the multi-dimensional cost volume;
S4: a three-dimensional convolution module aggregates context information. The convolution module adopts a 1×1 convolution, which effectively reduces the channel thickness of the feature map, so that the width of the network and its adaptability to multiple scales can be increased, and the matching accuracy improved, without increasing the network parameters. The final disparity map is obtained through up-sampling and disparity regression, and the likelihood of each disparity is calculated from the predicted cost C_d obtained via the normalized exponential function. The normalized exponential function accelerates the convergence of network training; meanwhile, normalization allows training with a higher learning rate and without many initialization operations, and, combined with other network optimization operations, reduces the test time when an image is tested. An identity mapping is used for optimization; the identity mapping optimizes effectively and greatly increases the calculation speed. The predicted disparity value is obtained by summing each disparity value weighted by its corresponding likelihood. The summation formula is:
$$\hat{d} = \sum_{d=0}^{D_{\max}} d \times \operatorname{softmax}(-C_d)$$

Training uses the Focal loss function, and the loss function is defined as follows:

$$L(d, \hat{d}) = \frac{1}{N} \sum_{i=1}^{N} FL(x_i)$$

wherein $x_i$ is an error term computed from $d$ and $\hat{d}$, and

$$FL(x) = -\alpha x^{\gamma} \log(1 - x),$$

where $d$ is the ground-truth disparity value and $\hat{d}$ is the predicted disparity value.
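Putting the pieces together, one forward pass over steps S1 to S4 might look like the sketch below; it reuses the illustrative helpers defined earlier (build_cost_volume, disparity_regression) and assumes, as in PSMNet, that the feature maps are at 1/4 of the input resolution. Module names and shapes are illustrative, not the patent's.

```python
import torch.nn.functional as F

def predict_disparity(left_img, right_img, backbone, spp, cost_net, max_disp):
    # S1: shared-weight feature extraction on both views.
    f_l, f_r = backbone(left_img), backbone(right_img)
    # S2: fuse multi-level context into SPP feature maps.
    f_l, f_r = spp(f_l), spp(f_r)
    # S3: 4D matching cost volume at 1/4 resolution.
    volume = build_cost_volume(f_l, f_r, max_disp // 4)
    # S4: 3D convolutions aggregate context, then upsample and regress.
    cost = cost_net(volume)                      # (N, 1, D/4, H/4, W/4)
    cost = F.interpolate(cost, scale_factor=4, mode='trilinear',
                         align_corners=False).squeeze(1)
    return disparity_regression(cost)            # (N, H, W) disparity map
```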
The foregoing is only a preferred embodiment of the present invention, and the scope of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art according to the technical scheme of the present invention and its inventive concept, within the scope disclosed by the present invention, shall be covered by the scope of protection of the present invention.

Claims (7)

1. A binocular vision stereo matching method based on an improved PSMNet, characterized by comprising the following steps:
S1: extracting features from the left and right images using a dimension-reduction starting module to obtain respective feature maps;
S2: inputting the obtained feature maps into an SPP module, the SPP module compressing the feature maps and then up-sampling them, and fusing the feature maps of different levels into a final SPP feature map, wherein the SPP feature map is generated by the following steps:
A. selecting one piece of feature information as the basis and extracting a feature connection value from it;
B. searching the acquired feature information, according to the feature connection value on the basic feature information, for feature information that matches the feature connection value, and connecting the two to generate a larger piece of basic feature information;
C. extracting a new feature connection value from the newly generated basic feature information, again searching the acquired feature information for a match to connect, and repeating the search-and-match process in turn;
D. finally forming the final SPP feature map;
S3: for each disparity value of the left and right images, combining the feature map corresponding to that disparity value with the SPP feature map to form a four-dimensional matching cost volume;
S4: a three-dimensional convolution module aggregates context information, and the final disparity map is obtained through up-sampling and disparity regression; the likelihood of each disparity is calculated from the predicted cost C_d obtained via the normalized exponential function, an identity mapping is used for optimization, and the predicted disparity value is obtained by summing each disparity value weighted by its corresponding likelihood, wherein the summation formula is:

$$\hat{d} = \sum_{d=0}^{D_{\max}} d \times \operatorname{softmax}(-C_d)$$

training uses the Focal loss function, and the loss function is defined as follows:

$$L(d, \hat{d}) = \frac{1}{N} \sum_{i=1}^{N} FL(x_i)$$

wherein $x_i$ is an error term computed from $d$ and $\hat{d}$, and $FL(x) = -\alpha x^{\gamma} \log(1 - x)$, where $d$ is the ground-truth disparity value and $\hat{d}$ is the predicted disparity value.
2. The binocular vision stereo matching method based on the improved PSMNet according to claim 1, wherein in S1, the starting module acquires the image by first scanning it to obtain a data image, and extracts the feature data according to the characteristics of the feature data in the image data.
3. The binocular vision stereo matching method based on the improved PSMNet according to claim 1, wherein in S2, the up-sampling uses an edge-guided method based on the original low-resolution image: the edges of the low-resolution image are detected first, the pixels are then classified according to the detected edges, the low-resolution image is interpolated with a conventional method, the edges of the resulting high-resolution image are detected, and finally the edges and nearby pixels are specially processed to remove blur and strengthen the image edges.
4. The binocular vision stereo matching method based on the improved PSMNet according to claim 1, wherein in S2, the SPP module samples the feature map; the SPP module is connected to each computing module, and after the SPP module has collected the feature map, each computing module can extract the feature information of the feature map from the SPP module.
5. The binocular vision stereo matching method based on the improved PSMNet according to claim 1, wherein in S3, the feature map of each disparity value is matched with the SPP feature map through the feature data on the feature values, direction feature data are set on the SPP feature map, and the direction features are matched with each other to form the multi-dimensional cost volume.
6. The binocular vision stereo matching method based on the improved PSMNet according to claim 1, wherein in S4, the convolution module adopts a 1×1 convolution, which effectively reduces the channel thickness of the feature map, so that the width of the network and its adaptability to multiple scales can be increased, and the matching accuracy improved, without increasing the network parameters.
7. The binocular vision stereo matching method based on the improved PSMNet according to claim 1, wherein in S4, the normalized exponential function accelerates the convergence of network training; meanwhile, normalization allows training with a higher learning rate and without many initialization operations, and, combined with other network optimization operations, reduces the test time when an image is tested.
CN201910354039.6A 2019-04-29 2019-04-29 Binocular vision stereo matching method based on improved PSMNet Active CN110070574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910354039.6A CN110070574B (en) 2019-04-29 2019-04-29 Binocular vision stereo matching method based on improved PSMNet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910354039.6A CN110070574B (en) 2019-04-29 2019-04-29 Binocular vision stereo matching method based on improved PSMNet

Publications (2)

Publication Number Publication Date
CN110070574A CN110070574A (en) 2019-07-30
CN110070574B (en) 2023-05-02

Family

ID=67369599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910354039.6A Active CN110070574B (en) 2019-04-29 2019-04-29 Binocular vision stereo matching method based on improved PSMNet

Country Status (1)

Country Link
CN (1) CN110070574B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402129B (en) * 2020-02-21 2022-03-01 西安交通大学 Binocular stereo matching method based on joint up-sampling convolutional neural network
CN111583313A (en) * 2020-03-25 2020-08-25 上海物联网有限公司 Improved binocular stereo matching method based on PSmNet
CN111405266B (en) * 2020-05-29 2020-09-11 深圳看到科技有限公司 Binocular image rapid processing method and device and corresponding storage medium
CN115239783A (en) * 2021-04-23 2022-10-25 中兴通讯股份有限公司 Parallax estimation method, parallax estimation device, image processing apparatus, and storage medium
CN112991422A (en) * 2021-04-27 2021-06-18 杭州云智声智能科技有限公司 Stereo matching method and system based on void space pyramid pooling
CN114648669A (en) * 2022-05-20 2022-06-21 华中科技大学 Motor train unit fault detection method and system based on domain-adaptive binocular parallax calculation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106340036A (en) * 2016-08-08 2017-01-18 东南大学 Binocular stereoscopic vision-based stereo matching method
CN109146937A (en) * 2018-08-22 2019-01-04 广东电网有限责任公司 A kind of electric inspection process image dense Stereo Matching method based on deep learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106340036A (en) * 2016-08-08 2017-01-18 东南大学 Binocular stereoscopic vision-based stereo matching method
CN109146937A (en) * 2018-08-22 2019-01-04 广东电网有限责任公司 A kind of electric inspection process image dense Stereo Matching method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pyramid Stereo Matching Network; Jia-Ren Chang et al.; 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018-12-31; pp. 5410-5418 *
Stereo matching based on cross-scale cost aggregation with pyramid transformation; Yao Li et al.; Journal of System Simulation; 2016-09-08; vol. 28, no. 9, pp. 2227-2234 *

Also Published As

Publication number Publication date
CN110070574A (en) 2019-07-30

Similar Documents

Publication Publication Date Title
CN110070574B (en) Binocular vision stereo matching method based on improved PSMNet
CN112329800B (en) Salient object detection method based on global information guiding residual attention
CN111028146B (en) Image super-resolution method for generating countermeasure network based on double discriminators
CN109598754B (en) Binocular depth estimation method based on depth convolution network
CN109872305B (en) No-reference stereo image quality evaluation method based on quality map generation network
CN109523470B (en) Depth image super-resolution reconstruction method and system
CN109255358B (en) 3D image quality evaluation method based on visual saliency and depth map
CN110060286B (en) Monocular depth estimation method
CN110969124A (en) Two-dimensional human body posture estimation method and system based on lightweight multi-branch network
CN108765414B (en) No-reference stereo image quality evaluation method based on wavelet decomposition and natural scene statistics
CN106462771A (en) 3D image significance detection method
CN111899168B (en) Remote sensing image super-resolution reconstruction method and system based on feature enhancement
CN110751612A (en) Single image rain removing method of multi-channel multi-scale convolution neural network
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN108932699B (en) Three-dimensional matching harmonic filtering image denoising method based on transform domain
CN112819910A (en) Hyperspectral image reconstruction method based on double-ghost attention machine mechanism network
CN111626927B (en) Binocular image super-resolution method, system and device adopting parallax constraint
CN110111288B (en) Image enhancement and blind image quality evaluation network system based on deep assisted learning
CN114549308B (en) Image super-resolution reconstruction method and system with large receptive field and oriented to perception
CN113343822B (en) Light field saliency target detection method based on 3D convolution
CN114266957A (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
CN117576402B (en) Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method
CN112801141B (en) Heterogeneous image matching method based on template matching and twin neural network optimization
CN107358625B (en) SAR image change detection method based on SPP Net and region-of-interest detection
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Workstation 501-F028, Tiancheng Information Building, No. 88 Tiancheng Road South, High Speed Rail New City, Xiangcheng District, Suzhou City, Jiangsu Province, 215000 (cluster registration)

Applicant after: Youle Circle (Suzhou) Technology Co.,Ltd.

Address before: 430000, Room 610, 6th Floor, Wuhan University Student Entrepreneurship Park, No. 147 Luoshi Road, Hongshan District, Wuhan City, Hubei Province, China

Applicant before: Youle Circle (Wuhan) Technology Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20230410

Address after: Hubei University Student Innovation and Entrepreneurship Club, No. 122 Luoshi Road, Hongshan District, Wuhan City, Hubei Province, 430000 (Hubei Jingchu Maker Coffee Station No. 156)

Applicant after: Maitewei (Wuhan) Technology Co.,Ltd.

Address before: Workstation 501-F028, Tiancheng Information Building, No. 88 Tiancheng Road South, High Speed Rail New City, Xiangcheng District, Suzhou City, Jiangsu Province, 215000 (cluster registration)

Applicant before: Youle Circle (Suzhou) Technology Co.,Ltd.

GR01 Patent grant