CN109934272B - Image matching method based on full convolution network


Info

Publication number
CN109934272B
Authority
CN
China
Prior art keywords
image
convolution
layer
output
matching
Prior art date
Legal status
Expired - Fee Related
Application number
CN201910154179.9A
Other languages
Chinese (zh)
Other versions
CN109934272A (en)
Inventor
桑勇
李庆
赵健龙
段富海
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201910154179.9A priority Critical patent/CN109934272B/en
Publication of CN109934272A publication Critical patent/CN109934272A/en
Application granted granted Critical
Publication of CN109934272B publication Critical patent/CN109934272B/en


Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides an image matching method based on a full convolution network, belonging to the field of computer vision. The method comprises the following specific steps: first, a full convolution image matching network model for image matching is constructed and the model initialization parameters and loss function are set, the whole full convolution matching network model consisting of eight parts; second, image data for training the model are obtained according to the full convolution image matching network model; third, the full convolution image matching network model is trained with the image data on a GPU device; fourth, images are matched with the trained full convolution image matching network model. The method offers good matching accuracy and efficiency, solves the problem that matching accuracy and matching efficiency could not previously be achieved simultaneously, and compares favorably with traditional methods.

Description

Image matching method based on full convolution network
Technical Field
The invention belongs to the field of computer vision and relates to an image matching algorithm based on a full convolution network.
Background
Feature matching is a basic task in computer vision and plays an important role in it; it is mainly applied in structure from motion (SfM), baseline registration in binocular vision, panorama construction, object recognition, and other high-level vision tasks.
Before data-driven feature learning approaches became popular, research in this field focused on the design of local descriptors and on descriptor matching algorithms. Lowe designed a feature point with good robustness, the SIFT feature point [Lowe D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91-110.]. Various hand-crafted feature points followed, and SIFT, ORB, SURF, and the like remain in use today. SIFT has the best robustness but low algorithmic efficiency and cannot satisfy application scenarios with real-time requirements; SURF is an approximation of SIFT that improves on its efficiency, but its robustness is slightly weaker; ORB feature points are efficient to compute but less robust, so feature matching with them produces more mismatches. Descriptor matching is the process of measuring the similarity of the features to be matched with a distance function (Hamming distance, Euclidean distance, and the like) and selecting suitable matching points. Feature-point matching uses brute-force search; when the number of feature points is large, brute-force search is computationally expensive, so fast approximate nearest neighbor (ANN) algorithms are adopted to accelerate feature matching.
Artificially designed feature descriptors have limitations and cannot simultaneously guarantee robustness and algorithmic efficiency. In recent years, convolutional networks have achieved excellent performance in high-level tasks such as object detection, image classification, and image segmentation, and high-performance parallel computing hardware such as GPUs (graphics processing units) and FPGAs (field-programmable gate arrays), together with its supporting software, has matured. To solve the problem that image matching accuracy and matching efficiency cannot be achieved at the same time, the invention proposes a full convolution network matching method that improves both feature matching accuracy and matching speed.
Disclosure of Invention
In order to solve the problem that the matching accuracy and matching efficiency of traditional feature matching algorithms cannot coexist, the invention provides an image matching method based on a full convolution network that improves both the matching accuracy and the matching efficiency of images.
In order to achieve the above object, the invention adopts the following technical solution:
An image matching method based on a full convolution network comprises the following steps:
Step one, construct a full convolution image matching network model for image matching and set the model initialization parameters and loss function; the whole full convolution matching network model consists of eight parts. This step comprises the following substeps:
1) The first part consists of a convolutional layer and a pooling layer. The input is an image of size 64x64 formed by stacking two image blocks into 2 channels. The convolutional layer parameter is 3x3x2 with 64 kernels; the convolved output is processed by a ReLU activation function, producing a feature map of size 64x64x64. The feature map is then processed by a max pooling layer with stride 2, yielding a feature map of size 32x32x64.
2) The second part consists of a convolutional layer and a pooling layer. The feature map from substep 1) is the input; the convolutional layer parameter is 3x3x64x128, and the convolved output is processed by a ReLU activation function, producing a feature map of size 32x32x128. The feature map then passes through a max pooling layer with stride 2, yielding a feature map of size 16x16x128.
3) The third part comprises only a convolutional layer, with no pooling layer. The feature map from substep 2) is the input; the convolutional layer parameter is 3x3x128x128, and the convolved output is processed by a ReLU activation function, producing a feature map of size 16x16x128.
4) The fourth part comprises only a convolutional layer, with no pooling layer. The feature map from substep 3) is the input; the convolutional layer parameter is 3x3x128x64, and the convolved output is processed by a ReLU activation function, producing a feature map of size 16x16x64.
5) The fifth part consists of a convolutional layer and a pooling layer. The feature map from substep 4) is the input; the convolutional layer parameter is 3x3x64x64, and the convolved output is processed by a ReLU activation function and then by a max pooling layer with stride 2, yielding a feature map of size 8x8x64.
6) The sixth part comprises only a convolutional layer, with no pooling layer. The feature map from substep 5) is the input; the convolutional layer parameter is 3x3x64x64, and the convolved output is processed by a ReLU activation function, producing a feature map of size 8x8x64.
7) The seventh part is a flattening layer, which flattens the feature map of size 8x8x64 into a vector of dimension 4096x1.
8) The eighth part is an output layer composed of a convolutional layer with parameter 1x1x4096x2 and a softmax activation function; it is the output layer of the whole full convolution matching network and outputs the probability that the images match.
The convolutional layer parameters in substeps 1) to 6) and 8) are initialized with the Xavier method, and model training takes minimization of the cross-entropy loss function E as the optimization objective, defined as follows:

E = -\sum_i y_i \log \hat{y}_i

wherein

\hat{y}_i = \frac{e^{x_i}}{\sum_j e^{x_j}}

x_i is the i-th value of the model output vector, N is the number of dimensions of the model output vector, i, j \in [0, N], e is the natural constant, \hat{y}_i is the probability value of the class corresponding to the i-th output, and y_i is the label value of the i-th dimension of the output vector.
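For concreteness, the architecture and loss above can be expressed as a short model sketch. The following minimal PyTorch sketch is an illustration, not the patent's implementation: the patent does not state convolution padding or pooling window sizes, so 3x3 convolutions with padding 1 and 2x2 max pooling are assumed here to reproduce the stated feature-map sizes, and all names are illustrative.

```python
# Minimal sketch of the eight-part full convolution matching network.
# Assumptions: padding=1 for the 3x3 convolutions and 2x2 max pooling
# with stride 2, chosen to reproduce the feature-map sizes given above.
import torch
import torch.nn as nn

class FullConvMatchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 64, 3, padding=1), nn.ReLU(),     # part 1: 64x64x2 -> 64x64x64
            nn.MaxPool2d(2),                               #         -> 32x32x64
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),   # part 2: -> 32x32x128
            nn.MaxPool2d(2),                               #         -> 16x16x128
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),  # part 3: -> 16x16x128
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(),   # part 4: -> 16x16x64
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),    # part 5: -> 16x16x64
            nn.MaxPool2d(2),                               #         -> 8x8x64
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),    # part 6: -> 8x8x64
        )
        # part 8: 1x1 convolution applied to the flattened 4096-dim vector
        self.classifier = nn.Conv2d(4096, 2, kernel_size=1)
        for m in self.modules():                  # Xavier initialization of the conv layers
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_uniform_(m.weight)

    def forward(self, x):                         # x: (batch, 2, 64, 64)
        x = self.features(x)
        x = x.reshape(x.size(0), 4096, 1, 1)      # part 7: flatten 8x8x64 -> 4096x1
        logits = self.classifier(x).flatten(1)    # part 8: -> (batch, 2)
        return torch.softmax(logits, dim=1)       # matching probability
```

Taking the negative log of the output probability at the label's index reproduces the loss E above; in PyTorch the same quantity can be computed directly from the pre-softmax logits with `nn.CrossEntropyLoss`.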
Step two, obtain the image data for training the model according to the full convolution image matching network model constructed in step one; this comprises the following substeps:
1) Feature points such as SIFT are detected in the image, and image blocks of size 64x64 centered on the pixel positions of the SIFT feature points are taken as training data. To obtain matching information between image blocks from different images, the features must first be pre-matched with a traditional image matching method to obtain correct labels; this process requires manual verification to ensure label correctness.
2) After the training data and corresponding labels are obtained, the training data are augmented to strengthen the robustness of the algorithm and produce the final image data. The augmentation specifically consists of: swapping the channel order of the two images, rotating one image by 90 and 180 degrees, and adding random noise to the images.
Step three, on a GPU device, train the full convolution image matching network model of step one with the image data of step two to obtain a trained model.
Step four, match image block pairs in the images to be matched with the trained model; this comprises the following substeps:
1) detect the SIFT feature points of the images to be matched, and crop image blocks of size 64x64 centered on the feature points;
2) select image blocks of size 64x64 from the two images to be matched, stack them two by two to obtain the image block pairs to be predicted, and input these pairs into the trained full convolution image matching network to obtain the matching result of the two images. If the first image yields n image blocks in total and the second image yields m, the number of stacked images to be predicted is n x m.
The invention has the beneficial effects that: the method effectively solves the problem that the matching accuracy and the matching efficiency cannot be achieved simultaneously, and has more advantages compared with the traditional method.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 is a diagram of a full convolution matching network in accordance with the present invention;
table 1 is a table of parameters for a full convolution matching network in the present invention.
Detailed Description
The invention is described in further detail below with reference to the following figures and detailed description:
the image matching method based on the full convolution network comprises the following steps:
the method comprises the following steps: designing a full convolution image matching network, and setting model initialization parameters and a loss function;
constructing a full convolution image matching network model, wherein the model comprises 8 parts: the first part comprises a convolutional layer and a pooling layer, an image formed by stacking 2-channel image blocks is input, the size of the image is 64x64, the parameter of the convolutional layer is 3x3x2, the number of convolutions is 64, the convolved data is processed by a RELU activation function, the size of an output characteristic diagram is 64x64x64, and then the characteristic diagram is processed by a maximum pooling layer with the step size parameter of 2 to obtain a characteristic diagram of 32x32x 64;
the structure of the second part of the full convolution matching network model is similar to that of the first part, the convolution parameter is 3x3x64x128, convolution is carried out, then the RELU activation function is carried out to process convolution output, the output feature map is 32x32x128, and then the feature map is subjected to the maximum pooling layer with the step length of 2 to obtain a feature map with the size of 16x16x 128;
the structure of the third part of the full convolution matching network model is slightly different from the structures of the first part and the second part, the part does not have a pooling layer, the parameter of the convolution layer of the part is 3x3x128x128, the activation function is a RELU activation function, and the size of the feature map output after the third part of processing is 16x16x 128;
the fourth part of the full convolution matching network has the structure that the convolution parameter is 3x3x128x64, the activation function is the RELU activation function, and the size of the output feature map is 16x16x 64;
the fifth part of the structure of the full convolution matching network is that the convolution parameter is 3x3x64x64, the activation function is a RELU activation function, the size of the output feature map is 8x8x64 after the maximum pooling layer processing with the step length of 2;
the sixth part of the structure of the full convolution matching network is that the convolution parameter is 3x3x64x64, the activation function is the RELU activation function, and the size of the output feature map is 8x8x 64;
the seventh partial structure of the full convolution matching network is a flattening layer, and the function of the layer is to smooth the feature map with the size of 8x8x64 into a vector with the dimension of 4096x 1;
the eighth part of the full convolution matching network is an output layer consisting of a convolution layer with the parameter of 1x1x4096x2 and a softmax activation function, and the eighth part is the output layer of the whole full convolution matching network and outputs the probability of image matching;
2) Initialize each convolutional layer parameter in the network with the Xavier initialization method, and train the model with minimization of the cross-entropy loss function as the optimization objective; the cross-entropy loss function is defined as follows:
E = -\sum_i y_i \log \hat{y}_i

where

\hat{y}_i = \frac{e^{x_i}}{\sum_j e^{x_j}}

x_i denotes the i-th value of the model output vector and N is the number of dimensions of the model output vector.
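As a worked illustration (with numbers not taken from the patent): if the model outputs x = (2.0, 0.5) for a pair whose label is y = (1, 0), then \hat{y}_1 = e^{2.0}/(e^{2.0} + e^{0.5}) \approx 7.389/9.038 \approx 0.818, so E = -\ln(0.818) \approx 0.201. A confident correct prediction drives E toward zero, while a confident wrong one makes it large.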
Step two: prepare the image data for training the model according to the designed full convolution network model.
1) Detect feature points such as SIFT and, centered on the pixel position of each SIFT feature point, take image blocks of size 64x64 as the training set. To obtain matching information between image blocks from different images, the features are first pre-matched with a traditional image matching method and then manually verified to ensure label correctness. Each pair of matched image blocks is stacked as one 2-channel image and the images are numbered; the label bearing the same number as the image is 1 if the pair matches and 0 otherwise.
2) After the training data and corresponding labels are obtained, the data are lightly augmented to strengthen the robustness of the algorithm. The augmentation mainly consists of: swapping the channel order of the two images, rotating one image by 90 and 180 degrees, and adding random noise to the images.
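A minimal sketch of this augmentation step follows, assuming each sample is a NumPy array of shape (64, 64, 2) holding the stacked pair; the noise level and function names are illustrative, not specified by the patent.

```python
# Sketch of the three augmentations: channel swap, rotation of one
# block by 90/180 degrees, and additive random noise (std. dev. assumed).
import numpy as np

def augment_pair(pair: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    out = pair.astype(np.float32).copy()
    if rng.random() < 0.5:
        out = out[:, :, ::-1].copy()              # swap the channel order of the two blocks
    k = int(rng.choice([0, 1, 2]))                # 0: no rotation, 1: 90 degrees, 2: 180 degrees
    if k:
        out[:, :, 0] = np.rot90(out[:, :, 0], k=k).copy()  # rotate one image only
    out += rng.normal(0.0, 2.0, size=out.shape)   # add random noise
    return out

# usage example
rng = np.random.default_rng(0)
sample = rng.random((64, 64, 2), dtype=np.float32)
augmented = augment_pair(sample, rng)
```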
Step three: on a GPU device, train the full convolution matching network model with the prepared image data.
1) Train the model with a stochastic gradient descent optimization algorithm. The initial learning rate is set to 0.1 and is reduced by a factor of 10 every 10,000 iterations; the maximum number of iterations is 30,000, and 128 images are input per iteration.
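This schedule maps directly onto a standard PyTorch training loop. The following sketch assumes the FullConvMatchNet sketch from step one and a `train_loader` yielding batches of 128 stacked 2-channel blocks with 0/1 integer labels (both assumptions).

```python
# Sketch of the training schedule: SGD, lr 0.1 divided by 10 every
# 10,000 iterations, at most 30,000 iterations, batch size 128.
import torch
import torch.nn.functional as F

model = FullConvMatchNet().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.1)

iteration = 0
while iteration < 30000:
    for pairs, labels in train_loader:            # 128 image pairs per iteration
        pairs, labels = pairs.cuda(), labels.cuda()
        probs = model(pairs)                      # softmax output of the network
        # cross-entropy E on the softmax output, as in the patent's formulation
        loss = F.nll_loss(torch.log(probs.clamp_min(1e-12)), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()                          # decays lr every 10,000 iterations
        iteration += 1
        if iteration == 30000:
            break
```

In practice, applying `nn.CrossEntropyLoss` to the pre-softmax logits is numerically preferable; the form above simply mirrors the softmax-then-cross-entropy description in the text.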
Step four: match image block pairs in the images to be matched with the trained model; this comprises the following substeps:
1) detect the SIFT feature points of the images to be matched, and crop image blocks of size 64x64 centered on the feature points;
2) select image blocks of size 64x64 from the two images to be matched, stack them two by two to obtain the image block pairs to be predicted, and input these pairs into the trained full convolution image matching network to obtain the matching result of the two images. For example, if the first image yields n image blocks in total and the second image yields m, the number of stacked images to be predicted is n x m.
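A minimal sketch of this matching procedure using OpenCV's SIFT detector is given below; patch-boundary handling, file names, and the 0.5 decision threshold are assumptions, and `model` is the trained network from the earlier sketches.

```python
# Sketch: detect SIFT keypoints, crop 64x64 blocks, stack all n*m
# cross-image pairs into 2-channel inputs, and score them with the model.
import cv2
import numpy as np
import torch

def extract_blocks(gray: np.ndarray, size: int = 64) -> np.ndarray:
    sift = cv2.SIFT_create()
    half, blocks = size // 2, []
    for kp in sift.detect(gray, None):
        x, y = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if half <= x < gray.shape[1] - half and half <= y < gray.shape[0] - half:
            blocks.append(gray[y - half:y + half, x - half:x + half])
    return np.asarray(blocks, dtype=np.float32)

img1 = cv2.imread("image1.png", cv2.IMREAD_GRAYSCALE)   # illustrative file names
img2 = cv2.imread("image2.png", cv2.IMREAD_GRAYSCALE)
b1, b2 = extract_blocks(img1), extract_blocks(img2)     # n and m blocks

pairs = np.stack([np.stack([a, b]) for a in b1 for b in b2])  # (n*m, 2, 64, 64)
with torch.no_grad():
    probs = model(torch.from_numpy(pairs))  # move tensors to the model's device as needed
matches = probs[:, 1] > 0.5                 # assuming index 1 is the "matching" class
```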
TABLE 1. Parameters of the full convolution matching network

Part  Layers                      Convolution parameter  Output size
1     convolution + max pooling   3x3x2x64               32x32x64
2     convolution + max pooling   3x3x64x128             16x16x128
3     convolution                 3x3x128x128            16x16x128
4     convolution                 3x3x128x64             16x16x64
5     convolution + max pooling   3x3x64x64              8x8x64
6     convolution                 3x3x64x64              8x8x64
7     flattening                  -                      4096x1
8     1x1 convolution + softmax   1x1x4096x2             2x1
The above embodiments merely illustrate implementations of the present invention and should not be construed as limiting the scope of the patent. It should be noted that those skilled in the art can make many variations and modifications without departing from the concept of the present invention, and these all fall within the protection scope of the present invention.

Claims (2)

1. An image matching method based on a full convolution network, characterized by comprising the following steps:
step one, constructing a full convolution image matching network model for image matching and setting model initialization parameters and a loss function, wherein the whole full convolution matching network model consists of eight parts, comprising the following substeps:
1) the first part consists of a convolutional layer and a pooling layer; the input is an image of size 64x64 formed by stacking two image blocks into 2 channels; the convolutional layer parameter is 3x3x2 with 64 kernels; the convolved output is processed by a ReLU activation function, and a feature map of size 64x64x64 is output; the feature map is processed by a max pooling layer with stride 2, and a feature map of size 32x32x64 is output;
2) the second part consists of a convolutional layer and a pooling layer; the feature map obtained in substep 1) is the input of the second part's convolutional layer; the convolutional layer parameter is 3x3x64x128; the convolved output is processed by a ReLU activation function, and a feature map of size 32x32x128 is output; the feature map passes through a max pooling layer with stride 2, and a feature map of size 16x16x128 is output;
3) the third part comprises only a convolutional layer, with no pooling layer; the feature map obtained in substep 2) is the input of the third part's convolutional layer; the convolutional layer parameter is 3x3x128x128; the convolved output is processed by a ReLU activation function, and a feature map of size 16x16x128 is output;
4) the fourth part comprises only a convolutional layer, with no pooling layer; the feature map obtained in substep 3) is the input of the fourth part's convolutional layer; the convolutional layer parameter is 3x3x128x64; the convolved output is processed by a ReLU activation function, and a feature map of size 16x16x64 is output;
5) the fifth part consists of a convolutional layer and a pooling layer; the feature map obtained in substep 4) is the input of the fifth part's convolutional layer; the convolutional layer parameter is 3x3x64x64; the convolved output is processed by a ReLU activation function and then by a max pooling layer with stride 2, and a feature map of size 8x8x64 is output;
6) the sixth part comprises only a convolutional layer, with no pooling layer; the feature map obtained in substep 5) is the input of the sixth part's convolutional layer; the convolutional layer parameter is 3x3x64x64; the convolved output is processed by a ReLU activation function, and a feature map of size 8x8x64 is output;
7) the seventh part is a flattening layer, which flattens the feature map of size 8x8x64 into a vector of dimension 4096x1;
8) the eighth part is an output layer composed of a convolutional layer with parameter 1x1x4096x2 and a softmax activation function; it is the output layer of the whole full convolution matching network and outputs the probability that the images match;
step two, obtaining image data for training the model according to the full convolution image matching network model constructed in step one, comprising the following substeps:
1) detecting the SIFT feature points of the image and, centered on the pixel positions of the SIFT feature points, taking image blocks of size 64x64 as training data; to obtain matching information between image blocks from different images, the features need to be pre-matched in advance with a traditional image matching method to obtain correct labels; this process requires manual verification to ensure label correctness;
2) after the training data and corresponding labels are obtained, augmenting the training data to strengthen the robustness of the algorithm and obtain the final image data, the augmentation specifically consisting of: swapping the channel order of the two images, rotating one image by 90 and 180 degrees, and adding random noise to the images;
step three, on a GPU device, training the full convolution image matching network model of step one with the image data of step two to obtain a trained model;
step four, matching image block pairs in the images to be matched with the trained model, comprising the following substeps:
1) detecting the SIFT feature points of the images to be matched and cropping image blocks of size 64x64 centered on the feature points;
2) selecting image blocks of size 64x64 from the two images to be matched, stacking them two by two to obtain image block pairs to be predicted, and inputting these pairs into the trained full convolution image matching network to obtain the matching result of the two images to be matched.
2. The image matching method based on a full convolution network according to claim 1, wherein the convolutional layer parameters in substeps 1) to 6) and 8) of step one are initialized with the Xavier method, and model training takes minimization of the cross-entropy loss function E as the optimization objective, defined as follows:

E = -\sum_i y_i \log \hat{y}_i

wherein

\hat{y}_i = \frac{e^{x_i}}{\sum_j e^{x_j}}

x_i is the i-th value of the model output vector, N is the number of dimensions of the model output vector, i, j \in [0, N], e is the natural constant, \hat{y}_i is the probability value of the class corresponding to the i-th output, and y_i is the label value of the i-th dimension of the output vector.
CN201910154179.9A 2019-03-01 2019-03-01 Image matching method based on full convolution network Expired - Fee Related CN109934272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910154179.9A CN109934272B (en) 2019-03-01 2019-03-01 Image matching method based on full convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910154179.9A CN109934272B (en) 2019-03-01 2019-03-01 Image matching method based on full convolution network

Publications (2)

Publication Number Publication Date
CN109934272A CN109934272A (en) 2019-06-25
CN109934272B (en) 2022-03-29

Family

ID=66986239

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910154179.9A Expired - Fee Related CN109934272B (en) 2019-03-01 2019-03-01 Image matching method based on full convolution network

Country Status (1)

Country Link
CN (1) CN109934272B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199558A (en) * 2019-12-25 2020-05-26 北京自行者科技有限公司 Image matching method based on deep learning
CN111724424B (en) * 2020-06-24 2024-05-14 上海应用技术大学 Image registration method
CN111812732B (en) * 2020-06-29 2024-03-15 中铁二院工程集团有限责任公司 Geoelectromagnetic nonlinear inversion method based on convolutional neural network
CN111951319A (en) * 2020-08-21 2020-11-17 清华大学深圳国际研究生院 Image stereo matching method
CN113128518B (en) * 2021-03-30 2023-04-07 西安理工大学 Sift mismatch detection method based on twin convolution network and feature mixing


Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10204299B2 (en) * 2015-11-04 2019-02-12 Nec Corporation Unsupervised matching in fine-grained datasets for single-view object reconstruction

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN108009524A (en) * 2017-12-25 2018-05-08 西北工业大学 A kind of method for detecting lane lines based on full convolutional network
CN108960258A (en) * 2018-07-06 2018-12-07 江苏迪伦智能科技有限公司 A kind of template matching method based on self study depth characteristic

Non-Patent Citations (2)

Title
Martin Simonovsky et al. A Deep Metric for Multimodal Registration. arXiv:1609.05396v1, 2016-09-17. *
Fan Dazhao et al. A deep convolutional neural network method for satellite image matching (卫星影像匹配的深度卷积神经网络方法). Acta Geodaetica et Cartographica Sinica (测绘学报), 2018-06-30, Vol. 47, No. 6. *

Also Published As

Publication number Publication date
CN109934272A (en) 2019-06-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20220329