CN109934272B - Image matching method based on full convolution network - Google Patents
- Publication number: CN109934272B (application CN201910154179.9A)
- Authority: CN (China)
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides an image matching method based on a full convolution network, belonging to the field of computer vision. The method comprises the following steps: first, construct a full convolution image matching network model for image matching, consisting of 8 parts, and set the model's initialization parameters and loss function; second, obtain image data for training the model; third, train the full convolution image matching network model on GPU hardware using this image data; fourth, match images with the trained model. The method achieves good matching accuracy together with good matching efficiency, resolving the usual trade-off between the two, and compares favorably with traditional methods.
Description
Technical Field
The invention belongs to the field of computer vision, and relates to an image matching algorithm based on a full convolution network.
Background
Feature matching is a fundamental task in computer vision and plays an important role in applications such as SfM (Structure from Motion), baseline registration in binocular vision, panorama construction, object recognition, and other high-level vision tasks.
Before data-driven feature learning became popular, research in this field focused on the design of local descriptors and on descriptor matching algorithms. Lowe designed a feature point with good robustness, the SIFT feature point [Lowe D G. Distinctive image features from scale-invariant keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91-110.]. Various hand-crafted feature points followed, and SIFT, ORB, SURF, and others remain in use today. SIFT is the most robust but is not efficient enough for applications with real-time requirements; SURF approximates SIFT, improving efficiency at a slight cost in robustness; ORB feature points are efficient but less robust, producing more mismatches during feature matching. Descriptor matching measures the similarity of candidate features with a distance function (Hamming distance, Euclidean distance, etc.) and selects suitable matching points. Brute-force search over all feature pairs becomes computationally expensive when the number of feature points is large, so fast Approximate Nearest Neighbor (ANN) algorithms are used to accelerate feature matching.
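For illustration only, a generic brute-force descriptor matcher with Lowe's ratio test (a common rejection rule for ambiguous matches, not part of the method claimed below) can be sketched as:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_match(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b,
    keeping only matches that pass Lowe's ratio test (best distance must be
    clearly smaller than the second-best distance)."""
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((euclidean(da, db), j) for j, db in enumerate(desc_b))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:   # ratio test rejects ambiguous matches
            matches.append((i, best[1]))
    return matches

# Tiny example: three 2-D "descriptors" per image (illustrative data only)
A = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
B = [[0.1, 0.0], [5.1, 5.0], [9.0, 9.0]]
print(brute_force_match(A, B))   # -> [(0, 0), (1, 0), (2, 1)]
```

With n and m descriptors, this brute-force scan costs O(n * m) distance evaluations, which is exactly the cost ANN methods are used to avoid.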
Hand-designed feature descriptors have limitations and cannot guarantee robustness and algorithmic efficiency at the same time. In recent years, convolutional networks have achieved excellent performance on high-level tasks such as object detection, image classification, and image segmentation, while high-performance parallel computing hardware such as GPUs (graphics processing units) and FPGAs (field programmable gate arrays), together with supporting software technologies, has matured. To resolve the conflict between image matching accuracy and matching efficiency, the invention provides a full convolution network matching method that improves both feature matching accuracy and matching speed.
Disclosure of Invention
In order to solve the problem that the matching accuracy and matching efficiency of traditional feature matching algorithms cannot coexist, the invention provides an image matching method based on a full convolution network that improves both the matching accuracy and the matching efficiency of images.
In order to achieve the above object, the invention adopts the technical scheme that:
an image matching method based on a full convolution network comprises the following steps:
step one, constructing a full convolution image matching network model for image matching, setting model initialization parameters and a loss function, wherein the whole full convolution matching network model consists of 8 parts and comprises the following substeps:
1) the first part comprises a convolutional layer and a pooling layer; the input is a 64x64 image formed by stacking two image blocks into 2 channels, the convolutional layer uses 64 kernels of size 3x3x2, and the convolved output, after processing by a ReLU activation function, is a feature map of size 64x64x64; this feature map is processed by a max-pooling layer with stride 2, outputting a feature map of size 32x32x64;
2) the second part consists of a convolutional layer and a pooling layer; the feature map obtained in step 1) is the input of the second convolutional layer, the convolutional layer parameters are 3x3x64x128, and the convolved output, processed by a ReLU activation function, has size 32x32x128; this feature map passes through a max-pooling layer with stride 2, outputting a feature map of size 16x16x128;
3) the third part comprises only a convolutional layer, with no pooling layer; the feature map obtained in step 2) is the input of the third convolutional layer, the convolutional layer parameters are 3x3x128x128, and the convolved output, processed by a ReLU activation function, is a feature map of size 16x16x128;
4) the fourth part comprises only a convolutional layer, with no pooling layer; the feature map obtained in step 3) is the input of the fourth convolutional layer, the convolutional layer parameters are 3x3x128x64, and the convolved output, processed by a ReLU activation function, is a feature map of size 16x16x64;
5) the fifth part consists of a convolutional layer and a pooling layer; the feature map obtained in step 4) is the input of the fifth convolutional layer, the convolutional layer parameters are 3x3x64x64, and the convolved output, processed by a ReLU activation function and then by a max-pooling layer with stride 2, is a feature map of size 8x8x64;
6) the sixth part comprises only a convolutional layer, with no pooling layer; the feature map obtained in step 5) is the input of the sixth convolutional layer, the convolutional layer parameters are 3x3x64x64, and the convolved output, processed by a ReLU activation function, has size 8x8x64;
7) the seventh part is a flattening layer, which flattens the 8x8x64 feature map into a vector of dimension 4096x1;
8) the eighth part is the output layer of the whole full convolution matching network, consisting of a convolutional layer with parameters 1x1x4096x2 and a softmax activation function, and outputs the probability that the images match;
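For illustration (not part of the patent text), the feature-map sizes quoted in parts 1) to 7) can be checked with a short shape walkthrough, assuming "same"-padded 3x3 convolutions and 2x2 max pooling with stride 2:

```python
def conv_shape(h, w, c_out):
    # A "same"-padded 3x3 convolution preserves spatial size and sets the channel count
    return h, w, c_out

def pool_shape(h, w, c):
    # 2x2 max pooling with stride 2 halves the spatial dimensions
    return h // 2, w // 2, c

shape = (64, 64, 2)                                       # stacked 2-channel 64x64 patch pair
shape = pool_shape(*conv_shape(shape[0], shape[1], 64))   # part 1 -> 32x32x64
shape = pool_shape(*conv_shape(shape[0], shape[1], 128))  # part 2 -> 16x16x128
shape = conv_shape(shape[0], shape[1], 128)               # part 3 -> 16x16x128
shape = conv_shape(shape[0], shape[1], 64)                # part 4 -> 16x16x64
shape = pool_shape(*conv_shape(shape[0], shape[1], 64))   # part 5 -> 8x8x64
shape = conv_shape(shape[0], shape[1], 64)                # part 6 -> 8x8x64
flat = shape[0] * shape[1] * shape[2]                     # part 7: flatten
print(shape, flat)   # -> (8, 8, 64) 4096
```

The flattened dimension 4096 matches the 1x1x4096x2 parameters of the output layer in part 8).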
the convolution layer parameters in the steps 1) to 6) and 8) adopt an Xavier initialization mode, and the model training adopts a minimized cross entropy loss function E as an optimization target, which is defined as follows:
wherein the content of the first and second substances,xifor the ith value of the model output vector, N is the dimension number of the model output vector, i, j belongs to [0, N ∈]And e represents a natural number,probability value, y, representing the corresponding category of the i-th output vectoriA tag value representing an i-th dimension output vector.
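The loss above is the standard softmax cross-entropy. A minimal numerical sketch (illustrative only; the max-subtraction is a common numerical-stability trick, not something the patent specifies) is:

```python
import math

def softmax_cross_entropy(x, y):
    """E = -sum_i y_i * log(p_i) with p_i = exp(x_i) / sum_j exp(x_j)."""
    m = max(x)                                  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    p = [v / s for v in exps]
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))

# Two-class output (as in the 1x1x4096x2 output layer), one-hot label on class 2.
# Equal logits give p = [0.5, 0.5], so the loss is log(2).
loss = softmax_cross_entropy([0.0, 0.0], [0.0, 1.0])
print(round(loss, 4))   # -> 0.6931
```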
Step two, obtaining image data of a training model according to the full convolution image matching network model constructed in the step one; the method comprises the following substeps:
1) Detect feature points such as SIFT in the image, and take 64x64 image blocks centered on the pixel positions of the SIFT feature points as training data. To obtain matching information between image blocks of different images, the features must first be pre-matched with a traditional image matching method to obtain correct labels. This process requires manual verification to ensure the correctness of the labels.
2) After obtaining the training data and the corresponding labels, the training data is augmented to improve the robustness of the algorithm, yielding the final image data. The augmentation specifically comprises: swapping the channel order of the two images, rotating one image by 90 and 180 degrees, and adding random noise to the image.
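The augmentations listed above can be sketched as follows (illustrative only: the function names are our own, and patches are represented as small nested lists rather than real 64x64 image blocks):

```python
import random

def rot90(patch):
    """Rotate a 2-D patch 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*patch)][::-1]

def augment(pair, rng):
    """pair = (patch_a, patch_b); return a list of augmented 2-channel pairs."""
    a, b = pair
    out = [(b, a)]                               # swap the channel order
    out.append((rot90(a), b))                    # rotate one patch by 90 degrees
    out.append((rot90(rot90(a)), b))             # ... and by 180 degrees
    noisy = [[v + rng.uniform(-1, 1) for v in row] for row in a]
    out.append((noisy, b))                       # add random noise
    return out

rng = random.Random(0)
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
aug = augment((a, b), rng)
print(len(aug), aug[1][0])   # -> 4 [[2, 4], [1, 3]]
```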
And step three, on GPU equipment, training the full convolution image matching network model of the first step by adopting the image data of the second step to obtain a trained model.
And step four, matching image block pairs in the images to be matched by using the trained model; the method comprises the following substeps:
1) detecting sift characteristic points of the matched image, and intercepting image blocks with the size of 64x64 by taking the characteristic points as centers;
2) selecting a plurality of image blocks with the size of 64x64 from the two images to be matched, stacking the image blocks two by two to obtain an image block pair to be predicted, and inputting the image block pair to be predicted into a trained full-convolution image matching network to obtain the matching result of the two images to be matched. If the first picture obtains n image blocks in total and the second picture obtains m image blocks in total, the number of stacked images to be predicted is n × m.
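The n x m pair enumeration described above can be sketched as (illustrative helper, not from the patent; real inputs would be 64x64 image blocks rather than strings):

```python
from itertools import product

def make_pairs(patches_a, patches_b):
    """Stack every patch from image A with every patch from image B,
    producing the indices of the n*m two-channel inputs the network scores."""
    return list(product(range(len(patches_a)), range(len(patches_b))))

# 3 patches from the first image, 2 from the second -> 3 * 2 = 6 candidate pairs
pairs = make_pairs(["a0", "a1", "a2"], ["b0", "b1"])
print(len(pairs))   # -> 6
```

This quadratic growth in candidate pairs is why batching the stacked inputs on a GPU (step three) matters for matching efficiency.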
The invention has the beneficial effects that: the method effectively solves the problem that the matching accuracy and the matching efficiency cannot be achieved simultaneously, and has more advantages compared with the traditional method.
Drawings
FIG. 1 is a flow chart of the algorithm of the present invention;
FIG. 2 is a diagram of a full convolution matching network in accordance with the present invention;
table 1 is a table of parameters for a full convolution matching network in the present invention.
Detailed Description
The invention is described in further detail below with reference to the following figures and detailed description:
the image matching method based on the full convolution network comprises the following steps:
the method comprises the following steps: designing a full convolution image matching network, and setting model initialization parameters and a loss function;
constructing a full convolution image matching network model, wherein the model comprises 8 parts: the first part comprises a convolutional layer and a pooling layer; the input is a 64x64 image formed by stacking two image blocks into 2 channels, the convolutional layer uses 64 kernels of size 3x3x2, the convolved data is processed by a ReLU activation function to output a 64x64x64 feature map, and a max-pooling layer with stride 2 then reduces this to a 32x32x64 feature map;
the second part of the full convolution matching network model is similar in structure to the first part; the convolution parameters are 3x3x64x128, a ReLU activation function processes the convolution output to give a 32x32x128 feature map, and a max-pooling layer with stride 2 then yields a feature map of size 16x16x128;
the third part differs slightly from the first two parts in having no pooling layer; its convolutional layer parameters are 3x3x128x128, the activation function is ReLU, and the feature map output after this part is of size 16x16x128;
the fourth part of the full convolution matching network has convolution parameters 3x3x128x64 with a ReLU activation function, and the output feature map is of size 16x16x64;
the fifth part has convolution parameters 3x3x64x64 with a ReLU activation function followed by a max-pooling layer with stride 2, and the output feature map is of size 8x8x64;
the sixth part has convolution parameters 3x3x64x64 with a ReLU activation function, and the output feature map is of size 8x8x64;
the seventh part of the full convolution matching network is a flattening layer, whose function is to flatten the 8x8x64 feature map into a vector of dimension 4096x1;
the eighth part is the output layer of the whole full convolution matching network, consisting of a convolutional layer with parameters 1x1x4096x2 and a softmax activation function, and outputs the probability of image matching;
2) each convolutional layer parameter in the network is initialized with the Xavier scheme, and the model is trained with the minimized cross-entropy loss function as the optimization objective, defined as follows:

E = -\sum_{i=1}^{N} y_i \log p_i, \quad p_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}

where x_i denotes the i-th value of the model output vector and N is the dimensionality of the model output vector.
Step two: preparing image data of a training model according to the designed full convolution network model;
1) Detect feature points such as SIFT, and take 64x64 image blocks centered on the pixel positions of the SIFT feature points as the training set. To obtain matching information between image blocks of different images, the features are pre-matched in advance with a traditional image matching method and then manually verified to ensure the correctness of the labels. Each pair of matched image blocks is stacked as one 2-channel image and numbered; the label with the same number as the image is 1 if the pair matches and 0 otherwise.
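The numbering-and-labeling scheme described above can be sketched as follows (the dictionary layout is our own illustrative choice, not the patent's data format):

```python
def build_dataset(patch_pairs, match_flags):
    """Stack each candidate pair as one 2-channel sample and record its label:
    1 for a matching pair, 0 otherwise. A shared id ties sample to label."""
    samples, labels = [], []
    for idx, ((a, b), is_match) in enumerate(zip(patch_pairs, match_flags)):
        samples.append({"id": idx, "channels": (a, b)})
        labels.append({"id": idx, "label": 1 if is_match else 0})
    return samples, labels

# Two candidate pairs: the first matches, the second does not
samples, labels = build_dataset([("pA", "pB"), ("pC", "pD")], [True, False])
print([l["label"] for l in labels])   # -> [1, 0]
```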
2) After obtaining the training data and the corresponding labels, the data is augmented to improve the robustness of the algorithm. The method mainly performs the following augmentations on the data: swapping the channel order of the two images, rotating one image by 90 and 180 degrees, and adding random noise to the image.
Step three: on GPU equipment, training a full convolution matching network model by using prepared image data;
1) The model is trained with a stochastic gradient descent optimization algorithm; the initial learning rate is 0.1 and is divided by 10 every 10000 iterations, the maximum number of iterations is 30000, and each iteration processes a batch of 128 input images.
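The step schedule described above can be sketched as (illustrative helper; the function name and signature are our own):

```python
def learning_rate(iteration, base_lr=0.1, drop_every=10000, factor=10):
    """Step schedule from the text: divide the rate by 10 every 10000 iterations."""
    return base_lr / (factor ** (iteration // drop_every))

# Over the 30000-iteration run: 0.1 for the first 10000 iterations,
# then 0.01, then 0.001 for the remainder.
print([learning_rate(i) for i in (0, 9999, 10000, 20000, 29999)])
```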
Step four: matching image block pairs in the images to be matched by using the trained model; the method comprises the following substeps:
1) detecting sift characteristic points of the matched image, and intercepting image blocks with the size of 64x64 by taking the characteristic points as centers;
2) selecting a plurality of image blocks with the size of 64x64 from the two images to be matched, stacking the image blocks two by two to obtain an image block pair to be predicted, and inputting the image block pair to be predicted into a trained full-convolution image matching network to obtain the matching result of the two images to be matched. For example, if the first picture obtains n image blocks in total and the second picture obtains m image blocks in total, the number of stacked images to be predicted is n × m.
TABLE 1
The above-mentioned embodiments only express embodiments of the present invention and should not be understood as limiting the scope of the patent. It should be noted that those skilled in the art can make many variations and modifications without departing from the concept of the present invention, and these all fall within the protection scope of the present invention.
Claims (2)
1. An image matching method based on a full convolution network is characterized by comprising the following steps:
step one, constructing a full convolution image matching network model for image matching, setting model initialization parameters and a loss function, wherein the whole full convolution matching network model consists of 8 parts and comprises the following substeps:
1) the first part comprises a convolutional layer and a pooling layer; the input is a 64x64 image formed by stacking two image blocks into 2 channels, the convolutional layer uses 64 kernels of size 3x3x2, and the convolved output, after processing by a ReLU activation function, is a feature map of size 64x64x64; this feature map is processed by a max-pooling layer with stride 2, outputting a feature map of size 32x32x64;
2) the second part consists of a convolutional layer and a pooling layer; the feature map obtained in step 1) is the input of the second convolutional layer, the convolutional layer parameters are 3x3x64x128, and the convolved output, processed by a ReLU activation function, has size 32x32x128; this feature map passes through a max-pooling layer with stride 2, outputting a feature map of size 16x16x128;
3) the third part comprises only a convolutional layer, with no pooling layer; the feature map obtained in step 2) is the input of the third convolutional layer, the convolutional layer parameters are 3x3x128x128, and the convolved output, processed by a ReLU activation function, is a feature map of size 16x16x128;
4) the fourth part comprises only a convolutional layer, with no pooling layer; the feature map obtained in step 3) is the input of the fourth convolutional layer, the convolutional layer parameters are 3x3x128x64, and the convolved output, processed by a ReLU activation function, is a feature map of size 16x16x64;
5) the fifth part consists of a convolutional layer and a pooling layer; the feature map obtained in step 4) is the input of the fifth convolutional layer, the convolutional layer parameters are 3x3x64x64, and the convolved output, processed by a ReLU activation function and then by a max-pooling layer with stride 2, is a feature map of size 8x8x64;
6) the sixth part comprises only a convolutional layer, with no pooling layer; the feature map obtained in step 5) is the input of the sixth convolutional layer, the convolutional layer parameters are 3x3x64x64, and the convolved output, processed by a ReLU activation function, has size 8x8x64;
7) the seventh part is a flattening layer, which flattens the 8x8x64 feature map into a vector of dimension 4096x1;
8) the eighth part is the output layer of the whole full convolution matching network, consisting of a convolutional layer with parameters 1x1x4096x2 and a softmax activation function, and outputs the probability that the images match;
step two, obtaining image data of a training model according to the full convolution image matching network model constructed in the step one; the method comprises the following substeps:
1) detecting SIFT feature points of the image, and taking 64x64 image blocks centered on the pixel positions of the SIFT feature points as training data; in order to obtain matching information between image blocks of different images, the features need to be pre-matched in advance with a traditional image matching method to obtain correct labels; this process requires manual verification to ensure the correctness of the labels;
2) after obtaining the training data and the corresponding labels, the training data is augmented to improve the robustness of the algorithm, yielding the final image data; the augmentation specifically comprises: swapping the channel order of the two images, rotating one image by 90 and 180 degrees, and adding random noise to the image;
step three, on GPU equipment, training the full convolution image matching network model of the first step by adopting image data of the second step to obtain a trained model;
step four, matching image block pairs in the images to be matched by using the trained model; the method comprises the following substeps:
1) detecting sift characteristic points of the matched image, and intercepting image blocks with the size of 64x64 by taking the characteristic points as centers;
2) selecting a plurality of image blocks with the size of 64x64 from the two images to be matched, stacking the image blocks two by two to obtain an image block pair to be predicted, and inputting the image block pair to be predicted into a trained full-convolution image matching network to obtain the matching result of the two images to be matched.
2. The image matching method based on the full convolution network according to claim 1, wherein the convolutional layer parameters in steps 1) to 6) and step 8) of step one are initialized with the Xavier scheme, and model training minimizes the cross-entropy loss function E as the optimization objective, defined as follows:

E = -\sum_{i=1}^{N} y_i \log p_i, \quad p_i = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}}

where x_i is the i-th value of the model output vector, N is the dimensionality of the model output vector, i, j ∈ [1, N], e is the natural constant, p_i is the probability of the class corresponding to the i-th output dimension, and y_i is the label value of the i-th output dimension.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910154179.9A | 2019-03-01 | 2019-03-01 | Image matching method based on full convolution network |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN109934272A | 2019-06-25 |
| CN109934272B | 2022-03-29 |

Family ID: 66986239

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910154179.9A (Expired - Fee Related) | Image matching method based on full convolution network | 2019-03-01 | 2019-03-01 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN109934272B (en) |
Families Citing this family (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111199558A | 2019-12-25 | 2020-05-26 | 北京自行者科技有限公司 | Image matching method based on deep learning |
| CN111724424B | 2020-06-24 | 2024-05-14 | 上海应用技术大学 | Image registration method |
| CN111812732B | 2020-06-29 | 2024-03-15 | 中铁二院工程集团有限责任公司 | Geoelectromagnetic nonlinear inversion method based on convolutional neural network |
| CN111951319A | 2020-08-21 | 2020-11-17 | 清华大学深圳国际研究生院 | Image stereo matching method |
| CN113128518B | 2021-03-30 | 2023-04-07 | 西安理工大学 | SIFT mismatch detection method based on twin convolution network and feature mixing |
Citations (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108009524A | 2017-12-25 | 2018-05-08 | 西北工业大学 | A lane line detection method based on a full convolution network |
| CN108960258A | 2018-07-06 | 2018-12-07 | 江苏迪伦智能科技有限公司 | A template matching method based on self-learned deep features |

Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10204299B2 | 2015-11-04 | 2019-02-12 | NEC Corporation | Unsupervised matching in fine-grained datasets for single-view object reconstruction |

- 2019-03-01: CN application CN201910154179.9A filed, granted as CN109934272B, now not active (Expired - Fee Related)
Non-Patent Citations (2)

| Title |
|---|
| A Deep Metric for Multimodal Registration; Martin Simonovsky et al.; arXiv:1609.05396v1; 2016-09-17 |
| A deep convolutional neural network method for satellite image matching (卫星影像匹配的深度卷积神经网络方法); Fan Dazhao et al.; Acta Geodaetica et Cartographica Sinica (测绘学报); 2018-06; Vol. 47, No. 6 |
Also Published As

| Publication number | Publication date |
|---|---|
| CN109934272A | 2019-06-25 |
Similar Documents

| Publication | Title |
|---|---|
| CN109934272B (en) | Image matching method based on full convolution network |
| CN108171701B (en) | Significance detection method based on U network and counterstudy |
| CN111753828B (en) | Natural scene horizontal character detection method based on deep convolutional neural network |
| AU2020104423A4 (en) | Multi-View Three-Dimensional Model Retrieval Method Based on Non-Local Graph Convolutional Network |
| CN112288011B (en) | Image matching method based on self-attention deep neural network |
| CN104036012B (en) | Dictionary learning, vision bag of words feature extracting method and searching system |
| CN111898621B (en) | Contour shape recognition method |
| CN110751027B (en) | Pedestrian re-identification method based on deep multi-instance learning |
| CN104778476A (en) | Image classification method |
| CN110766708A (en) | Image comparison method based on contour similarity |
| Tripathi et al. | Real time object detection using CNN |
| Salem et al. | Semantic image inpainting using self-learning encoder-decoder and adversarial loss |
| CN113255604B (en) | Pedestrian re-identification method, device, equipment and medium based on deep learning network |
| CN111523586A (en) | Noise-aware-based full-network supervision target detection method |
| CN106951501B (en) | Three-dimensional model retrieval method based on multi-graph matching |
| Zhang et al. | Consecutive convolutional activations for scene character recognition |
| CN111597367A (en) | Three-dimensional model retrieval method based on view and Hash algorithm |
| CN111062274A (en) | Context-aware embedded crowd counting method, system, medium, and electronic device |
| CN111144469A (en) | End-to-end multi-sequence text recognition method based on multi-dimensional correlation time sequence classification neural network |
| CN110738194A (en) | Three-dimensional object identification method based on point cloud ordered coding |
| CN107358200B (en) | Multi-camera non-overlapping vision field pedestrian matching method based on sparse learning |
| CN110555462A (en) | Non-fixed multi-character verification code identification method based on convolutional neural network |
| CN113160291B (en) | Change detection method based on image registration |
| CN113011506B (en) | Texture image classification method based on deep fractal spectrum network |
| CN115063831A (en) | High-performance pedestrian retrieval and re-identification method and device |
Legal Events

| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220329 |