CN113326856B - Self-adaptive two-stage feature point matching method based on matching difficulty - Google Patents


Info

Publication number
CN113326856B
CN113326856B (application CN202110884790.4A)
Authority
CN
China
Prior art keywords
picture
matching
feature point
layer
descriptor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110884790.4A
Other languages
Chinese (zh)
Other versions
CN113326856A (en)
Inventor
周军
黄坤
刘野
李静远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110884790.4A priority Critical patent/CN113326856B/en
Publication of CN113326856A publication Critical patent/CN113326856A/en
Application granted granted Critical
Publication of CN113326856B publication Critical patent/CN113326856B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a self-adaptive two-stage feature point matching method based on matching difficulty, belonging to the technical field of image processing. The technical scheme of the invention is as follows: first, a subsequent specific matching mode is selected based on the picture difference degree between the two pictures of a pair; if the picture difference degree is small, matching is performed directly based on the Euclidean distance between the descriptors of the feature points. Otherwise, the position information of the feature points of each picture is raised in dimension so that it is consistent with the dimension of the descriptor, the descriptor and the dimension-raised position information are added to obtain a new descriptor for each feature point, attention aggregation is performed to obtain a matching descriptor for each feature point, and matching is performed based on the inner product between the matching descriptors. The method is used for matching the feature points of a picture pair, realizes self-adaptive two-stage feature point matching, and improves the matching accuracy and the processing efficiency.

Description

Self-adaptive two-stage feature point matching method based on matching difficulty
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a self-adaptive two-stage feature point matching method based on matching difficulty.
Background
Feature point matching refers to correctly matching the feature point sets extracted from different pictures, and it has important applications in geometry-based computer vision tasks. For example, simultaneous localization and mapping is a key technology of unmanned driving: feature points are extracted from and matched between the picture pairs shot by the camera, and the position of the robot or vehicle at each moment can be calculated from the matching relation. A good matching result not only helps improve the accuracy of the subsequently calculated position, but also provides a good initial value for subsequent iterative algorithms and helps them reach the optimal result as soon as possible.
Early feature point matching mainly used the nearest neighbor matching algorithm. When feature point extraction is completed, a vector describing the surrounding information is obtained; this vector is called a descriptor. The nearest neighbor matching algorithm calculates the Euclidean distance between descriptors as the measurement standard, and feature points with a smaller distance are more likely to match.
Although nearest neighbor matching performs well in some simple environments, it is difficult to achieve satisfactory results in difficult scenes (e.g., blur, occlusion, large perspective transformation). So in recent years, with the excellent performance of neural networks in image processing, matching with neural networks has emerged.
Completing the matching task with neural network algorithms has brought great progress in accuracy, but also an attendant increase in computational effort, which makes it difficult to apply feature point matching to real-time applications such as simultaneous localization and mapping.
Disclosure of Invention
The embodiment of the invention provides a self-adaptive two-stage feature point matching method based on matching difficulty, which is used for improving the accuracy of picture feature point matching.
The embodiment of the invention provides a self-adaptive two-stage feature point matching method based on matching difficulty, which comprises the following steps:
step 1, inputting a picture pair to be matched, wherein the picture information of the input picture pair comprises brightness information of a picture and characteristic point information of the picture, the characteristic point information comprises position information and a descriptor, and the position information comprises a spatial position coordinate and a confidence value of a characteristic point;
step 2, calculating the picture difference degree between the picture pairs to be matched based on the brightness information of the pictures, if the picture difference degree is greater than or equal to a difference threshold value, executing the steps 3 to 5, otherwise executing the step 6;
step 3, performing dimension increasing processing on the position information of the feature points of each picture to enable the dimension of the position information of the feature points after dimension increasing to be consistent with the dimension of the descriptor, and adding the descriptor and the position information after dimension increasing to obtain a new descriptor of each feature point;
step 4, carrying out attention aggregation processing on the new descriptor of each feature point to obtain a matching descriptor of each feature point;
step 5, calculating a matching result in an inner product mode:
respectively defining two pictures of the picture pair as a first picture and a second picture;
traversing each feature point of the first picture, calculating the matching degree between each current feature point of the first picture and the matching descriptor of each feature point in the second picture by adopting an inner product mode, and if the maximum matching degree is larger than or equal to a first matching threshold value, taking the feature point of the second picture corresponding to the maximum matching degree as the matching result of the current feature point of the first picture;
step 6, calculating a matching result in an Euclidean distance mode:
respectively defining two pictures of the picture pair as a first picture and a second picture;
and traversing each feature point of the first picture, calculating the matching degree between each current feature point of the first picture and the descriptor of each feature point in the second picture by adopting an Euclidean distance mode, and if the minimum matching degree is less than or equal to a second matching threshold, taking the feature point of the second picture corresponding to the minimum matching degree as the matching result of the current feature point of the first picture.
In the embodiment of the invention, the subsequent specific matching mode is selected according to the picture difference degree between the two pictures of the pair. If the picture difference degree is small, i.e. smaller than the specified difference threshold, matching is performed directly based on the descriptors of the feature points; otherwise, after a series of calculations, a matching descriptor that better characterizes each feature point is obtained, the inner product between two matching descriptors is used as the calculated value of their matching degree, and the object with the largest matching degree that is also larger than the first matching threshold is taken as the final matching result, so as to improve the matching accuracy. In other words, the embodiment of the present invention realizes adaptive two-stage feature point matching: the picture difference detection decides whether to enable the first stage, which consists of the key point aggregation (step 3) and the attention aggregation (step 4), before the second-stage matching calculation (step 5 or step 6), thereby improving the matching accuracy and the processing efficiency.
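To make the adaptive two-stage flow concrete, the following Python sketch (illustrative only, not part of the claimed method) shows the dispatch between the two matching modes; the first-stage aggregation of steps 3 to 4 is deliberately left as a placeholder here and sketched separately in the detailed description, and all function names and threshold values are assumptions.

```python
import numpy as np

def adaptive_two_stage_match(bright_a, bright_b, desc_a, desc_b,
                             diff_threshold=5e5, threshold1=10.0, threshold2=0.9):
    """Illustrative dispatch for steps 2, 5 and 6.
    bright_*: equally sized grayscale brightness images;
    desc_*: (N, 256) descriptor matrices of the two pictures."""
    # Step 2: picture difference degree (sum of absolute brightness errors).
    dscore = np.abs(bright_a.astype(float) - bright_b.astype(float)).sum()
    if dscore >= diff_threshold:
        # Steps 3-4 (placeholder): key point aggregation and attention
        # aggregation would turn desc_* into matching descriptors here.
        f_a, f_b = desc_a, desc_b
        # Step 5: inner-product matching with the first matching threshold.
        scores = f_a @ f_b.T
        best = scores.argmax(axis=1)
        return [(i, int(j)) for i, j in enumerate(best) if scores[i, j] >= threshold1]
    # Step 6: Euclidean-distance matching on the input descriptors.
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    best = dists.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(best) if dists[i, j] <= threshold2]
```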
Further, in step 2, the picture difference degree between the picture pair to be matched is calculated based on the brightness information of the pictures as follows: size normalization processing is performed on the two pictures of the picture pair, and the picture difference degree is calculated as the sum of absolute errors of the brightness information.
Further, in step 3, the dimension-raising processing of the position information of the feature points of each picture is performed as follows: the position information of the feature points of each picture is raised in dimension by a multilayer perceptron.
Further, in step 3, the multilayer perceptron is: defining L to represent the number of network layers of the multilayer perceptron, the first L-1 layers are a stacked structure of L-1 first convolution blocks and the L-th layer is an addition layer; the input of the 1st first convolution block is the position information of all feature points of a picture, the number of channels of the feature map output by the (L-1)-th first convolution block is the same as the dimension of the descriptor, the input of the addition layer is the descriptors of all feature points of the picture and the feature map output by the (L-1)-th first convolution block, and a first convolution block comprises a convolution layer, a batch normalization layer and an activation function ReLu layer which are connected in sequence.
Further, in step 4, the attention aggregation processing on the new descriptor of each feature point is as follows:
an attention aggregation process is performed on the new descriptors of the feature points by using a graph network with L_G layers, where L_G is an odd number greater than 1;
the first L_G-1 layers of the graph network form an alternating layer structure of self layers and cross layers, and the last layer is a fully connected layer;
the self layer and the cross layer have the same network structure and are both neural network layers with attention mechanisms, the input of the self layer is different feature points of the same picture of the picture pair to be matched, and the input of the cross layer is different feature points of the two pictures of the picture pair to be matched.
Further, in step 4, the network structure of each of the first L_G-1 layers of the graph network is two stacked second convolution blocks; along the forward propagation direction, the 1st second convolution block of each layer has 2M input channels and 2M output channels, with a convolution kernel size of 1 × 2M × 2M; the 2nd second convolution block of each layer has 2M input channels and M output channels, with a convolution kernel size of 1 × 2M × M, where M denotes the dimension of a descriptor and a second convolution block comprises a convolution layer and an activation function ReLu layer connected in sequence.
Further, in step 5, the value range of the first matching threshold is set to 9-11.
Further, in step 6, the value range of the second matching threshold is set to be 0.8-1.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:
(1) According to the embodiment of the invention, by adding picture difference degree detection, the matching mode can be flexibly selected according to the matching difficulty, so that speed is increased as much as possible while accuracy is ensured. Compared with the traditional neural network processing scheme, the speed is obviously improved in a mixed environment (a mixture of complex and simple environments).
(2) Compared with the traditional simple matching processing scheme, the embodiment of the invention uses the two-stage neural network, and ensures that higher matching accuracy can be achieved under complex conditions.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart of an adaptive two-stage feature point matching method based on matching difficulty according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a judgment module adopted in a two-stage adaptive feature point matching method based on matching difficulty according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an aggregation module used in a matching difficulty-based adaptive two-stage feature point matching method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Feature point matching plays a very important role in many geometry-based computer vision tasks, such as three-dimensional reconstruction and simultaneous localization and mapping. The embodiment of the invention provides a feature point matching method that can adaptively start a two-stage neural network according to the matching difficulty; compared with the traditional matching scheme, the matching accuracy is greatly improved in various environments, and in a simple environment the speed is greatly improved compared with a neural-network-based matching scheme.
Referring to fig. 1, an adaptive two-stage feature point matching method based on matching difficulty provided in an embodiment of the present invention includes the following steps:
step 1, inputting a picture pair to be matched, wherein the input picture information comprises brightness information of a picture and characteristic point information of the picture, and the characteristic point information comprises position information (space position coordinates and confidence values) and a descriptor;
step 2, detecting the picture difference: calculating the picture difference degree between the picture pairs to be matched based on the brightness information of the pictures, if the picture difference degree is larger than or equal to a difference threshold value, executing the steps 3 to 5, otherwise executing the step 6;
step 3, key point aggregation processing: performing dimension-raising processing on the position information of the feature points of each picture so that the dimension of the dimension-raised position information is consistent with the dimension of the descriptor, and adding the descriptor and the dimension-raised position information to obtain a new descriptor of each feature point;
step 4, attention aggregation processing: carrying out attention aggregation processing on the new descriptor of each feature point to obtain a matching descriptor of each feature point;
step 5, calculating a matching result in an inner product mode:
respectively defining two pictures of the picture pair as a first picture and a second picture;
traversing each feature point of the first picture, calculating the matching degree between each current feature point of the first picture and the matching descriptor of each feature point in the second picture by adopting an inner product mode, and if the maximum matching degree is larger than or equal to a first matching threshold value, taking the feature point of the second picture corresponding to the maximum matching degree as the matching result of the current feature point of the first picture;
step 6, calculating a matching result in an Euclidean distance mode:
respectively defining two pictures of the picture pair as a first picture and a second picture;
and traversing each feature point of the first picture, calculating the matching degree between each current feature point of the first picture and the descriptor of each feature point in the second picture by adopting an Euclidean distance mode, and if the minimum matching degree is less than or equal to a second matching threshold, taking the feature point of the second picture corresponding to the minimum matching degree as the matching result of the current feature point of the first picture.
In the embodiment of the present invention, the relevant processing in steps 2 to 4 is implemented by using a neural network, and the overall processing of the embodiment can be divided into two parts: a judging module and a two-stage module. The two-stage module specifically comprises a first-stage aggregation module and a second-stage calculation module, and the second-stage calculation module comprises an inner product calculation module and a Euclidean distance calculation module.
The two pictures to be matched (the picture pair to be matched) are first subjected to picture difference degree detection by the judging module; if the picture difference degree reaches the specified difference threshold, the first-stage aggregation module and the second-stage inner product calculation module are started, otherwise only the Euclidean distance calculation module of the second stage is started, and the matching result is finally obtained from the output of the second-stage calculation module.
In a possible implementation manner, the judging module is mainly used for judging the difficulty of matching. The matching difficulty is mainly determined by the difference between the pictures: when the difference between the two pictures is small, the feature points do not change greatly and matching is easy; when the difference between the two pictures is large, for example under drastic illumination change, large viewpoint change, blur or occlusion, the feature points change greatly and matching is difficult. Referring to fig. 2, the judging module includes a normalization module (reshape module) and a difference judging module (Sum of Absolute Differences module). That is, the judging module first uses the reshape module to bring the two pictures to the same size, i.e. performs size normalization, and then uses the sum of absolute differences (SAD) algorithm to judge the picture difference. In this way the picture difference degree can be judged simply and with a high degree of parallelism on a GPU (graphics processing unit), so no large amount of time is wasted. The specific calculation formula is as follows.
For the picture pair to be matched, the two pictures are denoted $I_A$ and $I_B$, and $I_A(i,j)$, $I_B(i,j)$ denote the pixel brightness values of $I_A$ and $I_B$ at pixel point $(i,j)$. After the normalization module, both pictures are resized to $W \times H$, where $W$ is the width and $H$ the height. Dscore denotes the difference degree of the two pictures and is calculated by formula (1):

$$\mathrm{Dscore} = \sum_{i=1}^{W} \sum_{j=1}^{H} \left| I_A(i,j) - I_B(i,j) \right| \qquad (1)$$
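As an illustration only (not part of the patent text), the following NumPy sketch computes Dscore as in formula (1); the normalized size W × H and the nearest-neighbour resampling are assumptions made for the example.

```python
import numpy as np

def picture_difference(img_a: np.ndarray, img_b: np.ndarray,
                       w: int = 160, h: int = 120) -> float:
    """Dscore from formula (1): sum of absolute brightness differences after
    normalizing both pictures to the same W x H size (nearest-neighbour
    resampling is used here purely for illustration)."""
    def resize(img: np.ndarray) -> np.ndarray:
        rows = np.linspace(0, img.shape[0] - 1, h).round().astype(int)
        cols = np.linspace(0, img.shape[1] - 1, w).round().astype(int)
        return img[rows][:, cols].astype(np.float32)
    return float(np.abs(resize(img_a) - resize(img_b)).sum())

# Example: two random grayscale pictures of different sizes.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(480, 640))
b = rng.integers(0, 256, size=(360, 480))
print(picture_difference(a, b))
```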
Referring to fig. 3, in one possible implementation, the aggregation module includes two parts, a keypoint aggregation and an attention aggregation.
The number of feature points of picture A is denoted $N_A$ and that of picture B is denoted $N_B$; the feature points may be obtained by any conventional feature point extraction method applied to the image. $d_i^A$ and $d_i^B$ denote the descriptors of the $i$-th input feature point of picture A and picture B respectively; the descriptor dimension is assumed fixed and is 256 in this embodiment. $p_i^A$ and $p_i^B$ denote the position information of the $i$-th input feature point of picture A and picture B respectively; its dimension is 3, consisting of the spatial coordinates $(x, y)$ and the confidence of the feature point. $D^A = \{d_i^A\}_{i=1}^{N_A}$ denotes the set of all input descriptors of picture A and $D^B = \{d_i^B\}_{i=1}^{N_B}$ the set of all input descriptors of picture B.

Key point aggregation fuses the position information of the feature points with the descriptors: the position information is raised in dimension by a multi-layer perceptron (MLP) and added to the descriptor to obtain a new descriptor, which is then used in the subsequent attention aggregation. Let ${}^{(0)}x_i^A$ and ${}^{(0)}x_i^B$ denote the $i$-th new descriptors of picture A and picture B obtained by key point aggregation; they are calculated by formula (2):

$${}^{(0)}x_i^A = d_i^A + \mathrm{MLP}(p_i^A), \qquad {}^{(0)}x_i^B = d_i^B + \mathrm{MLP}(p_i^B) \qquad (2)$$

where $\mathrm{MLP}(\cdot)$ denotes the output of the multi-layer perceptron, i.e. the dimension-raising result of the position information of each feature point of the picture.

In one possible implementation, the network structure of the multi-layer perceptron used for obtaining the new descriptors is shown in Table 1, where $N$ denotes the number of feature points ($N = N_A$ or $N_B$).

[Table 1 (image not recoverable from the extracted text): layer-by-layer structure of the multi-layer perceptron; the structure is described in the following paragraph.]
Namely, in the embodiment of the present invention, the network structure of the multilayer perceptron adopted in the key point aggregation is as follows. Let L denote the number of network layers of the multilayer perceptron; the first L-1 layers form a stacked structure of L-1 first convolution blocks, and the L-th layer is an addition layer, where a first convolution block comprises a convolution layer (Convolution1d), a batch normalization layer (BatchNorm1d) and an activation function ReLu layer connected in sequence. The input of the multilayer perceptron is all feature points of one picture with a channel number (dimension) of 3, i.e. the spatial position information and the confidence of the feature points; the output is all feature points with a channel number of M, where M equals the dimension of the descriptor. Within the stack of L-1 first convolution blocks, the number of output channels increases block by block over the first L-2 blocks until it reaches M, and the (L-1)-th block also outputs M channels. The addition layer adds the dimension-raised position information to the descriptor to obtain the new descriptor.
As a preferred structure, the number of convolution blocks is set to 5, and the numbers of output channels from the 1st block to the 5th block are 32, 64, 128, 256 and 256 respectively.
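For illustration only, a minimal PyTorch sketch of the key point aggregation described above follows, assuming descriptor dimension M = 256 and the channel schedule described above (32, 64, 128, 256, 256); the class and variable names are hypothetical, not taken from the patent.

```python
import torch
import torch.nn as nn

class KeypointAggregation(nn.Module):
    """Raises the 3-D position information (x, y, confidence) to the descriptor
    dimension M with a multi-layer perceptron (Conv1d + BatchNorm + ReLU blocks)
    and adds it to the descriptors, as in formula (2)."""
    def __init__(self, descriptor_dim: int = 256, channels=(32, 64, 128, 256)):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in list(channels) + [descriptor_dim]:
            layers += [nn.Conv1d(in_ch, out_ch, kernel_size=1),
                       nn.BatchNorm1d(out_ch),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.mlp = nn.Sequential(*layers)

    def forward(self, descriptors: torch.Tensor, positions: torch.Tensor):
        # descriptors: (B, M, N), positions: (B, 3, N)
        return descriptors + self.mlp(positions)   # new descriptors (0)x_i

# Usage example with 100 feature points.
enc = KeypointAggregation()
d = torch.randn(1, 256, 100)
p = torch.randn(1, 3, 100)
print(enc(d, p).shape)   # torch.Size([1, 256, 100])
```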
The purpose of attention aggregation is to aggregate information more effectively. In the embodiment of the present invention, the overall network architecture adopts a graph network: each descriptor serves as a node that aggregates the information produced by the attention mechanism.
As a preferred structure, the graph network has 19 layers in total; among the first 18 layers, the odd layers are self layers and the even layers are cross layers. The 19th layer is a fully connected layer called the Final layer.
"Self" and "cross" refer to an introduced self-cross mechanism which, similar to how a person repeatedly compares two images, determines the object of the attention-based aggregation. Aggregating only the surrounding information within a picture is not enough; information from the opposite picture of the pair also needs to be aggregated. That is, the aggregation object of a self layer comes from the picture itself, the aggregation object of a cross layer comes from the opposite picture of the picture pair, and the two kinds of layers appear alternately.
That is, in the embodiment of the present invention, the number of layers of the graph network is defined as $L_G$; the first $L_G-1$ layers are alternating self and cross layers (so $L_G-1$ is an even number), and the last layer is a fully connected layer.
The calculation of each layer is expressed mathematically as follows.

Let ${}^{(l)}x_i^A$ denote the $i$-th descriptor of picture A at the $l$-th layer and ${}^{(l)}x_i^B$ the $i$-th descriptor of picture B at the $l$-th layer, where $l \in \{0, 1, \ldots, L_G-1\}$. The output of the key point aggregation is taken as the 0-th layer, i.e. ${}^{(0)}x_i^A$ and ${}^{(0)}x_i^B$ are the outputs after key point aggregation. Let ${}^{(l)}X^A$ and ${}^{(l)}X^B$ denote the sets of all descriptors of picture A and picture B at the $l$-th layer, and let ${}^{(l)}m_i^A$ and ${}^{(l)}m_i^B$ denote the aggregation information of the $i$-th descriptor of picture A and of picture B at the $l$-th layer.

The descriptor of the current layer is obtained by an update operation on the descriptor of the previous layer, as given in formula (3):

$${}^{(l)}x_i^A = {}^{(l-1)}x_i^A + g^{(l)}\!\left({}^{(l-1)}x_i^A \,\|\, {}^{(l)}m_i^A\right), \qquad {}^{(l)}x_i^B = {}^{(l-1)}x_i^B + g^{(l)}\!\left({}^{(l-1)}x_i^B \,\|\, {}^{(l)}m_i^B\right) \qquad (3)$$

where the superscript $(l-1)$ refers to the previous ($(l-1)$-th) layer, the symbol "$\|$" denotes concatenation in the channel direction, and $g^{(l)}(\cdot)$ denotes the output of the self layer or cross layer, i.e. the layer output of the graph network. The self layer and the cross layer have the same network structure but different inputs. The network structure consists of two stacked second convolution blocks: the first second convolution block has 2M input channels, 2M output channels and a convolution kernel size of 1 × 2M × 2M; the second one has 2M input channels, M output channels and a convolution kernel size of 1 × 2M × M. For a descriptor dimension of 256, the network structure parameters of the self layer or cross layer are shown in Table 2:

Table 2. Network structure of the self layer / cross layer (descriptor dimension M = 256):
    • 1st second convolution block (convolution layer + ReLu): input channels 512, output channels 512, convolution kernel size 1 × 512 × 512;
    • 2nd second convolution block (convolution layer + ReLu): input channels 512, output channels 256, convolution kernel size 1 × 512 × 256.

In the expression form k1 × k2 × k3 of the convolution kernel size, k1 × k2 indicates the convolution kernel shape and k3 indicates the number of output channels.
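As an illustration, a minimal PyTorch sketch of the per-layer update of formula (3) follows, using the two stacked second convolution blocks of Table 2 (2M → 2M → M, kernel size 1, each followed by a ReLu as described above); the names used are hypothetical.

```python
import torch
import torch.nn as nn

class MessageUpdate(nn.Module):
    """One self/cross layer of the graph network, formula (3):
    x <- x + g([x || m]), where g is two stacked convolution blocks
    (2M -> 2M -> M, kernel 1) and || is concatenation along the channel
    direction."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.g = nn.Sequential(
            nn.Conv1d(2 * dim, 2 * dim, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv1d(2 * dim, dim, kernel_size=1), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
        # x: previous-layer descriptors (B, M, N); m: aggregated messages (B, M, N)
        return x + self.g(torch.cat([x, m], dim=1))

# Usage: update 100 descriptors with their aggregated messages.
upd = MessageUpdate()
x = torch.randn(1, 256, 100)
m = torch.randn(1, 256, 100)
print(upd(x, m).shape)   # torch.Size([1, 256, 100])
```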
The aggregation information ${}^{(l)}m_i^A$ and ${}^{(l)}m_i^B$ is computed with an attention mechanism, and in the embodiment of the invention the self-cross mechanism controls the aggregation object.

The attention mechanism computes the aggregated information in a way similar to a database query. For picture A, the input of the $l$-th layer consists of two sets: the query set, which is the previous-layer descriptor set ${}^{(l-1)}X^A$ containing $N_A$ feature points, and a source set ${}^{(l)}S^A$ whose feature points are the objects to be aggregated. Likewise, for picture B the input of the $l$-th layer consists of the query set ${}^{(l-1)}X^B$, containing $N_B$ feature points, and a source set ${}^{(l)}S^B$.

For the $i$-th feature point of picture A, the aggregation information ${}^{(l)}m_i^A$ is calculated as follows. First, the query $q_i$, the index (key) $k_j$ and the value $v_j$ are calculated by formula (4):

$$q_i = W_1^{(l)} x_i + b_1^{(l)}, \qquad k_j = W_2^{(l)} s_j + b_2^{(l)}, \qquad v_j = W_3^{(l)} s_j + b_3^{(l)} \qquad (4)$$

where $x_i$ denotes the $i$-th descriptor in the query set ${}^{(l-1)}X^A$, $s_j$ denotes the $j$-th descriptor in the source set ${}^{(l)}S^A$, and $W_1^{(l)}, b_1^{(l)}, W_2^{(l)}, b_2^{(l)}, W_3^{(l)}, b_3^{(l)}$ are the parameters of the linear mappings of the $l$-th layer, i.e. the weights and biases. Similarly, the query, index and value of the attention mechanism of picture B are obtained from the query set ${}^{(l-1)}X^B$ and the source set ${}^{(l)}S^B$.

In a self layer: for picture A, the source set is the descriptor set of picture A itself, i.e. ${}^{(l)}S^A = {}^{(l-1)}X^A$; for picture B, ${}^{(l)}S^B = {}^{(l-1)}X^B$.

In a cross layer: for picture A, the source set comes from the opposite picture, i.e. ${}^{(l)}S^A = {}^{(l-1)}X^B$; for picture B, ${}^{(l)}S^B = {}^{(l-1)}X^A$.

The aggregation information ${}^{(l)}m_i^A$ is then equal to a weighted sum of the values $v_j$, with the weights given by an attention over the products of query and index; see formula (5):

$${}^{(l)}m_i^A = \sum_{j} \alpha_{ij}^A \, v_j, \qquad \alpha_{ij}^A = \mathrm{Softmax}_j\!\left(q_i^{\mathrm{T}} k_j\right) \qquad (5)$$

where $\alpha_{ij}^A$ and $\alpha_{ij}^B$ respectively denote, for picture A and picture B, the weight between the $i$-th element of the query of the $l$-th layer and the $j$-th element of the index, Softmax() represents the normalized exponential function, and the superscript "T" represents transposition. The aggregation information ${}^{(l)}m_i^B$ of picture B is computed in the same way.
Finally, after the aggregation information has been propagated through the first $L_G-1$ layers, the fully connected layer performs a linear mapping on the aggregated descriptors of picture A and picture B respectively to obtain the matching descriptors $f_i^A$ and $f_i^B$ (see formula (6)). The sets $F^A$ and $F^B$ of matching descriptors are then sent to the second stage for matching.

$$f_i^A = W_f \, {}^{(L_G-1)}x_i^A + b_f, \qquad f_i^B = W_f \, {}^{(L_G-1)}x_i^B + b_f \qquad (6)$$

where ${}^{(L_G-1)}x_i^A$ and ${}^{(L_G-1)}x_i^B$ denote the descriptors of the $(L_G-1)$-th layer (calculated according to formula (3)), the superscript A or B distinguishes the two pictures, the subscript $i$ distinguishes the feature points, and $W_f$ and $b_f$ are the parameters of the fully connected layer, i.e. its weight and bias.
The second-stage calculation module performs matching based on calculated matching scores. Two calculation modes are provided: inner product and Euclidean distance. After $F^A$ and $F^B$ are formed, define $F^A = \{f_i^A\}_{i=1}^{N_A}$ and $F^B = \{f_j^B\}_{j=1}^{N_B}$.

When the difference degree between the two pictures is greater than or equal to the specified difference threshold (i.e. the first stage is enabled), the inner product is selected to calculate the matching score, and a larger score indicates a better match. For each $f_i^A$ in $F^A$, a matching score is calculated with every descriptor in $F^B$ according to formula (7). For each $f_i^A$, if the largest calculated score is greater than the threshold threshold1, the corresponding descriptor is taken as the matching descriptor of $f_i^A$, giving the matching relation of the feature points. For example, if the matching score between $f_1^A$ and $f_3^B$ is the largest and is greater than threshold1, the 1st feature point of picture A matches the 3rd feature point of picture B.

$$\mathrm{score}(i,j) = \left\langle f_i^A, f_j^B \right\rangle = (f_i^A)^{\mathrm{T}} f_j^B \qquad (7)$$

Preferably, the value range of the threshold threshold1 can be set to 9-11.

When the difference degree between the two pictures is smaller than the specified difference threshold (i.e. the first stage is disabled), the Euclidean distance is selected to calculate the matching score, and a smaller score indicates a better match. Since no aggregation through the first-stage network has taken place, the calculation is performed directly on the input descriptor sets $D^A$ and $D^B$. For each $d_i^A$ in $D^A$, a matching score is calculated with every descriptor in $D^B$ according to formula (8). For each $d_i^A$, if the smallest calculated score is less than the threshold threshold2, the corresponding descriptor is taken as the matching descriptor of $d_i^A$, giving the matching relation of the feature points. For example, if the matching score between $d_2^A$ and $d_3^B$ is the smallest and is less than threshold2, the 2nd feature point of picture A matches the 3rd feature point of picture B.

$$\mathrm{score}(i,j) = \left\| d_i^A - d_j^B \right\|_2 \qquad (8)$$
Preferably, the value range of the threshold value threshold2 can be set to 0.8-1.0.
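For illustration only, a NumPy sketch of the two second-stage matching modes of formulas (7) and (8) follows; the threshold values shown are taken from the preferred ranges above, and all names are hypothetical.

```python
import numpy as np

def match_inner_product(f_a: np.ndarray, f_b: np.ndarray, threshold1: float = 10.0):
    """Formula (7): score(i, j) = <f_i^A, f_j^B>. For each feature point of
    picture A, keep the picture-B point with the largest score if that score
    reaches threshold1 (preferred range 9-11)."""
    scores = f_a @ f_b.T                          # (N_A, N_B) inner products
    best = scores.argmax(axis=1)
    return [(i, int(j)) for i, j in enumerate(best) if scores[i, j] >= threshold1]

def match_euclidean(d_a: np.ndarray, d_b: np.ndarray, threshold2: float = 0.9):
    """Formula (8): score(i, j) = ||d_i^A - d_j^B||_2. For each feature point of
    picture A, keep the picture-B point with the smallest distance if that
    distance does not exceed threshold2 (preferred range 0.8-1.0)."""
    dists = np.linalg.norm(d_a[:, None, :] - d_b[None, :, :], axis=2)
    best = dists.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(best) if dists[i, j] <= threshold2]

# Usage with random descriptors (dimension 256).
rng = np.random.default_rng(0)
fa, fb = rng.standard_normal((100, 256)), rng.standard_normal((120, 256))
print(len(match_inner_product(fa, fb)), len(match_euclidean(fa, fb)))
```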
The embodiment of the invention provides a self-adaptive two-stage feature point matching method based on matching difficulty for geometry-oriented tasks in computer vision. It can achieve high matching accuracy in different environments, can adaptively adjust the network architecture for different matching difficulties so as to use computing resources efficiently, and is obviously faster than the traditional approach of simply using a neural network.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
What has been described above are merely some embodiments of the present invention. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the inventive concept, and such changes and modifications remain within the spirit and scope of the invention.

Claims (7)

1. A self-adaptive two-stage feature point matching method based on matching difficulty is characterized by comprising the following steps:
step 1, inputting a picture pair to be matched, wherein the input picture information comprises brightness information of a picture and characteristic point information of the picture, the characteristic point information comprises position information and a descriptor, and the position information comprises a spatial position coordinate and a confidence value of the characteristic point;
step 2, calculating the picture difference degree between the picture pairs to be matched based on the brightness information of the pictures, if the picture difference degree is greater than or equal to a difference threshold value, executing the steps 3 to 5, otherwise executing the step 6;
the picture difference degree is as follows: carrying out size normalization processing on the two pictures of the picture pair, and calculating the picture difference degree according to the sum of absolute errors of the brightness information;
step 3, performing dimension increasing processing on the position information of the feature points of each picture to enable the dimension of the position information of the feature points after dimension increasing to be consistent with the dimension of the descriptor, and adding the descriptor and the position information after dimension increasing to obtain a new descriptor of each feature point;
step 4, carrying out attention aggregation processing on the new descriptor of each feature point to obtain a matching descriptor of each feature point;
step 5, calculating a matching result in an inner product mode:
respectively defining two pictures of the picture pair as a first picture and a second picture;
traversing each feature point of the first picture, calculating the matching degree between each current feature point of the first picture and the matching descriptor of each feature point in the second picture by adopting an inner product mode, and if the maximum matching degree is larger than or equal to a first matching threshold value, taking the feature point of the second picture corresponding to the maximum matching degree as the matching result of the current feature point of the first picture;
step 6, calculating a matching result in an Euclidean distance mode:
respectively defining two pictures of the picture pair as a first picture and a second picture;
and traversing each feature point of the first picture, calculating the matching degree between each current feature point of the first picture and the descriptor of each feature point in the second picture by adopting an Euclidean distance mode, and if the minimum matching degree is less than or equal to a second matching threshold, taking the feature point of the second picture corresponding to the minimum matching degree as the matching result of the current feature point of the first picture.
2. The adaptive two-stage feature point matching method based on matching difficulty degree according to claim 1, wherein in step 3, the way of performing dimension-increasing processing on the position information of the feature points of each picture is as follows: and performing dimension-increasing processing on the position information of the feature points of each picture through a multilayer perceptron.
3. The adaptive two-stage feature point matching method based on matching difficulty degree according to claim 2, wherein in step 3, the multi-layer perceptron is:
defining L to represent the number of network layers of the multilayer perceptron, wherein the first L-1 layers are a stacked structure of L-1 first convolution blocks, and the L-th layer is an addition layer; the input of the 1st first convolution block is the position information of all feature points of a picture, the number of channels of the feature map output by the (L-1)-th first convolution block is the same as the dimension of the descriptor, the input of the addition layer is the descriptors of all feature points of the picture and the feature map output by the (L-1)-th first convolution block, and the first convolution block comprises a convolution layer, a batch normalization layer and an activation function ReLu layer which are sequentially connected.
4. The adaptive two-stage feature point matching method based on matching difficulty degree according to any one of claims 1 to 3, wherein in the step 4, the attention aggregation processing on the new descriptor of each feature point is as follows:
performing attention aggregation processing on the new descriptors of the feature points by using a graph network with L_G layers, wherein L_G is an odd number greater than 1;
the first L_G-1 layers of the graph network form an alternating layer structure of self layers and cross layers, and the last layer is a fully connected layer;
the self layer and the cross layer have the same network structure and are both neural network layers with attention mechanisms, the input of the self layer is different feature points of the same picture of the picture pair to be matched, and the input of the cross layer is different feature points of the two pictures of the picture pair to be matched.
5. The adaptive two-stage feature point matching method based on matching difficulty as claimed in claim 4, wherein in step 4, the network structure of each of the first L_G-1 layers of the graph network is two stacked second convolution blocks; along the forward propagation direction, the 1st second convolution block of each layer has 2M input channels and 2M output channels, with a convolution kernel size of 1 × 2M × 2M; the 2nd second convolution block of each layer has 2M input channels and M output channels, with a convolution kernel size of 1 × 2M × M, wherein M represents the dimension of a descriptor, and the second convolution block comprises a convolution layer and an activation function ReLu layer which are sequentially connected.
6. The adaptive two-stage feature point matching method based on matching difficulty degree according to claim 1, wherein in step 5, the value range of the first matching threshold is set to be 9-11.
7. The adaptive two-stage feature point matching method based on matching difficulty degree according to claim 1, wherein in step 6, the value range of the second matching threshold is set to be 0.8-1.
CN202110884790.4A 2021-08-03 2021-08-03 Self-adaptive two-stage feature point matching method based on matching difficulty Active CN113326856B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110884790.4A CN113326856B (en) 2021-08-03 2021-08-03 Self-adaptive two-stage feature point matching method based on matching difficulty

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110884790.4A CN113326856B (en) 2021-08-03 2021-08-03 Self-adaptive two-stage feature point matching method based on matching difficulty

Publications (2)

Publication Number Publication Date
CN113326856A CN113326856A (en) 2021-08-31
CN113326856B true CN113326856B (en) 2021-12-03

Family

ID=77426909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110884790.4A Active CN113326856B (en) 2021-08-03 2021-08-03 Self-adaptive two-stage feature point matching method based on matching difficulty

Country Status (1)

Country Link
CN (1) CN113326856B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612531B (en) * 2022-02-22 2024-07-16 腾讯科技(深圳)有限公司 Image processing method and device, electronic equipment and storage medium
CN117765084B (en) * 2024-02-21 2024-05-03 电子科技大学 Visual positioning method for iterative solution based on dynamic branch prediction

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019084804A1 (en) * 2017-10-31 2019-05-09 深圳市大疆创新科技有限公司 Visual odometry and implementation method therefor
CN113159043A (en) * 2021-04-01 2021-07-23 北京大学 Feature point matching method and system based on semantic information

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101251926B (en) * 2008-03-20 2011-08-17 北京航空航天大学 Remote sensing image registration method based on local configuration covariance matrix
CN101547359B (en) * 2009-04-17 2011-01-05 西安交通大学 Rapid motion estimation self-adaptive selection method based on motion complexity
CN102322864B (en) * 2011-07-29 2014-01-01 北京航空航天大学 Airborne optic robust scene matching navigation and positioning method
CN102592129B (en) * 2012-01-02 2013-10-16 西安电子科技大学 Scenario-driven image characteristic point selection method for smart phone
TWI486906B (en) * 2012-12-14 2015-06-01 Univ Nat Central Using Image Classification to Strengthen Image Matching
CN106358029B (en) * 2016-10-18 2019-05-03 北京字节跳动科技有限公司 A kind of method of video image processing and device
CN109934857B (en) * 2019-03-04 2021-03-19 大连理工大学 Loop detection method based on convolutional neural network and ORB characteristics
CN110246169B (en) * 2019-05-30 2021-03-26 华中科技大学 Gradient-based window adaptive stereo matching method and system
CN111814839B (en) * 2020-06-17 2023-09-01 合肥工业大学 Template matching method of longicorn group optimization algorithm based on self-adaptive variation
CN111767960A (en) * 2020-07-02 2020-10-13 中国矿业大学 Image matching method and system applied to image three-dimensional reconstruction
CN112734747B (en) * 2021-01-21 2024-06-25 腾讯科技(深圳)有限公司 Target detection method and device, electronic equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019084804A1 (en) * 2017-10-31 2019-05-09 深圳市大疆创新科技有限公司 Visual odometry and implementation method therefor
CN113159043A (en) * 2021-04-01 2021-07-23 北京大学 Feature point matching method and system based on semantic information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Structure adaptive feature point matching for urban area wide-based line images with viewpoint variation";Chen M等;《Acta Geodaetica et Cartographica Sinica》;20191231;第1129-1140页 *
"基于环境差异度的自适应角点匹配算法";刘芳萍等;《数字视频》;20151231;第39卷(第1期);第24-31页 *

Also Published As

Publication number Publication date
CN113326856A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN113326856B (en) Self-adaptive two-stage feature point matching method based on matching difficulty
CN107358626B (en) Method for generating confrontation network calculation parallax by using conditions
CN110220493B (en) Binocular distance measuring method and device
CN112435282B (en) Real-time binocular stereo matching method based on self-adaptive candidate parallax prediction network
He et al. A fully end-to-end cascaded cnn for facial landmark detection
WO2024021394A1 (en) Person re-identification method and apparatus for fusing global features with ladder-shaped local features
CN112258554A (en) Double-current hierarchical twin network target tracking method based on attention mechanism
CN111652899A (en) Video target segmentation method of space-time component diagram
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
CN113222998B (en) Semi-supervised image semantic segmentation method and device based on self-supervised low-rank network
Ma et al. Flexible and generalized real photograph denoising exploiting dual meta attention
CN116188825A (en) Efficient feature matching method based on parallel attention mechanism
CN115641285A (en) Binocular vision stereo matching method based on dense multi-scale information fusion
CN115564983A (en) Target detection method and device, electronic equipment, storage medium and application thereof
Xie et al. Feature-guided spatial attention upsampling for real-time stereo matching network
WO2022197615A1 (en) Techniques for adaptive generation and visualization of quantized neural networks
CN114066844A (en) Pneumonia X-ray image analysis model and method based on attention superposition and feature fusion
CN109063834B (en) Neural network pruning method based on convolution characteristic response graph
Li et al. U-Match: Two-view Correspondence Learning with Hierarchy-aware Local Context Aggregation.
CN114005046A (en) Remote sensing scene classification method based on Gabor filter and covariance pooling
CN116824143A (en) Point cloud segmentation method based on bilateral feature fusion and vector self-attention
CN112464900A (en) Multi-template visual target tracking method based on twin network
CN116824232A (en) Data filling type deep neural network image classification model countermeasure training method
CN112925822B (en) Time series classification method, system, medium and device based on multi-representation learning
CN115272404A (en) Multi-target tracking method based on nuclear space and implicit space feature alignment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant