CN114494372A - Remote sensing image registration method based on unsupervised deep learning - Google Patents

Remote sensing image registration method based on unsupervised deep learning Download PDF

Info

Publication number
CN114494372A
CN114494372A (Application CN202210026370.7A)
Authority
CN
China
Prior art keywords
image
scale
model network
corrected
transformation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210026370.7A
Other languages
Chinese (zh)
Other versions
CN114494372B (en
Inventor
叶沅鑫
唐腾峰
朱柏
张家诚
喻智睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University filed Critical Southwest Jiaotong University
Priority to CN202210026370.7A priority Critical patent/CN114494372B/en
Publication of CN114494372A publication Critical patent/CN114494372A/en
Application granted granted Critical
Publication of CN114494372B publication Critical patent/CN114494372B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a remote sensing image registration method based on unsupervised deep learning, which converts image registration into a regression optimization problem and can integrate feature extraction networks, image similarity measures and feature descriptors of various forms and parameters. Depth features of the images to be registered are extracted by model networks on multiple scales, geometric transformation parameters are obtained through parameter regression, and the images are geometrically corrected with these parameters, thereby realizing coarse-to-fine multi-scale step-by-step registration. The method needs no registration ground truth as training samples; by constructing loss functions based on similarity measures and feature descriptors between images, the loss functions on multiple scales are jointly trained, the parameters of each model network are updated through back propagation, the geometric transformation parameters are optimized, and high-precision, high-robustness multi-source remote sensing image registration is achieved.

Description

Remote sensing image registration method based on unsupervised deep learning
Technical Field
The invention belongs to the technical field of remote sensing, and particularly relates to a remote sensing image registration method based on unsupervised deep learning.
Background
With the rapid development of aerospace and remote sensing technologies, the means of acquiring remote sensing images are becoming more numerous and more varied. Because different sensors differ in hardware and imaging mechanism, a remote sensing image from a single data source cannot comprehensively reflect the characteristics of ground objects. In order to make full use of multi-source remote sensing data acquired by different types of sensors and to achieve data integration and information complementation, multi-source remote sensing images need to be registered.
Multi-source remote sensing image registration refers to the process of aligning and superimposing multi-sensor remote sensing images of the same area acquired at different times, from different viewing angles or with different sensors, so that corresponding points on the aligned images share the same geographic coordinates. In the prior art, methods for multi-source remote sensing image registration include traditional methods that do not use deep learning and deep-learning-based methods. Traditional methods are based on features or region templates and rely on manually designed features, which usually need to be redesigned when registering remote sensing images of different modalities from different sensors. Deep-learning-based methods extract deep features from the multi-source remote sensing images and generalize better than manual features. At the present stage, supervised deep-learning methods need large numbers of samples with ground-truth labels as training data, but the remote sensing field currently lacks such large-scale training data, and cost factors limit the practical application of these methods.
Disclosure of Invention
The invention aims to solve the problem that existing remote sensing image registration methods based on supervised deep learning require large numbers of training samples that are difficult to obtain, and provides a remote sensing image registration method based on unsupervised deep learning that can achieve accurate registration between remote sensing images without ground-truth training samples.
The technical scheme of the invention is as follows: a remote sensing image registration method based on unsupervised deep learning comprises the following steps:
S1, establishing a multi-source remote sensing image registration data set comprising two sets of image data, wherein the images of the two sets correspond to each other one by one, one set serves as the reference image data set and the other set serves as the image data set to be corrected.
S2, selecting a reference image f from the reference image data set, selecting the image m to be corrected that corresponds to the reference image f from the image data set to be corrected, and taking the reference image f and the image m to be corrected as the end-to-end input of one training sample.
S3, calculating the transformation parameters μ₁, μ₂, μ₃ on the model networks of the 3 scales, gradually correcting the image m to be corrected to generate the corrected images m₁, m₂, m₃, back-propagating the loss function of the model network of each scale, and taking the corrected image m₃ and the transformation parameter μ₃ as the end-to-end output of one training sample.
S4, initializing the model network parameters of the 3 scales respectively.
S5, jointly training the model networks of the 3 scales in an end-to-end manner, and optimizing the joint loss function over the 3 scales.
S6, searching, through a deep learning optimizer, for the direction in which the joint loss function value decreases fastest, back-propagating through the model networks along that direction, iteratively updating the model network parameters, saving the network model parameters when the joint loss function drops below a preset threshold and converges, and outputting the registered reference image f and corrected image m₃.
Further, step S3 includes the following substeps:
S3-1, inputting the reference image f and the image m to be corrected into the model network of the 1st scale to obtain the transformation parameter μ₁ of the 1st scale.
S3-2, performing geometric correction on the image m to be corrected with the transformation parameter μ₁ to generate the corrected image m₁.
S3-3, calculating the loss function of the model network of the 1st scale.
S3-4, inputting the reference image f and the corrected image m₁ into the model network of the 2nd scale to obtain the residual Δμ₁ of the transformation parameter, and combining it with the transformation parameter μ₁ to obtain the transformation parameter μ₂ of the 2nd scale.
S3-5, performing geometric correction on the corrected image m₁ with the transformation parameter μ₂ to generate the corrected image m₂.
S3-6, calculating the loss function of the model network of the 2nd scale.
S3-7, inputting the reference image f and the corrected image m₂ into the model network of the 3rd scale to obtain the residual Δμ₂ of the transformation parameter, and combining it with the transformation parameter μ₂ to obtain the transformation parameter μ₃ of the 3rd scale.
S3-8, performing geometric correction on the corrected image m₂ with the transformation parameter μ₃ to generate the corrected image m₃.
S3-9, calculating the loss function of the model network of the 3rd scale.
S3-10, taking the corrected image m₃ and the transformation parameter μ₃ as the end-to-end output of one training sample.
Further, step S3-1 includes the following substeps:
S3-1-1, down-sampling the reference image f and the image m to be corrected to 1/4 of their original size respectively, and stacking the two down-sampled images in the channel direction to generate a stacked image.
S3-1-2, inputting the stacked image into the feature extraction part of the model network of the 1st scale to generate depth features.
S3-1-3, passing the depth features through the parameter regression part of the model network of the 1st scale to obtain the transformation parameter μ₁ of the 1st scale.
Further, step S3-2 includes the following substeps:
S3-2-1, forming the geometric transformation matrix Tμ₁ from the transformation parameter μ₁.
S3-2-2, performing geometric transformation on the image m to be corrected through the geometric transformation matrix Tμ₁ to generate the corrected image m₁.
Further, step S3-4 includes the following substeps:
S3-4-1, down-sampling the reference image f and the corrected image m₁ to 1/2 of their original size respectively, and stacking the two down-sampled images in the channel direction to generate a stacked image.
S3-4-2, inputting the stacked image into the feature extraction part of the model network of the 2nd scale to generate depth features.
S3-4-3, passing the depth features through the parameter regression part of the model network of the 2nd scale to obtain the residual Δμ₁ of the transformation parameter.
S3-4-4, combining the residual Δμ₁ with the transformation parameter μ₁ to obtain the transformation parameter μ₂ of the 2nd scale.
Further, step S3-5 includes the following substeps:
S3-5-1, forming the geometric transformation matrix Tμ₂ from the transformation parameter μ₂.
S3-5-2, performing geometric transformation on the corrected image m₁ through the geometric transformation matrix Tμ₂ to generate the corrected image m₂.
Further, step S3-7 includes the following substeps:
S3-7-1, stacking the reference image f and the corrected image m₂ in the channel direction to generate a stacked image.
S3-7-2, inputting the stacked image into the feature extraction part of the model network of the 3rd scale to generate depth features.
S3-7-3, passing the depth features through the parameter regression part of the model network of the 3rd scale to obtain the residual Δμ₂ of the transformation parameter.
S3-7-4, combining the residual Δμ₂ with the transformation parameter μ₂ to obtain the transformation parameter μ₃ of the 3rd scale.
Further, step S3-8 includes the following substeps:
S3-8-1, forming the geometric transformation matrix Tμ₃ from the transformation parameter μ₃.
S3-8-2, performing geometric transformation on the corrected image m₂ through the geometric transformation matrix Tμ₃ to generate the corrected image m₃.
Further, the loss function Loss_sim(f, m, μ₁) of the model network of the 1st scale in step S3-3 is:
Loss_sim(f, m, μ₁) = -Sim(Tμ₁⁻¹(f), m)
the loss function Loss_sim(f, m₁, μ₂) of the model network of the 2nd scale in step S3-6 is:
Loss_sim(f, m₁, μ₂) = -Sim(Tμ₂⁻¹(f), m₁)
the loss function Loss_sim(f, m₂, μ₃) of the model network of the 3rd scale in step S3-9 is:
Loss_sim(f, m₂, μ₃) = -Sim(Tμ₃⁻¹(f), m₂)
and the joint loss function Loss in step S5 is:
Loss = λ₁ × Loss_sim(f, m, μ₁) + λ₂ × Loss_sim(f, m₁, μ₂) + λ₃ × Loss_sim(f, m₂, μ₃)
where Sim(·) denotes a similarity measure, Tμ⁻¹ denotes the inverse of the geometric transformation Tμ, and λ₁, λ₂, λ₃ are the weight factors of the loss functions of the model networks at each scale.
Further, step S4 includes the following substeps:
S4-1, training the model network of the 1st scale with the goal of minimizing the loss function Loss_sim(f, m, μ₁).
S4-2, fixing the parameters of the model network of the 1st scale, and training the model network of the 2nd scale with the goal of minimizing the loss function Loss_sim(f, m₁, μ₂).
S4-3, fixing the parameters of the model networks of the 1st and 2nd scales, and training the model network of the 3rd scale with the goal of minimizing the loss function Loss_sim(f, m₂, μ₃).
The invention has the following beneficial effects:
(1) The method converts image registration into a regression optimization problem, can integrate feature extraction networks, image similarity measures and feature descriptors of various forms and parameters, and realizes accurate multi-scale image registration through end-to-end mapping without supervised learning.
(2) The depth features of the images to be registered are extracted by model networks on multiple scales, geometric transformation parameters are obtained through parameter regression, and the images are geometrically corrected with these parameters, thereby realizing coarse-to-fine multi-scale step-by-step registration of the images.
(3) The method needs no registration ground truth as training samples; by constructing loss functions based on similarity measures and feature descriptors between images, the loss functions on multiple scales are jointly trained, the parameters of each model network are updated through back propagation, the geometric transformation parameters are optimized, and high-precision, high-robustness multi-source remote sensing image registration is achieved.
Drawings
Fig. 1 is a flowchart of a remote sensing image registration method based on unsupervised deep learning according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a reference image, an image to be corrected, and a corrected image according to an embodiment of the invention.
Fig. 3 is a schematic general framework diagram of a remote sensing image registration method according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a model network 1 according to an embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating calculation of similarity measurement of multi-source remote sensing images according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It is to be understood that the embodiments shown and described in the drawings are merely exemplary and are intended to illustrate the principles and spirit of the invention, not to limit the scope of the invention.
The embodiment of the invention provides a remote sensing image registration method based on unsupervised deep learning which, as shown in FIG. 1, comprises the following steps S1 to S6:
S1, establishing a multi-source remote sensing image registration data set comprising two sets of image data, wherein the images of the two sets correspond to each other one by one, one set serves as the reference image data set and the other set serves as the image data set to be corrected.
In the embodiment of the present invention, each image to be corrected in the image data set to be corrected should be an image that has geometric distortion and whose ground-feature content overlaps that of the corresponding reference image over a certain range (greater than or equal to 70% in the embodiment of the present invention).
In an embodiment of the present invention, step S1 is further described by taking the registration of an optical image with a Synthetic Aperture Radar (SAR) image as an example. As shown in fig. 2, in the embodiment of the present invention, an image a with a fixed resolution is used as a reference image, an image b overlapping with a partial area of the image a and having geometric distortion is used as an image to be corrected, and after registration and correction by the registration method provided by the present invention, an image c aligned with the overlapping area of the image a pixel by pixel is obtained. The multi-source remote sensing image data set comprises a plurality of pairs of regional images similar to the image a and the image b. It should be understood that other embodiments of the present invention include, but are not limited to, registration of multi-source optical images, registration of optical images with infrared images, registration of optical images with LiDAR (Light Detection and Ranging) intensity and elevation images, and registration of optical images with grid maps, and it is within the scope of the present invention to employ the registration methods provided herein.
S2, selecting a reference image f from the reference image data set, selecting the image m to be corrected that corresponds to the reference image f from the image data set to be corrected, and taking the reference image f and the image m to be corrected as the end-to-end input of one training sample.
S3, calculating the transformation parameters μ₁, μ₂, μ₃ on the model networks of the 3 scales, gradually correcting the image m to be corrected to generate the corrected images m₁, m₂, m₃, back-propagating the loss function of the model network of each scale, and taking the corrected image m₃ and the transformation parameter μ₃ as the end-to-end output of one training sample.
The embodiment of the invention adopts a coarse-to-fine multi-scale matching strategy: the model networks on the 3 scales are jointly trained in an end-to-end framework to predict the transformation parameters and their residuals, thereby achieving accurate image registration. "End-to-end framework" means that, in the embodiment of the invention, the reference image f and the image m to be corrected are the input, the corrected image m₃ and the transformation parameter μ₃ are the output, and the two together constitute an end-to-end mapping relationship.
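The following sketch is only an illustration of this end-to-end, three-scale forward pass, not the patented implementation; the networks net1/net2/net3, and the warp() and compose() helpers, are hypothetical callables whose possible forms are sketched in later examples.

```python
# Illustrative sketch of the coarse-to-fine forward pass over the 3 scales.
import torch

def forward_three_scales(f, m, net1, net2, net3, warp, compose):
    """f, m: (B, 1, H, W) reference image / image to be corrected."""
    mu1 = net1(f, m)                 # scale 1: coarse transformation parameters
    m1 = warp(m, mu1)                # corrected image m1
    d_mu1 = net2(f, m1)              # scale 2: residual of the transformation parameters
    mu2 = compose(mu1, d_mu1)        # combine mu1 with its residual
    m2 = warp(m1, mu2)               # corrected image m2
    d_mu2 = net3(f, m2)              # scale 3: residual of the transformation parameters
    mu3 = compose(mu2, d_mu2)
    m3 = warp(m2, mu3)               # final corrected image m3
    return (mu1, mu2, mu3), (m1, m2, m3)
```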
As shown in FIG. 3, step S3 includes the following substeps S3-1 through S3-10:
S3-1, inputting the reference image f and the image m to be corrected into the model network of the 1st scale (the model network 1; the 1st scale is abbreviated as scale 1 in the embodiment of the invention) to obtain the transformation parameter μ₁ of the 1st scale.
Step S3-1 includes the following substeps S3-1-1 to S3-1-3:
s3-1-1, down-sampling the reference image f and the image m to be corrected to 1/4 of the original size, and stacking the two images generated after down-sampling in the channel direction to generate a stacked image.
In the embodiment of the present invention, the size of the reference image f is fixed, and if the size of the image m to be corrected is not consistent with that of the reference image f, the size of the image m to be corrected is adjusted to be consistent with that of the reference image f by adopting a zero padding or cropping method.
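A minimal sketch of step S3-1-1 together with this size adjustment is given below, assuming single-channel tensors of shape (B, 1, H, W); the function name and the bilinear interpolation mode are illustrative choices, not part of the description.

```python
# Illustrative sketch: match the size of m to f by zero padding / cropping,
# down-sample both to 1/4, then stack along the channel dimension.
import torch
import torch.nn.functional as F

def stack_downsampled(f, m, scale=0.25):
    dh, dw = f.shape[-2] - m.shape[-2], f.shape[-1] - m.shape[-1]
    m = F.pad(m, (0, max(dw, 0), 0, max(dh, 0)))            # zero-pad if m is smaller
    m = m[..., :f.shape[-2], :f.shape[-1]]                   # crop if m is larger
    f_s = F.interpolate(f, scale_factor=scale, mode="bilinear", align_corners=False)
    m_s = F.interpolate(m, scale_factor=scale, mode="bilinear", align_corners=False)
    return torch.cat([f_s, m_s], dim=1)                      # (B, 2, H*scale, W*scale)
```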
S3-1-2, inputting the superposed image into a feature extraction part of the model network of the 1 st scale to generate depth features.
In one embodiment of the invention, as shown in fig. 4, the feature extraction part of the model network 1 consists of k groups of interconnected convolution blocks and down-sampling layers; each convolution block includes a convolution layer, a local response normalization layer and a linear-unit activation function (ReLU) layer, and each down-sampling layer reduces the image resolution to 1/2. Experiments show that choosing k reasonably, so that the size of the feature map generated by the last convolution block falls within [4, 7], and setting the number of convolution-kernel channels of the convolution layers to 1/4 of the size of the (down-sampled) image to be corrected, is beneficial to generating a more accurate transformation parameter μ₁ in the subsequent steps. In the embodiment of the present invention, if the size of the reference image f and the image m to be corrected is 512 × 512, the 1/4 down-sampled image size is 128 × 128, the number of convolution-kernel channels of each convolution block is set to 32, and k is set to 5, so that the feature map generated from the stacked image by the 5 groups of convolution blocks and down-sampling layers has size 4.
In another embodiment of the present invention, the feature extraction part of the model network 1 includes, but is not limited to, a U-shaped structure network (U-Net), a full convolution neural network (FCN), and the like.
S3-1-3, passing the depth features through the parameter regression part of the model network of the 1st scale to obtain the transformation parameter μ₁ of the 1st scale.
As shown in fig. 4, in an embodiment of the present invention, the parameter regression part of the model network 1 consists of t fully connected layers arranged in parallel; the value of t can be chosen by weighing computation speed against the expected range of image scaling, and is not limited by the present invention. Experiments show that when the image scaling factors lie between 0.5 and 2, setting 4 parallel fully connected layers works well. The parallel fully connected layers are similar to the pyramid strategy used in conventional image registration, the difference being that the initial values of the output spatial transformation parameters differ in scale. Compared with using a single fully connected layer to output the parameters, computing with multiple parallel fully connected layers greatly accelerates the convergence of the loss function.
It should be understood that the invention does not limit the implementation of the feature extraction part and the parameter regression part of the model network 1; any Convolutional Neural Network (CNN) of various forms and parameters that takes the stacked image as input, extracts depth features in the channel direction and outputs geometric transformation parameters falls within the protection scope of the invention.
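One possible realization of such a model network is sketched below. The choice of average pooling for down-sampling, the averaging of the parallel heads and the identity-biased initialization are assumptions for illustration; the class name RegModelNet and its default values (ch=32, k=5, feat_size=4, matching the scale-1 example above) are hypothetical.

```python
# Illustrative sketch of a "model network": k convolution blocks
# (convolution + local response normalization + ReLU) each followed by 2x
# down-sampling, then t parallel fully connected regression heads.
import torch
import torch.nn as nn

class RegModelNet(nn.Module):
    def __init__(self, in_ch=2, ch=32, k=5, t=4, n_params=6, feat_size=4):
        super().__init__()
        blocks = []
        for i in range(k):
            blocks += [
                nn.Conv2d(in_ch if i == 0 else ch, ch, kernel_size=3, padding=1),
                nn.LocalResponseNorm(size=5),
                nn.ReLU(inplace=True),
                nn.AvgPool2d(2),                    # halves the resolution
            ]
        self.features = nn.Sequential(*blocks)
        self.heads = nn.ModuleList(
            [nn.Linear(ch * feat_size * feat_size, n_params) for _ in range(t)]
        )
        identity = torch.tensor([1., 0., 0., 0., 1., 0.])
        for head in self.heads:                     # start near the identity affine transform
            nn.init.zeros_(head.weight)
            if n_params == 6:
                head.bias.data.copy_(identity)

    def forward(self, stacked):
        x = self.features(stacked).flatten(1)
        # average the t parallel heads into one parameter vector (an assumption)
        return torch.stack([h(x) for h in self.heads], dim=0).mean(dim=0)
```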
S3-2, performing geometric correction on the image m to be corrected with the transformation parameter μ₁ to generate the corrected image m₁.
Step S3-2 includes the following substeps S3-2-1 to S3-2-2:
S3-2-1, forming the geometric transformation matrix Tμ₁ from the transformation parameter μ₁.
In one embodiment of the present invention, as shown in FIG. 4, step S3-1-3 outputs 6 geometric transformation parameters a₁, a₂, a₃, a₄, a₅, a₆, which form the two-dimensional affine matrix Tμ₁:
Tμ₁ = [ a₁ a₂ a₃ ; a₄ a₅ a₆ ; 0 0 1 ]
The 6 parameters of the affine transformation matrix represent translation, rotation, scaling and shear operations on the image pixel coordinates. Suppose the geometric transformation of the image comprises: a translation Dx in the x direction and a translation Dy in the y direction; a scaling factor Sx in the x direction and a scaling factor Sy in the y direction; a clockwise rotation by the angle θ; a shear angle φ in the x direction and a shear angle ω in the y direction. The 6 parameters of the two-dimensional affine matrix Tμ₁ are then obtained by composing, in an arbitrary order, the elementary matrices of these translation, scaling, rotation and shear operations.
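A small sketch of one such composition follows; the fixed ordering T·R·S·H chosen here is only an example of the arbitrary ordering mentioned above, and the function name is hypothetical.

```python
# Illustrative sketch: compose elementary translation, rotation, scaling and shear
# matrices into the 6 affine parameters a1..a6.
import numpy as np

def affine_from_components(dx, dy, sx, sy, theta, phi, omega):
    T = np.array([[1, 0, dx], [0, 1, dy], [0, 0, 1]], dtype=float)        # translation
    R = np.array([[np.cos(theta),  np.sin(theta), 0],                     # clockwise rotation
                  [-np.sin(theta), np.cos(theta), 0],
                  [0, 0, 1]], dtype=float)
    S = np.diag([sx, sy, 1.0])                                            # scaling
    H = np.array([[1, np.tan(phi), 0],                                    # shear
                  [np.tan(omega), 1, 0],
                  [0, 0, 1]], dtype=float)
    M = T @ R @ S @ H                                                     # one possible ordering
    a1, a2, a3 = M[0]
    a4, a5, a6 = M[1]
    return np.array([a1, a2, a3, a4, a5, a6])
```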
In other embodiments of the present invention, the parameter regression part of the model network 1 may output more or fewer geometric transformation parameters to form geometric transformation matrices other than the affine transformation, such as perspective or rigid transformations, which is not limited by the present invention.
S3-2-2, performing geometric transformation on the image m to be corrected through the geometric transformation matrix Tμ₁ to generate the corrected image m₁:
m₁ = Tμ₁(m)
Specifically, for each pixel with coordinates (X, Y) and gray value σ on the image m to be corrected, the coordinates (x, y) on the corrected image after the spatial transformation are calculated, and the corrected image m₁ is generated according to a chosen resampling and interpolation method. In the embodiment using the affine transformation:
[ x ; y ; 1 ] = Tμ₁ [ X ; Y ; 1 ], i.e. x = a₁X + a₂Y + a₃ and y = a₄X + a₅Y + a₆.
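A sketch of this warp with bilinear resampling is shown below. Note that torch's affine_grid expects, in normalized [-1, 1] coordinates, the mapping from output pixels back to input pixels, so the inverse of the forward affine matrix is supplied; working in normalized coordinates and the function name warp_affine are implementation assumptions, not part of the description.

```python
# Illustrative sketch of the geometric correction m1 = T_mu1(m).
import torch
import torch.nn.functional as F

def warp_affine(img, mu):
    """img: (B, C, H, W); mu: (B, 6) forward affine parameters a1..a6 (normalized coords)."""
    B = img.shape[0]
    bottom = torch.tensor([0., 0., 1.], device=mu.device, dtype=mu.dtype).view(1, 1, 3)
    A = torch.cat([mu.view(B, 2, 3), bottom.expand(B, 1, 3)], dim=1)
    theta = torch.inverse(A)[:, :2, :]               # output-to-input mapping for sampling
    grid = F.affine_grid(theta, img.shape, align_corners=False)
    return F.grid_sample(img, grid, mode="bilinear",  # bilinear interpolation of gray values
                         padding_mode="zeros", align_corners=False)
```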
S3-3, calculating the loss function Loss_sim(f, m, μ₁) of the model network of the 1st scale:
Loss_sim(f, m, μ₁) = -Sim(Tμ₁⁻¹(f), m)
where Tμ₁⁻¹ denotes the inverse geometric transformation of Tμ₁, defined by:
Tμ₁⁻¹(Tμ₁(m)) = m
Sim(·) denotes a similarity measure, i.e. Sim(A, B) denotes some measure of the similarity of images A and B. Common similarity measures include the Sum of Squared Differences (SSD), Normalized Cross-Correlation (NCC) and Phase Correlation, among others. For images A and B of size w × w:
SSD(A, B) = Σx Σy (A(x, y) - B(x, y))²
NCC(A, B) = Σx Σy (A(x, y) - Ā)(B(x, y) - B̄) / sqrt( Σx Σy (A(x, y) - Ā)² · Σx Σy (B(x, y) - B̄)² )
where the sums run over x, y = 1, …, w, and Ā and B̄ are the mean gray values of image A and image B, respectively.
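The two measures above translate directly into code; a minimal sketch for single-channel tensors of identical size follows (the eps guard against division by zero is an added assumption).

```python
# Illustrative sketch of the SSD and NCC similarity measures.
import torch

def ssd(a, b):
    return ((a - b) ** 2).sum()

def ncc(a, b, eps=1e-8):
    a0 = a - a.mean()
    b0 = b - b.mean()
    return (a0 * b0).sum() / torch.sqrt((a0 ** 2).sum() * (b0 ** 2).sum() + eps)
```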
Computing conventional similarity measures such as SSD or NCC is time-consuming. Since the correlation (or convolution) of two images in the spatial domain equals the product of their spectra in the frequency domain, phase correlation, which is faster to compute, is adopted. The specific steps are as follows:
Let image A and image B be related by a displacement (x₀, y₀) in the spatial domain, i.e. B(x, y) = A(x - x₀, y - y₀), and let their Fourier transforms be F_A(u, v) and F_B(u, v), respectively. In the frequency domain the following relationship holds:
F_B(u, v) = F_A(u, v) · exp(-i(u x₀ + v y₀))
The normalized cross-power spectrum of the two is expressed as:
F_A(u, v) F_B*(u, v) / |F_A(u, v) F_B*(u, v)| = exp(i(u x₀ + v y₀))
where the superscript * denotes complex conjugation.
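A sketch of a phase-correlation-based similarity is given below: the normalized cross-power spectrum is inverted back to the spatial domain, where a well-aligned pair concentrates its energy near zero displacement. Using the peak value of that surface as the scalar similarity score is an assumption; the description above only defines the cross-power spectrum itself.

```python
# Illustrative sketch of a phase-correlation similarity score.
import torch

def phase_correlation_peak(a, b, eps=1e-8):
    Fa = torch.fft.fft2(a)
    Fb = torch.fft.fft2(b)
    cross = Fa * torch.conj(Fb)
    r = torch.fft.ifft2(cross / (cross.abs() + eps)).real   # correlation surface
    return r.max()                                           # peak height as similarity
```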
In one embodiment of the present invention, image A and image B are multi-source optical remote sensing images of the same area acquired by the same sensor, and their gray values are used directly as the input for computing the similarity measure of image A and image B.
In another embodiment of the present invention, image A and image B are remote sensing images of the same area acquired by different types of sensors (such as optical, infrared, SAR, etc.). Instead of using gray values directly, local feature descriptors of image A and image B, such as Channel Features of Orientated Gradients (CFOG), the Histogram of Oriented Gradients (HOG), the Local Self-Similarity descriptor (LSS) and the Histogram of Orientated Phase Congruency (HOPC), are computed pixel by pixel as the input for computing the similarity measure. As shown in fig. 5, the SSD, NCC or phase correlation between the feature descriptor images of the two images is then used as the similarity measure.
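The sketch below is a heavily simplified stand-in for the pixel-wise descriptors named above (it is not CFOG, HOG, LSS or HOPC): a per-pixel orientated-gradient channel feature built from Sobel filters, whose channels can then be compared with the SSD, NCC or phase-correlation sketches shown earlier. All names and the soft-binning scheme are assumptions made for illustration.

```python
# Illustrative sketch: simple per-pixel orientated-gradient channel features.
import math
import torch
import torch.nn.functional as F

def oriented_gradient_channels(img, n_bins=8):
    """img: (B, 1, H, W) -> (B, n_bins, H, W) soft orientation-channel features."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3).to(img)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    ang = torch.atan2(gy, gx) % math.pi                       # orientation in [0, pi)
    bins = (torch.arange(n_bins, dtype=torch.float32) * math.pi / n_bins).to(img)
    # soft assignment of the gradient magnitude to orientation bins
    w = torch.cos(ang - bins.view(1, n_bins, 1, 1)).clamp(min=0) ** 2
    return mag * w
```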
Steps S3-1 to S3-3 generate the transformation parameters and the corrected image at scale 1 and calculate the corresponding loss function, and their implementation has been described above in detail. The subsequent steps (S3-4 to S3-9) repeat similar operations on the other scales, differing from the operations at scale 1 only in their parameters; their flow is therefore summarized briefly below without repeating the detailed explanation of the principle.
S3-4, inputting the reference image f and the corrected image m₁ into the model network of the 2nd scale (the model network 2; abbreviated as scale 2 in the embodiment of the invention) to obtain the residual Δμ₁ of the transformation parameter, and combining it with the transformation parameter μ₁ to obtain the transformation parameter μ₂ of the 2nd scale.
Step S3-4 includes the following substeps S3-4-1 to S3-4-4:
S3-4-1, down-sampling the reference image f and the corrected image m₁ to 1/2 of their original size respectively, and superimposing the two down-sampled images in the channel direction to generate a superposed image.
And S3-4-2, inputting the superposed image into a feature extraction part of the model network of the 2 nd scale to generate depth features.
In the embodiment of the present invention, the network structure of the model network 2 is similar to that of the model network 1; only the parameter settings differ. To further illustrate the feature extraction of step S3-4-2 with a specific embodiment: if the size of the reference image f and the corrected image m₁ is 512 × 512, the 1/2 down-sampled size is 256 × 256; the number of convolution-kernel channels of each convolution block is set to 64 and k is set to 6, so that the feature map generated from the stacked image by the 6 groups of convolution blocks and down-sampling layers has size 4.
S3-4-3, passing the depth features through the parameter regression part of the model network of the 2nd scale to obtain the residual Δμ₁ of the transformation parameter.
S3-4-4, combining the residual Δμ₁ with the transformation parameter μ₁ to obtain the transformation parameter μ₂ of the 2nd scale:
μ₂ = μ₁ * Δμ₁
where * denotes matrix multiplication.
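A minimal sketch of this combination step follows; lifting the 6-element parameter vectors to 3 × 3 homogeneous matrices before multiplying, and the left-to-right ordering, are bookkeeping assumptions consistent with "*" denoting matrix multiplication.

```python
# Illustrative sketch: combine a transformation parameter with its residual.
import torch

def compose(mu, d_mu):
    """mu, d_mu: (B, 6) affine parameters a1..a6 -> combined (B, 6) parameters."""
    B = mu.shape[0]
    bottom = torch.tensor([0., 0., 1.], device=mu.device, dtype=mu.dtype).view(1, 1, 3)
    A = torch.cat([mu.view(B, 2, 3), bottom.expand(B, 1, 3)], dim=1)
    D = torch.cat([d_mu.view(B, 2, 3), bottom.expand(B, 1, 3)], dim=1)
    return (A @ D)[:, :2, :].reshape(B, 6)       # mu2 = mu1 * d_mu1
```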
S3-5, performing geometric correction on the corrected image m₁ with the transformation parameter μ₂ to generate the corrected image m₂.
Step S3-5 includes the following substeps S3-5-1 to S3-5-2:
S3-5-1, forming the geometric transformation matrix Tμ₂ from the transformation parameter μ₂.
S3-5-2, performing geometric transformation on the corrected image m₁ through the geometric transformation matrix Tμ₂ to generate the corrected image m₂:
m₂ = Tμ₂(m₁)
S3-6, calculating the loss function Loss_sim(f, m₁, μ₂) of the model network of the 2nd scale:
Loss_sim(f, m₁, μ₂) = -Sim(Tμ₂⁻¹(f), m₁)
S3-7, inputting the reference image f and the corrected image m₂ into the model network of the 3rd scale (the model network 3; abbreviated as scale 3 in the embodiment of the invention) to obtain the residual Δμ₂ of the transformation parameter, and combining it with the transformation parameter μ₂ to obtain the transformation parameter μ₃ of the 3rd scale.
Step S3-7 includes the following substeps S3-7-1 to S3-7-4:
S3-7-1, superimposing the reference image f and the corrected image m₂ in the channel direction to generate a superposed image.
S3-7-2, inputting the superposed image into the feature extraction part of the model network of the 3rd scale to generate depth features.
In the embodiment of the present invention, the network structure of the model network 3 is similar to those of the model networks 1 and 2; only the parameter settings differ. To further illustrate the feature extraction of step S3-7-2 with a specific embodiment: if the size of the image f and the image m is 512 × 512, the number of convolution-kernel channels of each convolution block is set to 128 and k is set to 7, so that the feature map generated from the stacked image by the 7 groups of convolution blocks and down-sampling layers has size 4.
S3-7-3, passing the depth features through the parameter regression part of the model network of the 3rd scale to obtain the residual Δμ₂ of the transformation parameter.
S3-7-4, combining the residual Δμ₂ with the transformation parameter μ₂ to obtain the transformation parameter μ₃ of the 3rd scale:
μ₃ = μ₂ * Δμ₂
where * denotes matrix multiplication.
S3-8, performing geometric correction on the corrected image m₂ with the transformation parameter μ₃ to generate the corrected image m₃.
Step S3-8 includes the following substeps S3-8-1 to S3-8-2:
S3-8-1, forming the geometric transformation matrix Tμ₃ from the transformation parameter μ₃.
S3-8-2, performing geometric transformation on the corrected image m₂ through the geometric transformation matrix Tμ₃ to generate the corrected image m₃:
m₃ = Tμ₃(m₂)
S3-9, calculating the loss function Loss_sim(f, m₂, μ₃) of the model network of the 3rd scale:
Loss_sim(f, m₂, μ₃) = -Sim(Tμ₃⁻¹(f), m₂)
S3-10, taking the corrected image m₃ and the transformation parameter μ₃ as the end-to-end output of one training sample.
S4, initializing the model network parameters of the 3 scales respectively.
Step S4 includes the following substeps S4-1 to S4-3:
S4-1, training the model network of the 1st scale with the goal of minimizing the loss function Loss_sim(f, m, μ₁).
S4-2, fixing the parameters of the model network of the 1st scale, and training the model network of the 2nd scale with the goal of minimizing the loss function Loss_sim(f, m₁, μ₂).
S4-3, fixing the parameters of the model networks of the 1st and 2nd scales, and training the model network of the 3rd scale with the goal of minimizing the loss function Loss_sim(f, m₂, μ₃).
S5, jointly training the model networks of the 3 scales in an end-to-end manner, and optimizing the joint loss function over the 3 scales.
In the embodiment of the invention, before the joint training of the model networks of the 3 scales, the parameters of all the model networks need to be released (i.e., unfrozen).
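Expressed with requires_grad flags, the stage-wise initialization of step S4 and the release of all parameters before joint training can be sketched as follows; the helper names and the use of flags (rather than separate optimizers) are assumptions.

```python
# Illustrative sketch of freezing / releasing model network parameters.
def set_trainable(net, flag):
    for p in net.parameters():
        p.requires_grad = flag

# S4-1: train scale 1 alone; S4-2: freeze scale 1, train scale 2;
# S4-3: freeze scales 1 and 2, train scale 3.
# S5: release ("unfreeze") everything for joint training:
def prepare_joint_training(net1, net2, net3):
    for net in (net1, net2, net3):
        set_trainable(net, True)
```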
In the embodiment of the invention, the joint loss function Loss is:
Loss = λ₁ × Loss_sim(f, m, μ₁) + λ₂ × Loss_sim(f, m₁, μ₂) + λ₃ × Loss_sim(f, m₂, μ₃)
where λ₁, λ₂, λ₃ are the weight factors of the loss functions of the model networks at each scale; in the embodiment of the invention, λ₁, λ₂, λ₃ take the values 0.05, 0.05 and 0.9, respectively.
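In code, the weighted combination is a one-liner; the sketch below assumes the three per-scale losses have already been computed as described above, and the function name is illustrative.

```python
# Illustrative sketch of the weighted joint loss over the 3 scales.
def joint_loss(loss1, loss2, loss3, lambdas=(0.05, 0.05, 0.9)):
    l1, l2, l3 = lambdas
    return l1 * loss1 + l2 * loss2 + l3 * loss3
```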
S6, searching, through a deep learning optimizer, for the direction in which the joint loss function value decreases fastest, back-propagating through the model networks along that direction, and iteratively updating the model network parameters. When the joint loss function drops below a preset threshold and converges, all of the end-to-end-mapped model networks have globally optimal parameters, i.e. the reference image f and the corrected image m₃ have the best similarity; the network model parameters at this moment are saved, and the registered reference image f and corrected image m₃ are output.
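A training-loop sketch for this step is given below. The Adam optimizer, learning rate, iteration cap, stopping rule and file name are assumptions; the description only requires a deep-learning optimizer, back-propagation and convergence of the joint loss below a preset threshold. Here loss_fn(f, m) stands in for the full forward pass of FIG. 3 returning the joint loss.

```python
# Illustrative sketch of the joint training loop of step S6.
import itertools
import torch

def train(models, loader, loss_fn, lr=1e-4, threshold=1e-3, max_iters=100000):
    """models: the three scale networks; loader yields (f, m) pairs."""
    params = itertools.chain(*(net.parameters() for net in models))
    opt = torch.optim.Adam(params, lr=lr)
    for it, (f, m) in enumerate(itertools.cycle(loader)):
        opt.zero_grad()
        loss = loss_fn(f, m)
        loss.backward()                  # back-propagate through all three scales
        opt.step()
        if loss.item() < threshold or it >= max_iters:
            break
    torch.save([net.state_dict() for net in models], "registration_model.pt")
```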
Therefore, the method realizes the accurate registration of the multi-scale remote sensing image which is completely unsupervised to learn and mapped end to end.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto and changes may be made without departing from the scope of the invention in its aspects.

Claims (10)

1. A remote sensing image registration method based on unsupervised deep learning is characterized by comprising the following steps:
s1, establishing a multi-source remote sensing image registration data set comprising two groups of image data, wherein every two images of the two groups of image data correspond to each other one by one, one group of image data is used as a reference image data set, and the other group of image data is used as an image data set to be corrected;
s2, selecting a reference image f from the reference image dataset, selecting an image m to be corrected corresponding to the reference image f from the image dataset to be corrected, and taking the reference image f and the image m to be corrected as end-to-end input on a training sample;
s3, calculating the transformation parameters μ₁, μ₂, μ₃ on the model networks of the 3 scales, gradually correcting the image m to be corrected to generate corrected images m₁, m₂, m₃, back-propagating the loss function of the model network of each scale, and taking the corrected image m₃ and the transformation parameter μ₃ as the end-to-end output of one training sample;
s4, initializing model network parameters of 3 scales respectively;
s5, performing joint training on the model networks of 3 scales in an end-to-end mode, and optimizing joint loss functions on the 3 scales;
s6, searching, through a deep learning optimizer, for the direction in which the joint loss function value decreases fastest, back-propagating through the model networks along that direction, iteratively updating the model network parameters, saving the network model parameters when the joint loss function drops below a preset threshold and converges, and outputting the registered reference image f and corrected image m₃.
2. The remote sensing image registration method according to claim 1, wherein the step S3 includes the following substeps:
s3-1, inputting the reference image f and the image m to be corrected into the model network of the 1st scale to obtain the transformation parameter μ₁ of the 1st scale;
s3-2, performing geometric correction on the image m to be corrected with the transformation parameter μ₁ to generate the corrected image m₁;
s3-3, calculating a loss function of the model network of the 1st scale;
s3-4, inputting the reference image f and the corrected image m₁ into the model network of the 2nd scale to obtain the residual Δμ₁ of the transformation parameter, and combining it with the transformation parameter μ₁ to obtain the transformation parameter μ₂ of the 2nd scale;
s3-5, performing geometric correction on the corrected image m₁ with the transformation parameter μ₂ to generate the corrected image m₂;
s3-6, calculating a loss function of the model network of the 2nd scale;
s3-7, inputting the reference image f and the corrected image m₂ into the model network of the 3rd scale to obtain the residual Δμ₂ of the transformation parameter, and combining it with the transformation parameter μ₂ to obtain the transformation parameter μ₃ of the 3rd scale;
s3-8, performing geometric correction on the corrected image m₂ with the transformation parameter μ₃ to generate the corrected image m₃;
s3-9, calculating a loss function of the model network of the 3rd scale;
s3-10, taking the corrected image m₃ and the transformation parameter μ₃ as the end-to-end output of one training sample.
3. The remote sensing image registration method according to claim 2, wherein the step S3-1 comprises the following substeps:
s3-1-1, respectively down-sampling the reference image f and the image m to be corrected to 1/4 of the original size, and overlapping two images generated after down-sampling in the channel direction to generate an overlapped image;
s3-1-2, inputting the superposed image into a feature extraction part of a model network with the 1 st scale to generate depth features;
s3-1-3, passing the depth features through the parameter regression part of the model network of the 1st scale to obtain the transformation parameter μ₁ of the 1st scale.
4. The remote sensing image registration method according to claim 2, wherein the step S3-2 comprises the following substeps:
s3-2-1, forming the geometric transformation matrix Tμ₁ from the transformation parameter μ₁;
s3-2-2, performing geometric transformation on the image m to be corrected through the geometric transformation matrix Tμ₁ to generate the corrected image m₁.
5. The remote sensing image registration method according to claim 2, wherein the step S3-4 comprises the following substeps:
s3-4-1, down-sampling the reference image f and the corrected image m₁ to 1/2 of their original size respectively, and superimposing the two down-sampled images in the channel direction to generate a superposed image;
s3-4-2, inputting the superposed image into the feature extraction part of the model network of the 2nd scale to generate depth features;
s3-4-3, passing the depth features through the parameter regression part of the model network of the 2nd scale to obtain the residual Δμ₁ of the transformation parameter;
s3-4-4, combining the residual Δμ₁ with the transformation parameter μ₁ to obtain the transformation parameter μ₂ of the 2nd scale.
6. The remote sensing image registration method according to claim 2, wherein the step S3-5 comprises the following sub-steps:
s3-5-1, forming the geometric transformation matrix Tμ₂ from the transformation parameter μ₂;
s3-5-2, performing geometric transformation on the corrected image m₁ through the geometric transformation matrix Tμ₂ to generate the corrected image m₂.
7. The remote sensing image registration method according to claim 2, wherein the step S3-7 comprises the following substeps:
s3-7-1, superimposing the reference image f and the corrected image m₂ in the channel direction to generate a superposed image;
s3-7-2, inputting the superposed image into the feature extraction part of the model network of the 3rd scale to generate depth features;
s3-7-3, passing the depth features through the parameter regression part of the model network of the 3rd scale to obtain the residual Δμ₂ of the transformation parameter;
s3-7-4, combining the residual Δμ₂ with the transformation parameter μ₂ to obtain the transformation parameter μ₃ of the 3rd scale.
8. The remote sensing image registration method according to claim 2, wherein the step S3-8 comprises the following substeps:
s3-8-1, forming the geometric transformation matrix Tμ₃ from the transformation parameter μ₃;
s3-8-2, performing geometric transformation on the corrected image m₂ through the geometric transformation matrix Tμ₃ to generate the corrected image m₃.
9. The remote sensing image registration method according to claim 1, wherein the loss function Loss_sim(f, m, μ₁) of the model network of the 1st scale in step S3-3 is:
Loss_sim(f, m, μ₁) = -Sim(Tμ₁⁻¹(f), m)
the loss function Loss_sim(f, m₁, μ₂) of the model network of the 2nd scale in step S3-6 is:
Loss_sim(f, m₁, μ₂) = -Sim(Tμ₂⁻¹(f), m₁)
the loss function Loss_sim(f, m₂, μ₃) of the model network of the 3rd scale in step S3-9 is:
Loss_sim(f, m₂, μ₃) = -Sim(Tμ₃⁻¹(f), m₂)
and the joint loss function Loss in step S5 is:
Loss = λ₁ × Loss_sim(f, m, μ₁) + λ₂ × Loss_sim(f, m₁, μ₂) + λ₃ × Loss_sim(f, m₂, μ₃)
where Sim(·) denotes a similarity measure, Tμ⁻¹ denotes the inverse of the geometric transformation Tμ, and λ₁, λ₂, λ₃ are the weight factors of the loss functions of the model networks at each scale.
10. The remote sensing image registration method according to claim 9, wherein the step S4 includes the following substeps:
s4-1, training the model network of the 1st scale with the goal of minimizing the loss function Loss_sim(f, m, μ₁);
s4-2, fixing the parameters of the model network of the 1st scale, and training the model network of the 2nd scale with the goal of minimizing the loss function Loss_sim(f, m₁, μ₂);
s4-3, fixing the parameters of the model networks of the 1st and 2nd scales, and training the model network of the 3rd scale with the goal of minimizing the loss function Loss_sim(f, m₂, μ₃).
CN202210026370.7A 2022-01-11 2022-01-11 Remote sensing image registration method based on unsupervised deep learning Active CN114494372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210026370.7A CN114494372B (en) 2022-01-11 2022-01-11 Remote sensing image registration method based on unsupervised deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210026370.7A CN114494372B (en) 2022-01-11 2022-01-11 Remote sensing image registration method based on unsupervised deep learning

Publications (2)

Publication Number Publication Date
CN114494372A true CN114494372A (en) 2022-05-13
CN114494372B CN114494372B (en) 2023-04-21

Family

ID=81509569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210026370.7A Active CN114494372B (en) 2022-01-11 2022-01-11 Remote sensing image registration method based on unsupervised deep learning

Country Status (1)

Country Link
CN (1) CN114494372B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114693755A (en) * 2022-05-31 2022-07-01 湖南大学 Non-rigid registration method and system for multimode image maximum moment and space consistency

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345575A (en) * 2018-09-17 2019-02-15 中国科学院深圳先进技术研究院 A kind of method for registering images and device based on deep learning
CN109711444A (en) * 2018-12-18 2019-05-03 中国科学院遥感与数字地球研究所 A kind of new remote sensing image matching method based on deep learning
CN111414968A (en) * 2020-03-26 2020-07-14 西南交通大学 Multi-mode remote sensing image matching method based on convolutional neural network characteristic diagram
CN113901900A (en) * 2021-09-29 2022-01-07 西安电子科技大学 Unsupervised change detection method and system for homologous or heterologous remote sensing image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109345575A (en) * 2018-09-17 2019-02-15 中国科学院深圳先进技术研究院 A kind of method for registering images and device based on deep learning
CN109711444A (en) * 2018-12-18 2019-05-03 中国科学院遥感与数字地球研究所 A kind of new remote sensing image matching method based on deep learning
CN111414968A (en) * 2020-03-26 2020-07-14 西南交通大学 Multi-mode remote sensing image matching method based on convolutional neural network characteristic diagram
CN113901900A (en) * 2021-09-29 2022-01-07 西安电子科技大学 Unsupervised change detection method and system for homologous or heterologous remote sensing image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YE YUANXIN et al.: "A FAST AND ROBUST MATCHING SYSTEM FOR MULTIMODAL REMOTE SENSING IMAGE REGISTRATION" *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114693755A (en) * 2022-05-31 2022-07-01 湖南大学 Non-rigid registration method and system for multimode image maximum moment and space consistency

Also Published As

Publication number Publication date
CN114494372B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
CN111862126B (en) Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm
CN109558862B (en) Crowd counting method and system based on attention thinning framework of space perception
US20240212374A1 (en) Lidar point cloud segmentation method, device, apparatus, and storage medium
Zhao et al. Extracting planar roof structures from very high resolution images using graph neural networks
WO2021138992A1 (en) Disparity estimation optimization method based on up-sampling and accurate rematching
CN109635714B (en) Correction method and device for document scanning image
CN114223019A (en) Feedback decoder for parameter efficient semantic image segmentation
CN113223066B (en) Multi-source remote sensing image matching method and device based on characteristic point fine tuning
CN112053441A (en) Full-automatic layout recovery method for indoor fisheye image
CN113554039A (en) Method and system for generating optical flow graph of dynamic image based on multi-attention machine system
CN113449612A (en) Three-dimensional target point cloud identification method based on sub-flow sparse convolution
CN117788296B (en) Infrared remote sensing image super-resolution reconstruction method based on heterogeneous combined depth network
CN114494372A (en) Remote sensing image registration method based on unsupervised deep learning
Sun et al. Image fusion for the novelty rotating synthetic aperture system based on vision transformer
CN117593187A (en) Remote sensing image super-resolution reconstruction method based on meta-learning and transducer
CN114998630B (en) Ground-to-air image registration method from coarse to fine
CN116664855A (en) Deep learning three-dimensional sparse reconstruction method and system suitable for planetary probe vehicle images
CN111696167A (en) Single image super-resolution reconstruction method guided by self-example learning
CN114693755B (en) Non-rigid registration method and system for multimode image maximum moment and space consistency
Li et al. Monocular 3-D Object Detection Based on Depth-Guided Local Convolution for Smart Payment in D2D Systems
CN114972451A (en) Rotation-invariant SuperGlue matching-based remote sensing image registration method
CN116524111B (en) On-orbit lightweight scene reconstruction method and system for supporting on-demand lightweight scene of astronaut
Zhu et al. Research on Recognition and Registration Method Based on Deformed Fiducial Markers
CN117689702A (en) Point cloud registration method and device based on geometric attention mechanism
CN115937704A (en) Remote sensing image road segmentation method based on topology perception neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant