CN109214288B - Inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video


Info

Publication number
CN109214288B
CN109214288B (application CN201810871516.1A)
Authority
CN
China
Prior art keywords
characteristic angle, target, pairs, angle point, corner
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810871516.1A
Other languages
Chinese (zh)
Other versions
CN109214288A (en)
Inventor
沈伟
李瑞程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xinguangfei Information Technology Co ltd
Original Assignee
Guangzhou Xinguangfei Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xinguangfei Information Technology Co ltd filed Critical Guangzhou Xinguangfei Information Technology Co ltd
Priority to CN201810871516.1A
Publication of CN109214288A
Application granted
Publication of CN109214288B
Legal status: Active (current)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes
    • G06V 20/13: Satellite images
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/50: Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V 10/507: Summing image-intensity values; Histogram projection analysis
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751: Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video. The method comprises the following steps: cropping target area images from two video frames; extracting feature corner point pairs from the target area images by a sparse optical flow method; screening out target feature corner point pair combinations according to the first geometric space distance of each feature corner point pair, a '井'-shaped (tic-tac-toe) grid division of the target area image, and the combination of the feature corner point pairs; calculating, according to a perspective transformation matrix, the second geometric space distance between the two corner points of each transformed target feature corner point pair, and then calculating the weighted distance sum of all target feature corner point pairs in each combination; and taking the combination with the minimum weighted distance sum as the final matching result. The invention can effectively cope with the complex and changeable motion shooting states of the imaging device of a multi-rotor unmanned aerial vehicle, without depending on the flight data or attitude parameters of the unmanned aerial vehicle and its imaging device.

Description

Inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video
Technical Field
The invention relates to the technical field of image processing, in particular to an inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video.
Background
Aerial photography, also called aviation photography, is the photographing and filming of the earth's landforms, urban landscapes, people and other subjects from the air. At present, unmanned aerial vehicle aerial photography is widely used in military affairs, traffic construction, hydraulic engineering, ecological research, television programmes, artistic photography and other fields.
However, in the course of researching and practising the prior art, the inventors found that existing unmanned aerial vehicle aerial scene matching methods in industry generally need to perform three-dimensional calculations such as aerial triangulation based on flight data or imaging device parameters, while existing academic scene matching methods that do not depend on imaging device parameters assume that the two or more shots of a scene are taken from the same spatial position by imaging devices with different shooting parameters and angles. In other words, the prior art cannot cope with the complex and changeable motion shooting states of a multi-rotor unmanned aerial vehicle's imaging device, depends heavily on the flight data or attitude parameters of the unmanned aerial vehicle and its imaging device, and must deliberately restrict the unmanned aerial vehicle's flight control.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide an inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video that can effectively cope with the complex and changeable motion shooting states of the multi-rotor unmanned aerial vehicle's imaging device without depending on the flight data or attitude parameters of the unmanned aerial vehicle and its imaging device.
To solve the above problem, in one aspect, an embodiment of the present invention provides an inter-frame scene matching method based on multi-rotor unmanned aerial vehicle aerial video, suitable for execution in a computer device, comprising at least the following steps:
extracting two adjacent video frames from a video to be processed, performing fisheye distortion correction on the two video frames, and then cropping the respective target area images;
performing feature corner detection on the target area image of the earlier video frame with a Harris corner detector and extracting salient feature corners;
detecting, by a sparse optical flow method, the new feature corners in the target area image of the later video frame that correspond to the salient feature corners, and extracting each salient feature corner that satisfies the inter-frame scene matching condition, together with its corresponding new feature corner, as a feature corner point pair;
calculating the first geometric space distance of each feature corner point pair, drawing a histogram of these distances, dividing the histogram into regions according to its peaks, and screening out the feature corner point pairs that fall in regions whose histogram peak is smaller than a preset threshold;
dividing the target area image into a '井'-shaped (tic-tac-toe) grid, combining the remaining feature corner point pairs according to the grid division result, and screening out the target feature corner point pair combinations that satisfy the inter-frame scene matching condition;
calculating a corresponding perspective transformation matrix from the basic parameters of each target feature corner point pair combination, and using it to calculate the second geometric space distance between the two corner points of every transformed target feature corner point pair in each combination;
taking the quotient of the second and the first geometric space distance, and then calculating the weighted distance sum of all target feature corner point pairs in each combination;
and taking the target feature corner point pairs of the combination with the minimum weighted distance sum, together with its corresponding perspective transformation matrix, as the final matching result.
Further, the two adjacent video frames are spaced by no more than 30 frames.
Further, performing fisheye distortion correction on the two video frames and then cropping the respective target area images specifically comprises:
performing fisheye distortion correction on each video frame through the parameter matrix of the imaging device to obtain a corrected image;
and trisecting the corrected image transversely and cropping the middle strip to obtain the target area image.
Further, dividing the target area image into a '井'-shaped grid, combining the remaining feature corner point pairs according to the division result, and screening out the target feature corner point pair combinations that satisfy the inter-frame scene matching condition specifically comprises:
dividing the target area image equally with two horizontal and two vertical lines into a '井' shape, taking the image areas at the four corners of the '井' shape as 4 corner block areas and the remaining image areas as interval areas;
extracting one feature corner point pair from each of the 4 corner block areas to obtain a target feature corner point pair combination of 4 feature corner point pairs, and repeating the extraction to obtain a plurality of such combinations;
and screening out the combinations that satisfy the inter-frame scene matching condition according to the geometric parameters of the 4 feature corner point pairs in each combination.
Further, the geometric parameters of the 4 feature corner point pairs include: the area of the quadrilateral enclosed by the 4 feature corner points, the area of the triangle enclosed by any 3 of the 4 feature corner points, and the geometric distance between any 2 of the 4 feature corner points.
In another aspect, an embodiment of the present invention provides an inter-frame scene matching device based on multi-rotor unmanned aerial vehicle aerial video, comprising:
a target area image cropping module, configured to extract two adjacent video frames from a video to be processed, perform fisheye distortion correction on the two video frames, and then crop the respective target area images;
a salient feature corner extraction module, configured to perform feature corner detection on the target area image of the earlier video frame with a Harris corner detector and extract salient feature corners;
a feature corner pair extraction module, configured to detect, by a sparse optical flow method, the new feature corners in the target area image of the later video frame that correspond to the salient feature corners, and to extract each salient feature corner that satisfies the inter-frame scene matching condition, together with its corresponding new feature corner, as a feature corner point pair;
a feature corner pair screening module, configured to calculate the first geometric space distance of each feature corner point pair, draw a histogram of these distances, divide the histogram into regions according to its peaks, and screen out the feature corner point pairs that fall in regions whose histogram peak is smaller than a preset threshold;
a target feature corner pair combination extraction module, configured to divide the target area image into a '井'-shaped grid, combine the remaining feature corner point pairs according to the division result, and screen out the target feature corner point pair combinations that satisfy the inter-frame scene matching condition;
a second geometric space distance operation module, configured to calculate a corresponding perspective transformation matrix from the basic parameters of each target feature corner point pair combination, and to use it to calculate the second geometric space distance between the two corner points of every transformed target feature corner point pair in each combination;
and a matching module, configured to take the quotient of the second and the first geometric space distance, calculate the weighted distance sum of all target feature corner point pairs in each combination, and take the target feature corner point pairs of the combination with the minimum weighted distance sum, together with its corresponding perspective transformation matrix, as the final matching result.
further, the target feature point pair combination extraction module is specifically configured to: equally dividing the target area image into a 'well' shape from two horizontal and longitudinal directions by using two lines, taking image areas at four corners of the 'well' shape as 4 corner block areas, and taking the rest image areas as interval areas;
extracting one characteristic angle point pair from each of the 4 angle block regions to obtain a target characteristic angle point pair combination consisting of 4 characteristic angle point pairs, and repeatedly extracting a plurality of target characteristic angle point pair combinations;
and screening the target characteristic angle pair combinations meeting the interframe scene matching conditions according to the geometric parameters of 4 characteristic angle pairs in each target characteristic angle pair combination.
Further, the geometric parameters of the 4 characteristic angle point pairs include: the area of a quadrangle formed by enclosing 4 characteristic angle points, the area of a 3-angle formed by enclosing any 3 characteristic angle points by 4 characteristic angle points and the geometric distance of any 2 point pairs by 4 characteristic angle points.
Another embodiment of the present invention provides an inter-frame scene matching device based on an aerial video of a multi-rotor unmanned aerial vehicle, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor, when executing the computer program, implements the inter-frame scene matching method based on the aerial video of the multi-rotor unmanned aerial vehicle according to the above embodiment of the present invention.
The embodiments of the invention have the following beneficial effects:
The embodiments of the invention provide an inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video. Target area images are cropped from two video frames; feature corner point pairs are extracted from the target area images by a sparse optical flow method; target feature corner point pair combinations are screened out according to the first geometric space distance of each feature corner point pair, the '井'-shaped grid division of the target area image, and the combination of the feature corner point pairs; the second geometric space distance between the two corner points of each transformed target feature corner point pair is calculated according to the perspective transformation matrix, and the weighted distance sum of all target feature corner point pairs in each combination is then calculated; and the combination with the minimum weighted distance sum is taken as the final matching result. The invention can effectively cope with the complex and changeable motion shooting states of the multi-rotor unmanned aerial vehicle's imaging device without depending on the flight data or attitude parameters of the unmanned aerial vehicle and its imaging device.
Drawings
Fig. 1 is a schematic flowchart of an inter-frame scene matching method based on an aerial video of a multi-rotor unmanned aerial vehicle according to an embodiment of the present invention;
fig. 2 is a schematic connection diagram of an inter-frame scene matching device based on an aerial video of a multi-rotor unmanned aerial vehicle according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the technical solutions of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
Referring to fig. 1, which is a schematic flowchart of the inter-frame scene matching method based on multi-rotor unmanned aerial vehicle aerial video provided by an embodiment of the present invention, the method comprises:
S101: extracting two adjacent video frames from a video to be processed, performing fisheye distortion correction on the two video frames, and then cropping the respective target area images.
Specifically, each video frame is corrected for fisheye distortion through the parameter matrix of the imaging device to obtain a corrected image; the corrected image is then trisected transversely and the middle strip is cropped as the target area image, removing any sky and edge imagery.
Preferably, the interval between the two adjacent video frames is no more than 30 frames.
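The following is a minimal illustrative sketch of step S101. The patent does not name any software library; OpenCV and NumPy are assumed here, and the camera matrix K and distortion coefficients D are placeholders for the imaging device's calibrated parameter matrix:

```python
import cv2
import numpy as np

def correct_and_crop(frame, K, D):
    """Fisheye-correct a video frame, then keep the middle horizontal third."""
    # K (3x3 camera matrix) and D (4x1 fisheye distortion coefficients)
    # stand in for the imaging device's calibrated parameter matrix.
    undistorted = cv2.fisheye.undistortImage(frame, K, D, Knew=K)
    h = undistorted.shape[0]
    # Transverse trisection: drop the top strip (possible sky) and the
    # bottom strip (possible edge imagery), keep the middle strip.
    return undistorted[h // 3: 2 * h // 3, :]
```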
S102: performing feature corner detection on the target area image of the earlier video frame with a Harris corner detector and extracting salient feature corners.
Preferably, no more than 500 salient feature corners are extracted.
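A sketch of step S102 under the same OpenCV assumption; the patent only fixes the 500-corner cap, so the qualityLevel and minDistance values are illustrative assumptions:

```python
# Earlier-frame target area image, converted to grayscale for detection.
gray_prev = cv2.cvtColor(target_prev, cv2.COLOR_BGR2GRAY)
corners = cv2.goodFeaturesToTrack(
    gray_prev,
    maxCorners=500,          # "no more than 500 salient feature corners"
    qualityLevel=0.01,       # assumed threshold on the Harris response
    minDistance=8,           # assumed minimum spacing between corners
    useHarrisDetector=True,  # Harris corner detector, as the description states
    k=0.04,                  # conventional Harris sensitivity constant
)
```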
S103: detecting, by a sparse optical flow method, the new feature corners in the target area image of the later video frame that correspond to the salient feature corners, and extracting each salient feature corner that satisfies the inter-frame scene matching condition, together with its corresponding new feature corner, as a feature corner point pair.
Preferably, a sparse optical flow method in a 9-layer pyramid mode with a 21 × 21 sliding window is used to search, on the later frame, for the new positions of the up to 500 corners detected on the earlier frame, and at most the 100 best-quality point pairs are taken as candidate mappings between the two frames.
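A sketch of step S103 with the stated 21 × 21 window and 9-layer pyramid (OpenCV's maxLevel is zero-based, so maxLevel=8 yields 9 levels in total); ranking candidates by the tracker's error output is an assumption about how the "best quality" pairs are chosen:

```python
gray_next = cv2.cvtColor(target_next, cv2.COLOR_BGR2GRAY)  # later frame
next_pts, status, err = cv2.calcOpticalFlowPyrLK(
    gray_prev, gray_next, corners, None,
    winSize=(21, 21),  # the 21 x 21 sliding window named in the description
    maxLevel=8,        # zero-based, so 9 pyramid layers in total
)
ok = status.ravel() == 1                 # keep only successfully tracked corners
p_prev = corners.reshape(-1, 2)[ok]
p_next = next_pts.reshape(-1, 2)[ok]
# Keep at most the 100 best-quality pairs, ranked here by tracking error.
order = np.argsort(err.ravel()[ok])[:100]
p_prev, p_next = p_prev[order], p_next[order]
```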
S104: calculating the first geometric space distance of each feature corner point pair, drawing a histogram of these distances, dividing the histogram into regions according to its peaks, and screening out the feature corner point pairs that fall in regions whose histogram peak is smaller than a preset threshold.
Specifically, the geometric space distance data are normalized to between 0 and 1 and divided equidistantly into 64 intervals, and the distribution of the distance data of all feature corner point pairs is counted through a histogram;
all point pairs in any of the 64 intervals holding fewer than 6 points are then removed, screening out the overly sparse, discrete motion-relation point pairs.
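A sketch of the histogram screen in step S104, continuing the variables above; min-max normalisation is an assumption about how the distances are brought into [0, 1]:

```python
# First geometric space distance of every feature corner point pair.
d = np.linalg.norm(p_next - p_prev, axis=1)
# Normalise to [0, 1] (assumed min-max scaling) and bin into 64 intervals.
d_norm = (d - d.min()) / (d.max() - d.min() + 1e-12)
bins = np.minimum((d_norm * 64).astype(int), 63)
counts = np.bincount(bins, minlength=64)
# Remove every pair that falls in an interval holding fewer than 6 points,
# screening out the overly sparse, discrete motion-relation point pairs.
keep = counts[bins] >= 6
p_prev, p_next = p_prev[keep], p_next[keep]
```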
S105: dividing the target area image into a '井'-shaped grid, combining the remaining feature corner point pairs according to the division result, and screening out the target feature corner point pair combinations that satisfy the inter-frame scene matching condition.
Specifically, the target area image is divided equally with two horizontal and two vertical lines into a '井' (tic-tac-toe) shape; the image areas at the four corners of the '井' shape are taken as 4 corner block areas and the remaining image areas as interval areas.
The remaining feature corner point pairs are assigned to the corresponding image areas according to their initial positions in the target area image. If no feature corner point pair falls into a corner block area, that area is marked as an empty corner block area; if exactly one of the 4 corner block areas is empty, all feature corner point pairs in the interval areas are reassigned to that empty corner block area.
One feature corner point pair is then extracted from each of the 4 corner block areas to obtain a target feature corner point pair combination of 4 feature corner point pairs, and the extraction is repeated to obtain a plurality of such combinations.
The combinations that satisfy the inter-frame scene matching condition are screened according to the geometric parameters of the 4 feature corner point pairs in each combination, which include: the area of the quadrilateral enclosed by the 4 feature corner points, the area of the triangle enclosed by any 3 of the 4 feature corner points, and the geometric distance between any 2 of the 4 feature corner points.
Specifically: a combination is removed if its 4 feature corner points enclose, on the earlier video frame, a quadrilateral whose area is smaller than 1/4 of the target area image area; a combination is also removed if any 3 of its feature corner points enclose, on the earlier video frame, a triangle whose area is smaller than 1/8 of the target area image area, or if the geometric distance between any 2 of its feature corner points on the earlier video frame is smaller than 1/2 of the shorter of the frame's width and height.
Preferably, no fewer than 10 of the remaining target feature corner point pair combinations are selected, as sketched below.
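A sketch of the '井'-grid combination sampling in step S105, continuing the variables above. The random sampling, the retry cap, and the omission of the triangle-area and pairwise-distance screens are simplifications of the description:

```python
import random

def corner_block(pt, w, h):
    """Map a point to one of the 4 corner blocks of the 井 grid, else None."""
    col, row = int(pt[0] // (w / 3.0)), int(pt[1] // (h / 3.0))
    if col == 1 or row == 1 or col > 2 or row > 2:
        return None                       # interval area between corner blocks
    return (row // 2) * 2 + (col // 2)    # 0=TL, 1=TR, 2=BL, 3=BR

def quad_area(pts):
    """Shoelace area of a quadrilateral given in cyclic vertex order."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

h, w = gray_prev.shape
blocks = {b: [] for b in range(4)}
for i, pt in enumerate(p_prev):
    b = corner_block(pt, w, h)
    if b is not None:
        blocks[b].append(i)

combos, tries = [], 0
while len(combos) < 10 and tries < 1000 and all(blocks.values()):
    tries += 1
    idx = [random.choice(blocks[b]) for b in range(4)]
    quad = p_prev[idx][[0, 1, 3, 2]]      # reorder to cyclic TL, TR, BR, BL
    # Keep only combinations whose quadrilateral on the earlier frame covers
    # at least 1/4 of the target area image; the triangle-area and pairwise
    # distance screens from the description would be applied here as well.
    if quad_area(quad) >= w * h / 4:
        combos.append(idx)
```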
S106: calculating a corresponding perspective transformation matrix from the basic parameters of each target feature corner point pair combination, and using it to calculate the second geometric space distance between the two corner points of every transformed target feature corner point pair in each combination.
Specifically, the perspective transformation matrix is applied to all target feature corner point pairs, and the second geometric space distance between the two points of every pair is calculated.
It should be noted that when the perspective transformation matrix is calculated from the basic parameters of a screened combination, the degree of overlap of the two transformed images is evaluated first: by checking whether the four vertices of one video frame still fall within the region of the other frame, an excessive deviation can be detected quickly and it can be confirmed that the two transformed images still share an overlapping region, so that perspective transformation parameters with too large a deviation are screened out.
S107: taking the quotient of the second and the first geometric space distance, and then calculating the weighted distance sum of all target feature corner point pairs in each combination.
S108: taking the target feature corner point pairs of the combination with the minimum weighted distance sum, together with its corresponding perspective transformation matrix, as the final matching result.
In the inter-frame scene matching method based on multi-rotor unmanned aerial vehicle aerial video described above, target area images are cropped from two video frames; feature corner point pairs are extracted from the target area images by a sparse optical flow method; target feature corner point pair combinations are screened out according to the first geometric space distance of each feature corner point pair, the '井'-shaped grid division of the target area image, and the combination of the feature corner point pairs; the second geometric space distance between the two corner points of each transformed target feature corner point pair is calculated according to the perspective transformation matrix, and the weighted distance sum of all target feature corner point pairs in each combination is then calculated; and the combination with the minimum weighted distance sum is taken as the final matching result. This embodiment can effectively cope with the complex and changeable motion shooting states of the multi-rotor unmanned aerial vehicle's imaging device without depending on the flight data or attitude parameters of the unmanned aerial vehicle and its imaging device. A code sketch of steps S106 to S108 follows.
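Continuing the sketches above, the following covers steps S106 to S108 under the OpenCV assumption. The overlap check from the description is summarized as a comment, and interpreting the weighted distance sum as the sum of second-to-first distance quotients is an assumption about a formula the patent does not spell out:

```python
d1 = np.linalg.norm(p_next - p_prev, axis=1) + 1e-12   # first distances
best_score, best_H, best_idx = np.inf, None, None

for idx in combos:
    src = p_prev[idx].astype(np.float32)
    dst = p_next[idx].astype(np.float32)
    H = cv2.getPerspectiveTransform(src, dst)  # 3x3 matrix from 4 point pairs
    # (The description's overlap check would go here: reject H if the four
    # vertices of one frame, transformed by H, leave the other frame.)
    proj = cv2.perspectiveTransform(
        p_prev.reshape(-1, 1, 2).astype(np.float32), H
    ).reshape(-1, 2)
    d2 = np.linalg.norm(proj - p_next, axis=1)  # second distances after H
    score = np.sum(d2 / d1)  # weighted distance sum (assumed form)
    if score < best_score:
        best_score, best_H, best_idx = score, H, idx

# best_idx (the 4 point pairs) and best_H form the final matching result.
```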
Referring to fig. 2, an embodiment of the present invention provides an inter-frame scene matching device based on multi-rotor unmanned aerial vehicle aerial video, comprising:
a target area image cropping module 201, configured to extract two adjacent video frames from a video to be processed, perform fisheye distortion correction on the two video frames, and then crop the respective target area images.
Specifically, each video frame is corrected for fisheye distortion through the parameter matrix of the imaging device to obtain a corrected image; the corrected image is then trisected transversely and the middle strip is cropped as the target area image, removing any sky and edge imagery.
Preferably, the interval between the two adjacent video frames is no more than 30 frames.
A salient feature corner extraction module 202, configured to perform feature corner detection on the target area image of the earlier video frame with a Harris corner detector and extract salient feature corners.
Preferably, no more than 500 salient feature corners are extracted.
A feature corner pair extraction module 203, configured to detect, by a sparse optical flow method, the new feature corners in the target area image of the later video frame that correspond to the salient feature corners, and to extract each salient feature corner that satisfies the inter-frame scene matching condition, together with its corresponding new feature corner, as a feature corner point pair.
Preferably, a sparse optical flow method in a 9-layer pyramid mode with a 21 × 21 sliding window is used to search, on the later frame, for the new positions of the up to 500 corners detected on the earlier frame, and at most the 100 best-quality point pairs are taken as candidate mappings between the two frames.
A feature corner pair screening module 204, configured to calculate the first geometric space distance of each feature corner point pair, draw a histogram of these distances, divide the histogram into regions according to its peaks, and screen out the feature corner point pairs that fall in regions whose histogram peak is smaller than a preset threshold.
Specifically, the geometric space distance data are normalized to between 0 and 1 and divided equidistantly into 64 intervals, and the distribution of the distance data of all feature corner point pairs is counted through a histogram;
all point pairs in any of the 64 intervals holding fewer than 6 points are then removed, screening out the overly sparse, discrete motion-relation point pairs.
A target feature corner pair combination extraction module 205, configured to divide the target area image into a '井'-shaped grid, combine the remaining feature corner point pairs according to the division result, and screen out the target feature corner point pair combinations that satisfy the inter-frame scene matching condition.
Specifically, the target area image is divided equally with two horizontal and two vertical lines into a '井' shape; the image areas at the four corners of the '井' shape are taken as 4 corner block areas and the remaining image areas as interval areas.
The remaining feature corner point pairs are assigned to the corresponding image areas according to their initial positions in the target area image. If no feature corner point pair falls into a corner block area, that area is marked as an empty corner block area; if exactly one of the 4 corner block areas is empty, all feature corner point pairs in the interval areas are reassigned to that empty corner block area.
One feature corner point pair is then extracted from each of the 4 corner block areas to obtain a target feature corner point pair combination of 4 feature corner point pairs, and the extraction is repeated to obtain a plurality of such combinations.
The combinations that satisfy the inter-frame scene matching condition are screened according to the geometric parameters of the 4 feature corner point pairs in each combination, which include: the area of the quadrilateral enclosed by the 4 feature corner points, the area of the triangle enclosed by any 3 of the 4 feature corner points, and the geometric distance between any 2 of the 4 feature corner points.
Specifically: a combination is removed if its 4 feature corner points enclose, on the earlier video frame, a quadrilateral whose area is smaller than 1/4 of the target area image area; a combination is also removed if any 3 of its feature corner points enclose, on the earlier video frame, a triangle whose area is smaller than 1/8 of the target area image area, or if the geometric distance between any 2 of its feature corner points on the earlier video frame is smaller than 1/2 of the shorter of the frame's width and height.
Preferably, no fewer than 10 of the remaining target feature corner point pair combinations are selected.
A second geometric space distance operation module, configured to calculate a corresponding perspective transformation matrix from the basic parameters of each target feature corner point pair combination, and to use it to calculate the second geometric space distance between the two corner points of every transformed target feature corner point pair in each combination.
Specifically, the perspective transformation matrix is applied to all target feature corner point pairs, and the second geometric space distance between the two points of every pair is calculated.
It should be noted that when the perspective transformation matrix is calculated from the basic parameters of a screened combination, the degree of overlap of the two transformed images is evaluated first: by checking whether the four vertices of one video frame still fall within the region of the other frame, an excessive deviation can be detected quickly and it can be confirmed that the two transformed images still share an overlapping region, so that perspective transformation parameters with too large a deviation are screened out.
A matching module 206, configured to take the quotient of the second and the first geometric space distance, calculate the weighted distance sum of all target feature corner point pairs in each combination, and take the target feature corner point pairs of the combination with the minimum weighted distance sum, together with its corresponding perspective transformation matrix, as the final matching result.
The inter-frame scene matching device based on multi-rotor unmanned aerial vehicle aerial video provided by this embodiment crops target area images from two video frames; extracts feature corner point pairs from the target area images by a sparse optical flow method; screens out target feature corner point pair combinations according to the first geometric space distance of each feature corner point pair, the '井'-shaped grid division of the target area image, and the combination of the feature corner point pairs; calculates the second geometric space distance between the two corner points of each transformed target feature corner point pair according to the perspective transformation matrix, and then calculates the weighted distance sum of all target feature corner point pairs in each combination; and takes the combination with the minimum weighted distance sum as the final matching result. This embodiment can effectively cope with the complex and changeable motion shooting states of the multi-rotor unmanned aerial vehicle's imaging device without depending on the flight data or attitude parameters of the unmanned aerial vehicle and its imaging device.
Another embodiment of the present invention provides an inter-frame scene matching apparatus based on an aerial video of a multi-rotor drone, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement the inter-frame scene matching method based on the aerial video of the multi-rotor drone according to the above embodiment of the present invention.
By implementing the embodiments of the invention, the two video frame images are processed separately, salient feature corner point pairs are extracted, and the feature corner point pairs are then screened and combined so as to select the 4 point pairs with the highest matching precision; these 4 point pairs are finally used to calculate the perspective transformation parameters for inter-frame scene matching of the two video frames, so that inter-frame scene matching is achieved without depending on the flight data or attitude parameters of the multi-rotor unmanned aerial vehicle and its imaging device.
It should be noted that the device embodiments described above are merely illustrative: modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the present invention, the connection relationship between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (6)

1. An inter-frame scene matching method based on multi-rotor unmanned aerial vehicle aerial video, suitable for execution in a computing device, characterized by comprising at least the following steps:
extracting two adjacent video frames from a video to be processed, performing fisheye distortion correction on the two video frames, and then cropping the respective target area images;
performing feature corner detection on the target area image of the earlier video frame with a Harris corner detector and extracting salient feature corners;
detecting, by a sparse optical flow method, the new feature corners in the target area image of the later video frame that correspond to the salient feature corners, and extracting each salient feature corner that satisfies the inter-frame scene matching condition, together with its corresponding new feature corner, as a feature corner point pair;
calculating the first geometric space distance of each feature corner point pair, drawing a histogram of these distances, dividing the histogram into regions according to its peaks, and screening out the feature corner point pairs that fall in regions whose histogram peak is smaller than a preset threshold;
dividing the target area image equally with two horizontal and two vertical lines into a '井' (tic-tac-toe) shape, taking the image areas at the four corners of the '井' shape as 4 corner block areas and the remaining image areas as interval areas; extracting one feature corner point pair from each of the 4 corner block areas to obtain a target feature corner point pair combination of 4 feature corner point pairs, and repeating the extraction to obtain a plurality of such combinations; screening out the combinations that satisfy the inter-frame scene matching condition according to the geometric parameters of the 4 feature corner point pairs in each combination, wherein the geometric parameters of the 4 feature corner point pairs include: the area of the quadrilateral enclosed by the 4 feature corner points, the area of the triangle enclosed by any 3 of the 4 feature corner points, and the geometric distance between any 2 of the 4 feature corner points;
calculating a corresponding perspective transformation matrix from the basic parameters of each target feature corner point pair combination, and using it to calculate the second geometric space distance between the two corner points of every transformed target feature corner point pair in each combination;
taking the quotient of the second and the first geometric space distance, and then calculating the weighted distance sum of all target feature corner point pairs in each combination;
and taking the target feature corner point pairs of the combination with the minimum weighted distance sum, together with its corresponding perspective transformation matrix, as the final matching result.
2. The inter-frame scene matching method based on multi-rotor unmanned aerial vehicle aerial video of claim 1, wherein the two adjacent video frames are separated by no more than 30 frames.
3. The inter-frame scene matching method based on multi-rotor unmanned aerial vehicle aerial video of claim 1, wherein performing fisheye distortion correction on the two video frames and then cropping the respective target area images specifically comprises:
performing fisheye distortion correction on each video frame through the parameter matrix of the imaging device to obtain a corrected image;
and trisecting the corrected image transversely and cropping the middle strip to obtain the target area image.
4. An inter-frame scene matching device based on multi-rotor unmanned aerial vehicle aerial video, characterized by comprising:
a target area image cropping module, configured to extract two adjacent video frames from a video to be processed, perform fisheye distortion correction on the two video frames, and then crop the respective target area images;
a salient feature corner extraction module, configured to perform feature corner detection on the target area image of the earlier video frame with a Harris corner detector and extract salient feature corners;
a feature corner pair extraction module, configured to detect, by a sparse optical flow method, the new feature corners in the target area image of the later video frame that correspond to the salient feature corners, and to extract each salient feature corner that satisfies the inter-frame scene matching condition, together with its corresponding new feature corner, as a feature corner point pair;
a feature corner pair screening module, configured to calculate the first geometric space distance of each feature corner point pair, draw a histogram of these distances, divide the histogram into regions according to its peaks, and screen out the feature corner point pairs that fall in regions whose histogram peak is smaller than a preset threshold;
a target feature corner pair combination extraction module, configured to divide the target area image equally with two horizontal and two vertical lines into a '井' shape, taking the image areas at the four corners of the '井' shape as 4 corner block areas and the remaining image areas as interval areas; to extract one feature corner point pair from each of the 4 corner block areas to obtain a target feature corner point pair combination of 4 feature corner point pairs, repeating the extraction to obtain a plurality of such combinations; and to screen out the combinations that satisfy the inter-frame scene matching condition according to the geometric parameters of the 4 feature corner point pairs in each combination, wherein the geometric parameters of the 4 feature corner point pairs include: the area of the quadrilateral enclosed by the 4 feature corner points, the area of the triangle enclosed by any 3 of the 4 feature corner points, and the geometric distance between any 2 of the 4 feature corner points;
a second geometric space distance operation module, configured to calculate a corresponding perspective transformation matrix from the basic parameters of each target feature corner point pair combination, and to use it to calculate the second geometric space distance between the two corner points of every transformed target feature corner point pair in each combination;
and a matching module, configured to take the quotient of the second and the first geometric space distance, calculate the weighted distance sum of all target feature corner point pairs in each combination, and take the target feature corner point pairs of the combination with the minimum weighted distance sum, together with its corresponding perspective transformation matrix, as the final matching result.
5. The inter-frame scene matching device based on multi-rotor unmanned aerial vehicle aerial video of claim 4, wherein the target area image cropping module is specifically configured to: perform fisheye distortion correction on each video frame through the parameter matrix of the imaging device to obtain a corrected image;
and trisect the corrected image transversely and crop the middle strip to obtain the target area image.
6. An inter-frame scene matching device based on multi-rotor unmanned aerial vehicle aerial video, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the inter-frame scene matching method based on multi-rotor unmanned aerial vehicle aerial video of any one of claims 1 to 3.
CN201810871516.1A 2018-08-02 2018-08-02 Inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video Active CN109214288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810871516.1A CN109214288B (en) 2018-08-02 2018-08-02 Inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video

Publications (2)

Publication Number Publication Date
CN109214288A CN109214288A (en) 2019-01-15
CN109214288B true CN109214288B (en) 2021-08-10

Family

ID=64988732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810871516.1A Active CN109214288B (en) 2018-08-02 2018-08-02 Inter-frame scene matching method and device based on multi-rotor unmanned aerial vehicle aerial video

Country Status (1)

Country Link
CN (1) CN109214288B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163818B (en) * 2019-04-28 2021-04-27 武汉理工大学 Low-illumination video image enhancement method for maritime unmanned aerial vehicle
CN111832634B (en) * 2020-06-28 2023-09-08 深圳市优必选科技股份有限公司 Foreign matter detection method, foreign matter detection system, terminal device and storage medium
CN111784768B (en) * 2020-07-07 2021-09-24 中山大学 Unmanned aerial vehicle attitude estimation method and system based on three-color four-lamp mark recognition
CN114724316A (en) * 2022-05-12 2022-07-08 中国银行股份有限公司 Alarm method and device for automatic teller machine

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2400460A1 (en) * 2010-06-22 2011-12-28 Parrot Method for assessing the horizontal speed of a drone, particularly of a drone capable of hovering on automatic pilot
CN106550174A (en) * 2016-10-28 2017-03-29 大连理工大学 A kind of real time video image stabilization based on homography matrix

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Remote Sensing Image Automatic Registration; Wang Weixing et al.; Indian Society of Remote Sensing; 2015-02-13 (no. 43); pp. 1-11 *

Also Published As

Publication number Publication date
CN109214288A (en) 2019-01-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant