CN115439424A - Intelligent detection method for aerial video image of unmanned aerial vehicle - Google Patents

Intelligent detection method for aerial video image of unmanned aerial vehicle

Info

Publication number
CN115439424A
Authority
CN
China
Prior art keywords
image
video
frame
detection
change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211010524.XA
Other languages
Chinese (zh)
Other versions
CN115439424B (en)
Inventor
徐龙
朱绪胜
欧雷
陈俊佑
崔党熊
秦子燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aircraft Industrial Group Co Ltd
Original Assignee
Chengdu Aircraft Industrial Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aircraft Industrial Group Co Ltd filed Critical Chengdu Aircraft Industrial Group Co Ltd
Priority to CN202211010524.XA priority Critical patent/CN115439424B/en
Publication of CN115439424A publication Critical patent/CN115439424A/en
Application granted granted Critical
Publication of CN115439424B publication Critical patent/CN115439424B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038Image mosaicing, e.g. composing plane images from plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/40Image enhancement or restoration using histogram techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the field of video surveillance and discloses an intelligent detection method for unmanned aerial vehicle (UAV) aerial video images, aiming to provide a detection method with high detection speed and high detection accuracy. The invention is realized by the following technical scheme: a pre-training model is created to detect multiple aerial video image objects for video data preprocessing; feature points of the target to be detected are extracted from the target reference video; the key frames of the reference video are stitched frame by frame using A-KAZE feature point matching, image affine transformation, and image fusion, converting the reference video into a reference panorama, and a panoramic reference image is generated by stitching according to the GIS data associated with each image; an arbitrary detection frame is extracted from the detection video; the change information in the aligned region of the panoramic reference image and the frame to be detected is detected automatically; and, starting from an initial change map generated by low-pass filtering, the change information is verified by comparing the RGB-LBP features, region areas, and gray-level histograms of the images, generating a final aerial video image change map with higher accuracy.

Description

Intelligent detection method for aerial video image of unmanned aerial vehicle
Technical Field
The invention relates to the field of video surveillance, including security, intelligent transportation, and search and rescue, and in particular to an intelligent detection method for aerial video images captured by low-altitude small unmanned aerial vehicles.
Background
Unmanned aerial vehicles (UAVs) fly at low altitude, are lightweight, and can carry small-format digital cameras and high-resolution cameras that capture high-quality images for various kinds of analysis, so they are finding ever wider application in many areas. Aerial UAV video makes it convenient to acquire static and dynamic information and grasp the situation on site. Low-altitude small-UAV aerial photography is used in agriculture, entertainment, construction, and public safety and security, and is being adopted rapidly in other fields as well. With the popularization of UAVs, the frequent changes in surface morphology, and the urgent demand of real-time surveying and mapping and many industries for high-resolution remote sensing images, low-altitude UAV remote sensing has developed rapidly. It features low operating cost, flexibility, the ability to fly below cloud cover, and high resolution of the acquired imagery; as a powerful supplement to satellite remote sensing and manned-aircraft aerial remote sensing, it performs well in acquiring high-resolution imagery over small areas, providing emergency surveying and mapping support after disasters, and collecting remote sensing imagery over difficult terrain. Low-altitude UAV photogrammetry is a new aerial remote sensing technology developed after satellite remote sensing and large-aircraft remote sensing. In UAV aerial photography, image quality is critical, and many factors affect it, so certain aerial photography techniques must be mastered to obtain good images. As UAV technology has matured, UAV-based low-altitude aerial photography, with its low cost, flexible take-off and landing, and low sensitivity to meteorological conditions (it can operate below cloud layers), has gradually become an important supplementary means to traditional aerial photography and satellite remote sensing for acquiring image information, and is increasingly widely applied in major national natural disaster response, geographic condition monitoring, land management, urban construction planning, and similar fields. In particular, in applications such as target tracking, target search, and ground-feature state monitoring over a specific area, frequent, continuous, and accurate low-altitude surveillance of the area is usually required; small-UAV video aerial photography, with its low shooting cost, low-altitude capability, and high imaging resolution for small targets, can effectively accomplish the task of surveillance video acquisition.
A low-altitude UAV photogrammetry system generally comprises a ground system, a flight control system, a data processing system, and an aerial photography system. The ground control system plans the target route before the UAV is launched; during flight, the ground station control software displays the flight path, map, and flight parameters, and the target route and waypoints are adjusted through the control system. The flight control system uses GPS positioning signals to determine the UAV's speed, position, and altitude, so that it flies along the preset route. The data processing system uses image processing software to stitch the aerial photographs, define the coordinate information, and produce a digital terrain model. The aerial photography system can carry metric aerial cameras, single-lens reflex digital cameras, and other equipment according to actual needs, meeting aerial photography requirements of various precisions and types. However, for the intelligent analysis of video collected by low-altitude small UAVs, and especially for extracting ground-feature or phenomenon change information in a monitored area by change detection, no effective method currently exists. In recent years, the demand for remote sensing image data has grown in many fields; satellite imagery collection, constrained by altitude, resolution, weather conditions, and revisit period, often cannot meet this demand, while conventional aerial photography is limited by airspace and weather, making emergency tasks difficult to satisfy and costly. Several challenges must be overcome in the automatic analysis of UAV imagery. Overhead viewpoint and small apparent object size: current computer vision algorithms and data sets are designed and evaluated on close-up photographs taken by humans at eye level with centered subjects. In vertically shot UAV images, the objects of interest are relatively small and feature-poor, appearing mainly as planes and rectangles; for example, a building imaged from a UAV shows only its roof, whereas a ground-level image of the same building shows doors, windows, and walls. Even when a large number of images can be obtained, they must still be labeled. This is heavy manual annotation and interpretation work demanding accuracy and precision, since "garbage in means garbage out"; apart from manual completion there is no miracle solution to the labeling problem, and in any supervised machine learning procedure, labeling images can be the most difficult and time-consuming step. UAV images are large, with resolutions mostly exceeding 3000 px × 3000 px, which increases the computational complexity of processing such images. Object overlap: one problem with segmenting images is that the same object may appear in two different images, leading to duplicate detections and counting errors. The extraction of edge features is also highly susceptible to noise.
Because a UAV is affected by airflow and wind direction during flight, its attitude angle and heading deviate, and the photo rotation angle and overlap are not stable enough; moreover, general-purpose UAVs are usually fitted with cost-effective non-metric cameras, so the edges of the acquired photographs exhibit nonlinear optical distortion (such as barrel or pincushion distortion), which complicates image post-processing. Detection methods that depend heavily on the reasonableness of hand-designed features therefore have poor robustness. In addition, objects that are very close to each other may have overlapping bounding boxes during detection, or may occlude one another. Traditional image change detection applied to video must deal with large video data volumes, the small field of view of a single frame, the high overlap between the coverage areas of adjacent frames, small-scale change information, and complex backgrounds. Images carry a great deal of information, and processing them implies a great deal of computation; the smallest objects are the hardest to detect because of their lower resolution. Although video consists of a series of spatially and temporally continuous frames, each of which is essentially a still image, and change detection in still images has been studied extensively, real-time and accurate change detection methods for video images remain rare. The core problem of image change detection is to find ground-feature or phenomenon change information from the differences between two images of the same area of interest taken at different times (the earlier one serving as the reference image and the later one as the current detection image); it is an effective means of intelligent analysis of both static images and dynamic video. When a UAV performs a reconnaissance task over an area, the ground background in the video image is often complex, the appearance of the target area may differ in various ways at different times, and target areas are mostly irregularly distributed in the image, all of which make detection of change regions very difficult. Video image change-region detection and classification is the technical process of determining ground-feature state changes from multi-temporal UAV video scene images covering the same area; it concerns the type, distribution, and content of the changes and the classes and boundaries of the ground features before and after the change, and analyzes the attributes of the change, whether between video segments taken at different times within the same aerial reconnaissance task or between two separate aerial reconnaissance tasks.
The most typical application field of video image change-region detection is satellite remote sensing image analysis, which processes multi-temporal remote sensing images of the same surface area together with other auxiliary data to identify and analyze the state changes of targets or phenomena across time periods; it can also determine the change of ground features or phenomena within a time interval and provide quantitative and qualitative analysis of their spatial distribution and variation. Differences in shooting time, shooting angle, and meteorological conditions cause large variations in the color, brightness, and pixel distribution of the target-area video, and the image characteristics of multi-source imagery differ greatly, so a single feature can hardly register the visible-light image and the infrared image into the same coordinate system. A conventional affine or homography model cannot accurately fit the transformation over the whole image field, leaving many local registration errors in the result, so the extracted change information contains excessive interference and the variability of targets of the same class cannot be distinguished accurately. According to the three levels of image data processing, common change detection in remote sensing image analysis can be divided into pixel-level, feature-level, and target-level change detection. Pixel-level change detection compares the gray or color (RGB) pixel values of different time phases at each position, based on image registration/alignment, to decide whether a change has occurred and thereby detect change regions. It is easily affected by image registration and radiometric correction, but because it largely preserves the original detail information of the image, it is the current mainstream change detection method. Feature-level change detection first determines the object of interest and extracts its features (such as edges, shapes, contours, and textures), then comprehensively compares and analyzes these features to obtain the change information of the object. In general, because feature-level change detection operates on correlated features, it judges feature attributes with higher reliability and accuracy; but since it is not based on the original data, information loss inevitably occurs during feature extraction, and fine change information is hard to provide. Target-level change detection detects change information of specific objects (such as roads and houses) on the basis of image understanding and recognition, and is a high-level analysis method based on target models. Applying deep neural networks in feature-level change detection places high demands on the floating-point computing capability of the computer platform, and the computing capability of an onboard computing platform limits the detection effect to a great extent.
Disclosure of Invention
The invention aims to solve the performance and speed problems of prior-art UAV low-altitude aerial video image change detection methods in unmanned aerial vehicle video surveillance, and provides an intelligent detection method for unmanned aerial vehicle aerial video images that is fast, accurate, and real-time.
The invention is realized by the following technical scheme. An intelligent detection method for unmanned aerial vehicle aerial video images comprises the following steps:
during video data acquisition, a visual receiving system defines the coordinate information and obtains low-altitude aerial video images of the UAV target area;
the ground station transmits an online detection model based on sample parameters, and transmits the received multi-temporal aerial reference video to the flight control computer over a wireless data link;
the image processor of the flight control computer performs fuzzy linear discriminant data analysis on the visible-light and infrared reconnaissance images of the detection video sample feature set, creates a pre-training model that detects multiple aerial video image objects to preprocess the UAV video data, computes the pixels in each connected domain of the preprocessed aerial video images, treats optical flow as the instantaneous motion field of the gray pixels on the image, and calibrates the video key frames and the key-frame Global Positioning System (GPS) interpolation;
the pre-training model extracts the feature points of the target to be detected in the target reference video and the overlapping image information between adjacent key frames, selects optical flow points at equal intervals in the target-area images and computes their motion vectors for classification, and compares and matches images of the same scene from different angles, measuring the angles between objects in each image; using A-KAZE feature point matching, image affine transformation, and image fusion, the reference video key frames are stitched frame by frame and joined in sequence into a panorama; after image stitching and fusion are completed, the aerial position and yaw are computed from the correspondence between the single-channel image coordinate system and the world coordinate system, and a panoramic reference image is generated by stitching according to the GIS data associated with each image, creating a view of the whole scene; in the process of stitching the reference video into the panoramic reference image, key frames are extracted from the reference video based on component histograms, A-KAZE feature points are extracted in the overlapping regions of adjacent key frames, an image transformation matrix is computed by feature point matching, the extracted key frames are transformed into the panoramic reference image space, and the reference video is thereby converted into the panoramic reference image;
in the process of registering the detection video with the panoramic reference image, a detection video frame is registered with the panoramic reference image based on the online detection model transmitted with the sample parameters; an arbitrary detection frame is extracted from the detection video, either by manually selecting the target area to be detected or by automatically extracting the frames of interest in the detection video; coarse positioning of the detection frame within the reference panorama is achieved quickly using the GPS information of the frame to be detected and the full reference scene image, and the size of the tracking window is adjusted automatically based on the coarse positioning result; the extracted visible-light images and the multi-source and homologous images of the infrared change regions are registered and fused in the same coordinate system; and precise registration of the detection frame with the panoramic reference image is performed using GPS initial positioning and an image registration method based on A-KAZE feature point matching, achieving fast and accurate alignment of the frame to be detected with the panoramic reference image;
in the process of performing image change detection, the online detection model automatically monitors the aerial video images and detects image change regions; denoising and histogram equalization are applied to the registered images to remove the influence of noise, illumination, and irrelevant changes; low-pass filtering effectively removes the influence of parallax and registration error from the generated change map and produces an initial change map; the MeanShift algorithm is iterated, moving the center of the search window to the position of the iteration maximum and adjusting the window size, and small, dense objects are searched by upsampling with a sliding window; based on RGB-LBP feature comparison and comparison of region area and gray-level histograms, the images are trained using the transfer learning principle, the change positions in the change map are verified with deep learning software, the initial change map is corrected, a change target region data set is constructed, the deep network weights are trained and updated, and a final aerial video image change map with higher accuracy is generated.
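As a concrete illustration of the optical-flow step above (optical flow treated as the instantaneous motion field of gray pixels, with flow points selected at equal intervals), the following is a minimal Python/OpenCV sketch; the grid spacing, file name, and use of pyramidal Lucas-Kanade as the flow estimator are illustrative assumptions, not details fixed by the patent.

```python
import cv2
import numpy as np

def grid_flow(prev_gray, next_gray, step=32):
    """Select flow points at equal intervals and compute their motion
    vectors with pyramidal Lucas-Kanade sparse optical flow."""
    h, w = prev_gray.shape
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    pts = np.float32(np.dstack([xs.ravel(), ys.ravel()])).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    ok = status.ravel() == 1  # keep only points tracked successfully
    return pts[ok].reshape(-1, 2), (nxt[ok] - pts[ok]).reshape(-1, 2)

# Illustrative usage on two consecutive frames; the file name is a placeholder.
cap = cv2.VideoCapture("reference_video.mp4")
ok1, f1 = cap.read()
ok2, f2 = cap.read()
if ok1 and ok2:
    g1 = cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY)
    points, vectors = grid_flow(g1, g2)
    print("median motion vector:", np.median(vectors, axis=0))
```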
In order to better implement the present invention, further, the method for preprocessing the unmanned aerial vehicle video data comprises:
calibrating the imaging sensor, extracting video key frames and performing GPS interpolation, and generating the panoramic reference image by reference video stitching;
extracting key frames from the reference video, performing image matching based on the adjacency of the key frames, screening out the position coordinate set of the connected domains of the aerial video image, and mapping it into a standard coordinate space to generate the panorama required for change detection;
registering or aligning the detected aerial video frame with the panoramic reference image;
for video change detection, first finding the frames to be processed in the detection video by manual selection or an automatic extraction method;
then finding the common coverage area of the panoramic reference image and the image to be detected and registering the two images: coarse positioning of the detection frame in the panoramic reference image is achieved quickly using GPS information, fusion of the infrared and visible-light target change regions is completed with an adaptive-weight target region fusion algorithm, the detection frame is precisely registered with the reference image by an image feature point matching method, and, after the detection video frame and the reference video panorama are precisely registered, a change map is generated by low-pass filtering to remove the influence of parallax and registration error;
thirdly, computing the change information at each position of the aerial video frame by the RGB-LBP feature comparison method;
and finally, verifying the change information at each position by morphological operations and comparison of region area and gray-level histograms, and outputting the final change map.
In order to better implement the invention, further, the method for verifying the change positions in the change map using the deep learning software comprises:
the deep learning software sets corresponding points manually and, based on least-squares adjustment theory, performs block aerial triangulation on the acquired calibration-field data and high-precision control point data using a bundle block adjustment model, solving for the required geometric calibration parameters of the imaging sensor, namely the interior orientation elements, the radial distortion coefficients, the tangential distortion coefficients, the CCD non-square pixel scale coefficient, and the CCD non-orthogonality distortion coefficient.
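The calibration parameters listed above (interior orientation elements plus radial and tangential distortion) correspond to the standard pinhole-plus-distortion camera model. As a hedged sketch, OpenCV's chessboard calibration recovers the same quantities; the 9 × 6 board pattern and image directory below are illustrative assumptions, and the patent's bundle block adjustment over control points is more elaborate than this minimal routine.

```python
import glob
import cv2
import numpy as np

# Object points of a 9 x 6 chessboard calibration field (unit square size).
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, img_pts = [], []
for path in glob.glob("calib_images/*.png"):  # placeholder directory
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# K holds the interior orientation elements (fx, fy, cx, cy); dist holds
# the radial (k1, k2, k3) and tangential (p1, p2) distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
print("camera matrix:\n", K)
print("distortion (k1, k2, p1, p2, k3):", dist.ravel())
```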
In order to better implement the present invention, further, the method for calibrating the video key frames and computing the key-frame Global Positioning System (GPS) interpolation comprises:
in video key frame extraction and GPS interpolation, based on the UAV track data, the time interval t for automatically extracting key frames at a given overlap degree is derived by a formula rendered as an image in the original publication;
the near width Xn of the frame footprint and the far width Xf of the frame footprint are likewise given by formulas rendered as images in the original publication;
to guarantee the overlap in the x direction, the overlap Dx of the imaging sensor in the x direction after t seconds, and the overlap Dy in the y direction, are each given by formulas rendered as images in the original publication;
the frame height Y is given by: Y = H[cot(tan⁻¹(2h/f) + θ) + cot(tan⁻¹(2h/f) − θ)];
in the video key-frame GPS interpolation, the position information corresponding to each selected key frame is recorded; this information is provided by the GPS receiver carried by the UAV, and if the GPS records are discontinuous, Newton interpolation is applied so that the GPS information corresponds one to one with the extracted key frames,
where H is the flight height of the UAV at a given time t, v is the flight speed, w is the width of the imaging sensor, h is its height, f is the focal length, θ is the angle between the oblique camera and the horizontal plane, and n is the number of block sub-regions into which the corresponding area is divided.
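A minimal sketch of the Newton interpolation step for discontinuous GPS records follows; the timestamps and latitude values are made up for illustration, and in practice the same polynomial is evaluated per coordinate (latitude, longitude, altitude) at each key frame's timestamp.

```python
import numpy as np

def newton_coeffs(ts, ys):
    """Divided-difference coefficients of the Newton interpolating polynomial."""
    n = len(ts)
    coef = np.array(ys, dtype=float)
    for j in range(1, n):
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (ts[j:] - ts[:-j])
    return coef

def newton_eval(ts, coef, t):
    """Evaluate the Newton polynomial at time t (Horner-like scheme)."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (t - ts[k]) + coef[k]
    return result

# Illustrative GPS fixes (time in seconds, latitude in degrees); made-up values.
ts = np.array([0.0, 1.0, 2.0, 4.0])
lat = np.array([30.5701, 30.5703, 30.5706, 30.5712])
coef = newton_coeffs(ts, lat)
print(newton_eval(ts, coef, 3.0))  # interpolated latitude for the key frame at t = 3 s
```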
In order to better implement the present invention, further, the process of generating the panoramic reference map by stitching the reference video images comprises:
generating the panorama by reference video stitching, which comprises A-KAZE feature point extraction, feature point matching, image stitching, and panorama generation;
in A-KAZE feature point extraction, A-KAZE image features are extracted from each of two overlapping adjacent key frames;
first, an image pyramid is constructed using nonlinear diffusion filtering and the fast explicit diffusion (FED) algorithm;
second, the determinant extrema of the scale-normalized Hessian matrix are searched in 3 × 3 neighborhoods of the nonlinear scale space to obtain the image feature point coordinates;
third, the main orientation of each feature point is determined from the first-order differential values of all points in the circular region around the feature point;
finally, the feature point neighborhood image is rotated to the main orientation, and the image feature vector is generated with the modified local difference binary descriptor M-LDB;
in A-KAZE feature point matching, the feature points extracted from the two overlapping key frames are matched;
first, the similarity between two A-KAZE feature descriptors is defined by the Hamming distance;
then, initial matching points are searched with a bidirectional k-nearest-neighbor (KNN) classification algorithm;
finally, the matched point pairs are screened with the random sample consensus (RANSAC) algorithm to remove mismatched pairs, as in the sketch following this step;
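The extraction-matching-screening chain above maps directly onto OpenCV primitives. A minimal sketch follows, in which the ratio-test threshold, RANSAC reprojection threshold, and file names are illustrative assumptions; OpenCV's AKAZE produces the M-LDB binary descriptors matched here by Hamming distance, and the bidirectional KNN step is realized as a mutual-consistency check.

```python
import cv2
import numpy as np

def match_akaze(img1, img2):
    """A-KAZE extraction + Hamming kNN matching + RANSAC screening."""
    akaze = cv2.AKAZE_create()  # M-LDB binary descriptors by default
    kp1, des1 = akaze.detectAndCompute(img1, None)
    kp2, des2 = akaze.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    fwd = [p for p in matcher.knnMatch(des1, des2, k=2) if len(p) == 2]
    bwd = matcher.knnMatch(des2, des1, k=1)
    back = {m[0].queryIdx: m[0].trainIdx for m in bwd if m}
    good = [m for m, n in fwd
            if m.distance < 0.8 * n.distance            # Lowe ratio test
            and back.get(m.trainIdx) == m.queryIdx]     # bidirectional check

    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    # RANSAC removes residual mismatches while estimating the affine model.
    M, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                      ransacReprojThreshold=3.0)
    keep = inliers.ravel() == 1
    return M, src[keep], dst[keep]

# Illustrative usage; the file names are placeholders.
f1 = cv2.imread("keyframe_001.png", cv2.IMREAD_GRAYSCALE)
f2 = cv2.imread("keyframe_002.png", cv2.IMREAD_GRAYSCALE)
M, pts1, pts2 = match_akaze(f1, f2)
print("affine 2x3 matrix:\n", M)
```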
when performing image stitching and panorama generation, the received reference video is preprocessed, video key frames are extracted based on random sampling, the GPS information of each key frame is recorded, and the reference video key frame set is F = {fk | k = 1, 2, …, K}, where K is the total number of key frames extracted from the reference video and k is the index of the current key frame; key frame f1 is set as the panorama space, and video stitching transforms key frame f2 through key frame fK into the panorama space one by one.
In order to better implement the present invention, further, the process of stitching the reference video images to generate the panoramic reference map further comprises:
when performing image stitching and panorama generation, the received reference video is preprocessed, video key frames are extracted based on random sampling, the GPS information of each key frame is recorded, and the reference video key frame set is F = {fk | k = 1, 2, …, K};
key frame f1 is set as the panorama space, key frames f2 through fK are transformed into the panorama space one by one, and an affine transformation model M able to accommodate translation, rotation, and scaling is selected as the image coordinate transformation matrix; the image coordinate transformation is expressed as:
x′ = m0·x + m1·y + m2, y′ = m3·x + m4·y + m5,
where K is the total number of key frames extracted from the reference video, k is the index of the current key frame, (x, y) and (x′, y′) denote the pixel coordinates in the panorama and in the image to be stitched, respectively, and m0–m5 are the affine transformation parameters.
In order to better implement the present invention, further, the key frame stitching process comprises:
first, all pixel nodes are divided into target change regions and non-change regions, and the visible-light and infrared target change regions are extracted; for the key frame f2 to be stitched, the A-KAZE feature points in the overlapping area of key frames f1 and f2 are extracted, the matching point sets match1 and match2 of the overlapping region of frames 1 and 2 are computed (with more than 3 matching pairs), and the image transformation matrix M1,2 from frame f2 to the panorama space of frame f1 is obtained by least squares; then, for each key frame fk to be stitched with k > 2, the A-KAZE feature points in the overlapping area of key frames fk and fk−1 are extracted, the matching point sets matchk−1 and matchk of the overlapping region of frames k−1 and k are computed, the transformation matrix Mk−1,k from frame k to frame k−1 is obtained by least squares, matchk−1 is projected into the panorama space with the matrix M1,k−1 to obtain the matching point set of key frame fk−1 in panorama space, and the transformation matrix M1,k from frame fk to the panorama space is thereby obtained; finally, using the image transformation matrix M1,k and bilinear interpolation, key frame fk is transformed into the panorama space and stitched by the image fusion technique, completing the image stitching and generating the final panorama, as in the sketch below.
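A minimal sketch of the chained transformation, assuming the pairwise affines Mk−1,k have already been estimated from the matched A-KAZE points (e.g. with the routine after the matching steps above); the maximum-intensity fusion at the end is a crude stand-in for the patent's image fusion step.

```python
import cv2
import numpy as np

def to3x3(A):
    """Lift an OpenCV 2x3 affine matrix to homogeneous 3x3 form."""
    return np.vstack([A, [0.0, 0.0, 1.0]])

def stitch_chain(frames, pairwise, pano_size):
    """Chain pairwise affines into panorama-space transforms and warp.

    frames    : grayscale key frames f1..fK (f1 defines the panorama space)
    pairwise  : pairwise[k-2] is the 2x3 affine M_{k-1,k} taking frame k
                into frame k-1, for k = 2..K
    pano_size : (width, height) of the panorama canvas
    """
    pano = np.zeros((pano_size[1], pano_size[0]), dtype=np.uint8)
    h0, w0 = frames[0].shape
    pano[:h0, :w0] = frames[0]
    M1k = np.eye(3)                        # M_{1,1} is the identity
    for frame, M in zip(frames[1:], pairwise):
        M1k = M1k @ to3x3(M)               # M_{1,k} = M_{1,k-1} . M_{k-1,k}
        warped = cv2.warpAffine(frame, M1k[:2], pano_size,
                                flags=cv2.INTER_LINEAR)  # bilinear interpolation
        pano = np.maximum(pano, warped)    # crude fusion of overlapping pixels
    return pano
```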
To better implement the present invention, further, the method for performing precise registration of the detection frame with the panoramic reference map comprises:
fast coarse positioning based on GPS, followed by precise registration based on the A-KAZE features. In the GPS-based fast coarse positioning of the detection frame, the received detection video is preprocessed, the image frame on which change detection is to be performed and its GPS information are extracted, the GPS information of the detection frame is compared with the GPS information of each key frame recorded in the panoramic reference image, and the 4 nearest adjacent key frame regions are found in the panorama and used as the initial reference image region for change detection. In the precise registration based on the A-KAZE features, the A-KAZE feature points are extracted and matched and the detection image is transformed into the reference image space, completing the precise registration of the detection image with the coarsely positioned reference image region. Then an image region T and an image region R of the same position and size are extracted from the registered detection image and the panoramic reference image, respectively, and targets whose confidence exceeds a preset confidence threshold are output from the detection result, with T and R serving as the test image and the reference image input to change detection.
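A minimal sketch of the GPS-based coarse positioning, assuming each key frame carries one (latitude, longitude) fix; the coordinates below are made up, and selecting the 4 nearest key frames follows the description above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS fixes."""
    R = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def coarse_locate(frame_gps, keyframe_gps, k=4):
    """Return indices of the k key frames whose recorded GPS fixes are
    closest to the detection frame -- the initial reference region."""
    lat, lon = frame_gps
    d = [(haversine_m(lat, lon, klat, klon), i)
         for i, (klat, klon) in enumerate(keyframe_gps)]
    return [i for _, i in sorted(d)[:k]]

# Illustrative fixes (made-up coordinates).
keyframes = [(30.5701, 104.0601), (30.5703, 104.0605), (30.5705, 104.0609),
             (30.5707, 104.0613), (30.5709, 104.0617)]
print(coarse_locate((30.5704, 104.0607), keyframes))  # -> 4 nearest key frames
```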
To better implement the present invention, further, in performing the image change detection process, the online detection model comprises:
generating the initial change image: the reference image R and the detection image T are converted from RGB to grayscale, giving the corresponding gray reference image Rgray and gray detection image Tgray; then a difference image is generated from the gray values at corresponding positions of Rgray and Tgray, and each pixel position of the initial change image D is judged from the difference values; if D(i, j) indicates a change, the RGB-LBP features of the detection frame and the reference image at pixel position (i, j) are computed, with N × N denoting the size of the neighborhood window. In the low-pass filtering manner, the difference map DR from the gray reference image Rgray to the gray detection image Tgray and the difference map DT from the gray detection image Tgray to the gray reference image Rgray are computed over the N × N neighborhood as:
DR(i, j) = min over |Δi|, |Δj| ≤ (N − 1)/2 of |Rgray(i, j) − Tgray(i + Δi, j + Δj)|,
DT(i, j) = min over |Δi|, |Δj| ≤ (N − 1)/2 of |Tgray(i, j) − Rgray(i + Δi, j + Δj)|
(see the sketch following these formulas).
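A sketch of the low-pass difference maps under the neighborhood-minimum reading given above; combining DR and DT with a logical AND into the initial change image D is an assumption, as are the window size and threshold values, and edge pixels wrap around via np.roll for brevity.

```python
import numpy as np

def neighborhood_min_diff(ref, test, n=7):
    """D(i,j) = min over the n x n neighborhood of |ref(i,j) - test(i+di,j+dj)|.

    Tolerating small shifts acts as the low-pass step described in the text,
    suppressing parallax and residual registration error."""
    ref = ref.astype(np.int16)
    test = test.astype(np.int16)
    r = (n - 1) // 2
    out = np.full(ref.shape, 255, dtype=np.int16)
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            shifted = np.roll(test, (di, dj), axis=(0, 1))
            out = np.minimum(out, np.abs(ref - shifted))
    return out.astype(np.uint8)

def initial_change_map(ref_gray, test_gray, n=7, delta_diff=30):
    """Initial change image D: changed where both directional difference
    maps exceed the segmentation threshold delta_diff (AND is assumed)."""
    d_r = neighborhood_min_diff(ref_gray, test_gray, n)
    d_t = neighborhood_min_diff(test_gray, ref_gray, n)
    return ((d_r > delta_diff) & (d_t > delta_diff)).astype(np.uint8)
```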
based on the difference maps DR and DT and the segmentation threshold δdiff, the initial change image D is computed, and the pixel positions with value 1 in the initial change image D are then verified by RGB-LBP feature comparison: for each position (i, j) of the initial change image D with value 1, confirmation is performed by RGB-LBP feature comparison, and the position is set as changed only if a change is detected both times, and otherwise not. In each of the 3 color channels of the reference image R and the detection image T, the 8-bit binary LBP code of every point in the 15 × 15 neighborhood centered on position (i, j) is computed, and the resulting LBP codes are concatenated by position and channel to form the LBP features SR(i, j) and ST(i, j) of the reference image R and the detection image T at position (i, j); the 0/1 coding of the 8 adjacent positions in the 3 × 3 neighborhood centered on a point proceeds clockwise from the upper-left corner: if the gray value is lower than that of the center position, the point is coded 0, and otherwise 1. Then the Hamming distance dRT(i, j) between the LBP features SR and ST of the reference image R and the detection image T at position (i, j) is computed. Finally, whether position (i, j) has changed is judged from the Hamming distance: if dRT(i, j) > δh × |SR(i, j)|, position (i, j) has changed and the value of D(i, j) in the initial change image keeps 1; otherwise position (i, j) has not changed and the value of D(i, j) in the initial change image is modified from 1 to 0,
where 0 denotes unchanged and 1 denotes changed, N ∈ {7, 9, 11}, Δi denotes the offset of position coordinate i within the N neighborhood, Δj denotes the offset of position coordinate j within the N neighborhood, δdiff ∈ [0, 50] is chosen according to the degree of illumination difference between the images, |SR| denotes the length of the binary feature string, and δh is the judgment threshold.
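A sketch of the RGB-LBP verification at one pixel, assuming positions away from the image border and an illustrative δh value; the clockwise bit order and the 15 × 15 (half = 7) neighborhood follow the description above.

```python
import numpy as np

# Offsets of the 8 neighbors in a 3x3 window, clockwise from the upper-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, i, j):
    """8-bit LBP code of pixel (i, j): neighbor below center -> 0, else 1."""
    c = img[i, j]
    return [0 if img[i + di, j + dj] < c else 1 for di, dj in OFFSETS]

def rgb_lbp_feature(img_rgb, i, j, half=7):
    """Concatenate, by position and channel, the LBP codes of every point
    in the (2*half+1)^2 neighborhood of (i, j): the RGB-LBP feature."""
    bits = []
    for ch in range(3):
        plane = img_rgb[:, :, ch]
        for di in range(-half, half + 1):
            for dj in range(-half, half + 1):
                bits.extend(lbp_code(plane, i + di, j + dj))
    return np.array(bits, dtype=np.uint8)

def is_changed(ref_rgb, test_rgb, i, j, delta_h=0.25):
    """Keep (i, j) as changed iff the Hamming distance between the two
    RGB-LBP features exceeds delta_h times the feature string length."""
    s_r = rgb_lbp_feature(ref_rgb, i, j)
    s_t = rgb_lbp_feature(test_rgb, i, j)
    return int(np.sum(s_r != s_t)) > delta_h * s_r.size
```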
In order to better implement the present invention, further, in the post-processing of the change image, the verified change image must be post-processed in order to effectively eliminate false alarms. The operation flow is as follows: first, graying, gray-level correction, and smoothing filtering are performed; step edge points and grid edge points are screened, the edge points are connected, false edges are removed, and grid information is extracted; isolated change positions are removed by a morphological opening operation, and the corresponding regions in the verified change map D are set to unchanged. Then the pixel area of each change region is computed, the minimum change-region area δa is determined from the image resolution and the size of the smallest target of interest, and the regions of the verified change image D whose area is smaller than δa are set to unchanged. After the RGB-LBP features of the detection frame and the reference image at position (i, j) are computed, the feature similarity is calculated: if it is greater than the threshold, D(i, j) is kept unchanged, and if it is smaller than the threshold, D(i, j) is corrected and the region is set to unchanged. In the image post-processing, for each change region Ap of the verified change image D, the minimum bounding rectangle region Bp is found, the image regions R_gray^Bp and T_gray^Bp corresponding to Bp are extracted from the gray reference image Rgray and the gray detection image Tgray, the gray-level histogram feature G_R^p of region R_gray^Bp and the gray-level histogram feature G_T^p of region T_gray^Bp are computed, and the distance between G_R^p and G_T^p is calculated as follows:
d(G_R^p, G_T^p) = 1 − [ Σq=1..β (G_R^p(q) − ḡR)(G_T^p(q) − ḡT) ] / √[ Σq=1..β (G_R^p(q) − ḡR)² · Σq=1..β (G_T^p(q) − ḡT)² ],
where β is the dimension of the gray-level histogram feature, ḡR and ḡT denote the means of the features, and q indexes the value of the q-th dimension of the histogram; when d(G_R^p, G_T^p)
is less than 0.35, the region Ap in D is set to unchanged, and otherwise it is kept as changed.
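A sketch of the post-processing verification, assuming the histogram distance is one minus the normalized (mean-centered) correlation, which cv2.HISTCMP_CORREL computes; the bin count and minimum area are illustrative, and connected components stand in for the patent's bounding-rectangle bookkeeping.

```python
import cv2
import numpy as np

def hist_distance(ref_patch, test_patch, beta=64):
    """1 - normalized correlation of the beta-bin gray histograms of the
    two patches (cv2.HISTCMP_CORREL implements the mean-centered form)."""
    h_r = cv2.calcHist([ref_patch], [0], None, [beta], [0, 256])
    h_t = cv2.calcHist([test_patch], [0], None, [beta], [0, 256])
    cv2.normalize(h_r, h_r)
    cv2.normalize(h_t, h_t)
    return 1.0 - cv2.compareHist(h_r, h_t, cv2.HISTCMP_CORREL)

def verify_regions(change_map, ref_gray, test_gray, min_area=50, thresh=0.35):
    """Drop change regions smaller than delta_a, and regions whose histogram
    distance over the bounding rectangle falls below the threshold."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(change_map,
                                                           connectivity=8)
    out = change_map.copy()
    for p in range(1, n):
        x, y, w, h, area = stats[p]
        region = labels == p
        if area < min_area:
            out[region] = 0                 # smaller than delta_a: unchanged
            continue
        d = hist_distance(ref_gray[y:y + h, x:x + w],
                          test_gray[y:y + h, x:x + w])
        if d < thresh:                      # histograms too similar
            out[region] = 0                 # false alarm: set unchanged
    return out
```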
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The detection speed is high. For the aerial video image change detection application, low-altitude UAV aerial video images are acquired; during the ground station's video data acquisition an online detection model based on sample parameters is transmitted; a pre-training model is created to detect multiple aerial video image objects for video data preprocessing; the pixels in each connected domain of the preprocessed aerial video images are computed, optical flow is treated as the instantaneous motion field of the gray pixels on the image, and the video key frames and the key-frame Global Positioning System (GPS) interpolation are checked. The method is particularly suitable for automatically discovering ground-feature or phenomenon changes between two near-nadir observation videos shot by a low-altitude UAV within a short interval (tens of minutes to hours): the disappearance, appearance, or partial destruction of people, vehicles, buildings, public facilities, and similar targets. The change detection result can be returned directly to the user as image analysis data, or passed to higher-level tasks for semantic analysis, such as scene understanding, target detection, and target tracking. By combining video key frame extraction with image stitching, a video image change detection method different from traditional frame-by-frame processing converts the reference video into a reference panorama, which overcomes the small single-frame field of view and the high overlap between the coverage areas of adjacent frames, and greatly reduces the number of image frames to be processed in video image change detection without losing image information. In the registration of the detection frame with the panoramic reference image, GPS coarse positioning and A-KAZE feature matching are combined: the former quickly determines the position range of the detection frame in the panorama, and the latter achieves precise registration of the detection frame with the panoramic reference image, so both registration speed and registration accuracy are clearly improved over traditional image registration based on Scale-Invariant Feature Transform (SIFT) feature matching;
(2) The detection accuracy is high. Tailored to the characteristics of low-altitude UAV aerial video data, images of the same scene are compared and matched from different angles, the angles between objects in each image are measured, and the frames are stitched together in sequence into a panorama; after image stitching and fusion are completed, the aerial position and yaw are computed from the correspondence between the single-channel image coordinate system and the world coordinate system, a panoramic reference image is generated by stitching according to the GIS data associated with each image, and a view of the whole scene is created. In stitching the reference video into the panoramic reference image, key frames are extracted based on component histograms, A-KAZE feature points are extracted in the overlapping regions of adjacent key frames, an image transformation matrix is computed by the feature point matching algorithm, and the extracted key frames are transformed into the panoramic reference image space based on this matrix, converting the reference video into the reference image. The hierarchical registration based on GPS coarse positioning and A-KAZE feature point matching reduces the registration error between the detection frame and the panoramic reference image to about 1 pixel, enabling precise registration. Through image denoising, image enhancement, low-pass-filtered difference image generation, change information detection based on RGB-LBP feature comparison, and change information verification based on morphological processing and gray-level histogram feature comparison, a large amount of noise, false detections, and irrelevant change information (such as water ripples and swaying leaves) is removed and the change detection accuracy is improved;
(3) To address the large video data volume, the small single-frame field of view, and the high overlap between the coverage areas of adjacent frames, a panoramic reference image generation technique combining video key frame extraction and image stitching is used. In the registration of the detection video with the panoramic reference image, based on the online detection model transmitted with the sample parameters, the detection video frame is registered with the panorama; the frames of interest in the detection video are found by manually selecting the target area or by an automatic extraction method; coarse positioning of the detection frame in the reference panorama is achieved quickly using the GPS information of the frame to be detected and the full reference scene image; the tracking window size is adjusted automatically based on the coarse positioning result; the detection frame is precisely registered with the panoramic reference image using the A-KAZE feature point matching method; and the weights are updated. The conversion from video change detection to image change detection is thus achieved quickly without losing video data information. To improve the detection rate of small-scale change information and reduce the influence of complex backgrounds on the detection result, the change information obtained from the low-pass-filtered initial change image is verified with two different feature comparison methods. In addition, the application of image denoising, image enhancement, and morphological processing in the preprocessing and post-processing stages also greatly improves the detection accuracy and the robustness against complex backgrounds;
(4) In the image change detection process, the aerial video images are monitored automatically and image change regions are detected; denoising and histogram equalization are applied to the registered images to remove the influence of noise, illumination, and irrelevant changes; low-pass filtering effectively removes the influence of parallax and registration error from the generated change map and produces an initial change map; the MeanShift algorithm is iterated, moving the center of the search window to the position of the iteration maximum and adjusting its size, and small, dense objects are found by upsampling with a sliding window; based on image morphological processing, RGB-LBP (RGB Local Binary Pattern) feature comparison, and comparison of region area and gray-level histograms, the images are trained with the transfer learning principle, the change positions in the change map are verified with deep learning software, the initial change map is corrected, and a final aerial video image change map with higher accuracy is generated. The method overcomes the shortcomings of existing change detection techniques as applied to aerial video data, is particularly suitable for quickly and accurately discovering state changes of targets such as people, vehicles, buildings, and public facilities between two videos shot by a UAV platform within a certain time interval, and has broad application prospects in scene surveillance, target search, and related fields.
Drawings
The invention is further described below in connection with the following figures and examples; the examples are illustrative rather than limiting, and all fall within the scope of the invention.
FIG. 1 is a flow chart of an intelligent detection method for aerial video images of an unmanned aerial vehicle.
Fig. 2 is a flowchart of generating the panoramic image from the reference video in fig. 1.
Fig. 3 is a flowchart of the change detection and registration process for the detection video and the panoramic image in fig. 1.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and therefore should not be considered as a limitation to the scope of protection. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Example 1:
according to the invention, during video data acquisition the visual receiving system defines coordinate information to obtain a low-altitude aerial video image of the target area of the unmanned aerial vehicle; the ground station transmits an online detection model based on sample parameters, and the received multi-temporal aerial reference video is transmitted to the flight control computer through wireless data transmission; the flight control computer processor performs fuzzy linear identification data analysis on the visible light reconnaissance images and infrared reconnaissance images of the detection video sample feature set, creates a pre-training model to detect multiple aerial video image objects for video data preprocessing, calculates the pixels in each connected domain of the preprocessed aerial video image, takes the optical flow as the instantaneous motion field of gray pixel points on the image, and calibrates the video key frames and the key frame Global Positioning System (GPS) interpolation;
the pre-training model extracts the overlapping image information between the feature points of the target to be detected in the reference video and adjacent key frames, selects optical flow points at equal intervals in the target area image, calculates their motion vectors, classifies, compares and matches images of the same scene from different angles, and measures the angles between objects in each image; using A-KAZE feature point matching, image affine transformation and image fusion, the reference video key frames are stitched frame by frame in sequence into a panoramic image; after stitching and fusion are completed, the aerial photographing position and yaw are calculated from the correspondence between the single-channel image coordinate system and the world coordinate system, and the panoramic reference image is generated by stitching according to the GIS data associated with each image, creating a view of the whole landscape; in the process of stitching the reference video into the panoramic reference image, key frames of the component histogram in the reference video are extracted, A-KAZE feature points are extracted in the overlapping areas of adjacent key frames, feature point matching is performed to calculate the image transformation matrix, and the extracted key frames are transformed into the panorama space and fused into the panoramic reference image;
in the process of registering the detection video with the panoramic reference image, detection video frames are registered against the panoramic reference image; for any detection frame extracted from the detection video, the frames needing attention are found by manually selecting the target area to be detected or by an automatic extraction method; coarse positioning of the detection frame in the reference panorama is achieved quickly using the GPS information of the frame to be detected and the full reference scene image; the tracking window size is adjusted automatically from the coarse positioning result; the extracted visible light images, multi-source images and homologous images of the infrared image change area are registered and fused under the same coordinate system; and precise registration of the detection frame with the panoramic reference image is performed using GPS initial positioning and an image registration method based on A-KAZE feature point matching, achieving fast and accurate alignment of the frame to be detected with the panoramic reference image;
in the image change detection process, the online detection model automatically monitors the aerial video images and detects image change regions: the registered images are denoised and histogram-equalized to remove the influence of noise, illumination and irrelevant change; low-pass filtering effectively removes the influence of parallax and registration error from the generated change images and produces the initial change image; iteration with the MeanShift algorithm moves the center of the search window to the iteration maximum and adjusts the search window size; upward sampling with a sliding window searches for small, dense objects; based on image morphological processing, RGB-LBP feature comparison, and area and gray histogram comparison, the change positions in the change map are verified and the initial change image corrected with deep learning software; a changed-target region data set is constructed, the deep network weights are trained and updated on the transfer learning principle, and the final, more accurate aerial video image change map is generated. KAZE is a multi-scale 2D feature detection and description algorithm that operates in a nonlinear scale space.
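By way of illustration, the window-search step described above can be realized with OpenCV's MeanShift/CamShift primitives. The sketch below is not the patent's exact procedure: it assumes a single-channel change-probability map as input, re-centers a search window on the densest response with MeanShift, and uses CamShift only as one plausible way of adjusting the window size; the function name and window values are hypothetical.

import cv2
import numpy as np

# A minimal sketch, assuming prob_map is a single-channel uint8 map in which
# higher values mark likelier change. refine_change_window is a hypothetical
# helper, not the patent's implementation.
def refine_change_window(prob_map, window):
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    _, window = cv2.meanShift(prob_map, window, criteria)   # re-center window
    _, window = cv2.CamShift(prob_map, window, criteria)    # adjust its size
    return window

if __name__ == "__main__":
    prob = np.zeros((240, 320), np.uint8)
    cv2.circle(prob, (200, 120), 15, 255, -1)   # synthetic "change" blob
    print(refine_change_window(prob, (10, 10, 60, 60)))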
Example 2:
this embodiment is further optimized on the basis of Embodiment 1, and still adopts the four steps of video data preprocessing, reference video stitching to generate a panoramic image, registration of detection video frames with the panoramic reference image, and change area detection, so as to detect changes in the low-altitude aerial video images of the unmanned aerial vehicle accurately and in real time. In these steps the method adopts the following specific implementations:
the unmanned aerial vehicle video data preprocessing mainly comprises shooting sensor calibration, video key frame extraction and GPS interpolation; the reference video is then stitched to generate the panoramic reference image. Because the video image data volume is large, the image content repetition between adjacent frames is high, and the coverage of a single frame is limited, a single frame is not suitable as the change detection reference image; therefore key frames are extracted from the reference video, image matching is performed based on the adjacency of the key frames, and the position coordinate set of the aerial video image connected domains is screened out and mapped to a standard coordinate space to generate the panoramic image required for change detection;
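As a concrete illustration of the key frame extraction step, the sketch below samples one key frame every t seconds of video, assuming the interval t has already been derived from the overlap formulas of Embodiment 4; the function name and the fallback frame rate are illustrative assumptions.

import cv2

# A minimal sketch, assuming the key-frame interval t (seconds) is given.
def extract_key_frames(video_path, t):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if FPS is missing
    step = max(1, int(round(fps * t)))        # frames between key frames
    key_frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            key_frames.append(frame)          # keep every step-th frame
        idx += 1
    cap.release()
    return key_frames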
the method comprises the steps that the frame change detection of the aerial video is carried out on the premise that two images containing the same area are accurately registered, the registration or alignment of the aerial video frame and a panoramic reference image is detected, and for the video change detection, firstly, a frame needing to be processed in a detection video is found through a manual selection or automatic extraction method; then, finding the same coverage area in the panoramic reference image and the image to be detected, registering the two images, rapidly realizing coarse positioning of a detection frame in the panoramic reference image by utilizing GPS information because the coverage area of the panoramic image is large and direct registration can not meet the real-time requirement, finally completing fusion of an infrared image target change area and a visible light image target change area by adopting a self-adaptive weight target area fusion algorithm, and accurately registering the detection frame and the reference image based on an image feature point matching method;
after the aerial detection video frame is accurately registered with the reference video panorama, change detection can automatically find changes such as disappearance, appearance and damage of people, vehicles, buildings, public facilities and other targets between the two shoots of the same coverage area. The change detection method first removes the influence of noise, illumination and irrelevant change by denoising, histogram equalization and similar means; it then generates a change map by low-pass filtering to remove the influence of parallax and registration error, and calculates the change information at each aerial video frame position with an RGB-LBP feature comparison method; finally it verifies the change information at each position by morphological operations and by area and gray level histogram comparison, and outputs the final change image.
Other parts of this embodiment are the same as embodiment 1, and thus are not described again.
Example 3:
this embodiment is further optimized on the basis of Embodiment 1 or 2. The unmanned aerial vehicle video data preprocessing comprises shooting sensor calibration, video key frame extraction and GPS interpolation. Calibration of the shooting sensor is the premise of the subsequent work: before calibration it is confirmed that the mechanical structure of the shooting sensor is firm, stable and free of shake, and that its optical and electronic structures are likewise reliable and stable; geometric calibration of the camera is then carried out on an outdoor calibration field, as follows: corresponding points are set manually in the deep learning software; based on least squares adjustment theory, area network aerial triangulation is performed on the acquired calibration field data and high-precision control point data using a beam-method area network adjustment model, and the required geometric calibration parameters of the shooting sensor are solved, namely the interior orientation elements, the radial distortion coefficients, the tangential distortion coefficients, the charge-coupled device (CCD) non-square scale coefficient and the CCD non-orthogonality distortion coefficient.
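The following sketch is only a rough analogue of the calibration described above: instead of the beam-method block adjustment, it uses OpenCV's calibrateCamera, which likewise recovers the interior orientation together with radial (k1, k2, k3) and tangential (p1, p2) distortion coefficients from control points and their measured image positions. The inputs obj_pts and img_pts are assumed to come from the calibration field.

import cv2
import numpy as np

# A rough analogue only, assuming obj_pts/img_pts come from the calibration
# field: lists of (N,3) float32 control-point coordinates and (N,2) float32
# measured image positions, one pair of arrays per calibration image.
def calibrate_sensor(obj_pts, img_pts, image_size):
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, image_size, None, None)
    # K holds the interior orientation; dist = [k1, k2, p1, p2, k3] holds
    # the radial and tangential distortion coefficients.
    return rms, K, dist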
The rest of this embodiment is the same as embodiment 1 or 2, and therefore, the description thereof is omitted.
Example 4:
this embodiment is further optimized on the basis of any one of Embodiments 1 to 3. In video key frame extraction and GPS interpolation, the time interval t for automatically extracting key frames at a given overlap degree is derived from the unmanned aerial vehicle track data; the closed-form expression for t appears only as an image in the source, as do the formulas for the quantities it depends on:
the width at the lower (near) edge of the frame X_n: [formula rendered as an image in the source];
the width at the upper (far) edge of the frame X_f: [formula rendered as an image in the source];
the overlap of the shooting sensor in the x direction after t seconds, used to guarantee the x-direction overlap, D_x: [formula rendered as an image in the source];
the overlap in the y direction D_y: [formula rendered as an image in the source];
the frame height: Y = H·[cot(tan⁻¹(2h/f) + θ) + cot(tan⁻¹(2h/f) − θ)]. If the GPS information is discontinuous, Newton interpolation is used to interpolate the GPS data so that it corresponds one-to-one with the extracted key frames;
in these formulas, H is the height (unit: m) of the unmanned aerial vehicle at a given moment, υ is its speed (unit: m/s), ω is the width of the shooting sensor, h is its height, f is the focal length (unit: mm), θ is the angle between the camera and the horizontal plane, and n is the number of block sub-areas into which the corresponding area is divided.
For the received video, the data volume is large and the information repetition rate between adjacent frames is high, so key frame selection is the key to video change detection. Considering the influence of oblique shooting (the camera makes an angle θ with the horizontal plane, so the actual ground width covered by the image is narrow at the bottom and wide at the top), the position information corresponding to each selected key frame must be recorded; it is provided by the GPS navigator carried by the unmanned aerial vehicle. Therefore, in video key frame GPS interpolation, the position information corresponding to each selected key frame is recorded from the on-board GPS navigator, and if the GPS information is discontinuous, Newton interpolation is used so that the GPS information corresponds one-to-one with the extracted key frames.
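A minimal sketch of the Newton (divided-difference) interpolation used to fill gaps in the key frame GPS track, one coordinate at a time; the sample values in the closing comment are invented for illustration.

import numpy as np

# A minimal sketch of Newton divided-difference interpolation for filling
# gaps in the key-frame GPS track, one coordinate (e.g. latitude) at a time.
def newton_interp(ts, vals, t_query):
    ts = np.asarray(ts, dtype=float)          # strictly increasing times
    coef = np.asarray(vals, dtype=float).copy()
    n = len(ts)
    for j in range(1, n):                     # build divided-difference table
        coef[j:] = (coef[j:] - coef[j - 1:-1]) / (ts[j:] - ts[:n - j])
    result = coef[-1]
    for j in range(n - 2, -1, -1):            # Horner-style evaluation
        result = result * (t_query - ts[j]) + coef[j]
    return result

# Illustrative call: latitude at the timestamp of a key frame missing GPS.
# newton_interp([0.0, 1.0, 2.0, 3.0], [30.10, 30.12, 30.15, 30.19], 1.5)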
Other parts of this embodiment are the same as any of embodiments 1 to 3, and thus are not described again.
Example 5:
this embodiment is further optimized on the basis of any one of Embodiments 1 to 4. As shown in Fig. 2, generating the panorama by reference video stitching comprises A-KAZE feature point extraction, feature point matching, image stitching and panorama generation. In A-KAZE feature point extraction, A-KAZE image features are extracted from each of two overlapping adjacent key frames; the main flow is: firstly, an image pyramid is constructed using nonlinear diffusion filtering and the Fast Explicit Diffusion (FED) algorithm; secondly, determinant extrema of the scale-normalized Hessian matrix are searched in 3 × 3 neighborhoods in the nonlinear scale space to obtain the image feature point coordinates; thirdly, the principal direction of each feature point is determined from the first-order differential values of all neighboring points in the feature point's circular region; finally, the feature point neighborhood image is rotated to the principal direction and the image feature vector is generated with the Modified-Local Difference Binary descriptor (M-LDB);
in A-KAZE feature point matching, the feature points extracted from the two overlapping key frames are matched; the main flow is: firstly, the similarity between two A-KAZE feature descriptors is defined by the Hamming distance; then initial matching points are searched with a bidirectional k-nearest-neighbor (KNN) classification algorithm; finally, the matching point pairs are screened with the random sample consensus (RANSAC) algorithm to remove mismatched pairs;
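The matching chain above maps naturally onto OpenCV, as in the following sketch: A-KAZE's binary M-LDB descriptors are compared by Hamming distance, candidate matches come from two-nearest-neighbour (KNN) search, and RANSAC screens the pairs while fitting the affine model used later for stitching. The ratio test stands in for the bidirectional KNN check of the source, and the 0.8 and 3.0 thresholds are illustrative.

import cv2
import numpy as np

# A minimal sketch: A-KAZE detection, Hamming-distance KNN matching with a
# ratio screen, then RANSAC-screened affine estimation. Thresholds are
# illustrative, not taken from the patent.
def match_key_frames(img1, img2):
    akaze = cv2.AKAZE_create()
    kp1, des1 = akaze.detectAndCompute(img1, None)
    kp2, des2 = akaze.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)          # binary descriptors
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < 0.8 * n.distance]          # KNN ratio screen
    src = np.float32([kp1[m.queryIdx].pt for m in good])
    dst = np.float32([kp2[m.trainIdx].pt for m in good])
    M, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                      ransacReprojThreshold=3.0)
    return M, inliers          # 2x3 affine matrix, RANSAC inlier mask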
for image stitching and panorama generation, the received reference video is preprocessed, video key frames are extracted based on random sampling, the GPS information of each key frame is recorded, and the key frame interval time is calculated; the reference video key frame set is F = {f_1, f_2, …, f_K}. Key frame f_1 is set as the panorama space, and key frames f_2 through f_K are transformed to the panorama space one by one; an affine transformation model M, able to accommodate translation, rotation and scaling, is selected as the image coordinate transformation matrix, and the image coordinate transformation is expressed as (reconstructed here in standard affine form, the original appearing only as an image):
x = m_0·x′ + m_1·y′ + m_2, y = m_3·x′ + m_4·y′ + m_5
where K is the total number of key frames extracted from the reference video, k is the index of the current key frame, (x, y) and (x′, y′) respectively denote the coordinates of a pixel in the panorama and in the image to be stitched, and m_0–m_5 are the affine transformation parameters.
Other parts of this embodiment are the same as any of embodiments 1 to 4, and thus are not described again.
Example 6:
in this embodiment, further optimization is performed on the basis of any one of the embodiments 1 to 5, and a specific key frame splicing process is as follows: in the key frame splicing process, firstly, all pixel nodes are divided into a target change area and a non-change area, a visible light image target change area and an infrared image target change area are extracted, and for a key frame f to be spliced, the target change area and the infrared image target change area are extracted 2 Extracting the key frame f 1 And a key frame f 2 Calculating matching point set match of more than 3 pairs of matching points according to the overlapping region A-KAZE characteristic points 2 And match 1 And obtaining a frame f by using a least square method 2 To frame f 1 Image transformation matrix M of panorama space 1,2 (ii) a Then, whether k is more than 2 or not and k is more than 2 to-be-spliced key frame f is judged k Calculating the feature point set match of the overlapping region of the k-1 frame and the k frame k-1 And match k Extracting the key frame f k And a key frame f k-1 The overlapping area A-KAZE characteristic points are calculated by utilizing a least square method to obtain a frame f k Transformation matrix M to panorama space 1.K By transforming the matrix M 1.K-1 For match k Coordinate transformation is carried out to obtain match k-1 Calculating a transformation matrix M from the k frame to the k-1 frame 1.K Obtaining a key frame f k-1 Matching point set match in panorama space k-1 The key frame f k-1 Medium matching point set match k-1 And match k-1 Projecting to panorama space, otherwise matching point set match k-1 And match k Calculating the feature point set match of the overlapping region of the 1 st frame and the 2 nd frame 1 And match 2 Then, the transformation matrix M from the 2 nd frame to the 1 st frame is calculated 1.2 Finally, using the image transformation matrix M 1.K And bilinear interpolation method for converting the key frame f k And transforming to a panoramic image space, performing splicing treatment by using an image fusion technology, completing image splicing, and generating a final panoramic image.
Registration of the detection frame with the full reference scene image comprises GPS-based fast coarse positioning and A-KAZE-based precise registration. In GPS-based fast coarse positioning of the detection frame, the received detection video is preprocessed, the image frames on which change detection is to be executed and their GPS information are extracted, the GPS information of the detection frame is compared with the GPS information recorded for each key frame in the panoramic reference image, and the 4 nearest adjacent key frame areas are found in the panorama and used as the initial reference image area for change detection; in A-KAZE-based precise registration, A-KAZE feature points are extracted and matched and the detection image is transformed to the reference image space, completing precise registration of the detection image with the coarsely positioned reference image area; then an image area T and an image area R with the same position and size are extracted from the registered detection image and the panoramic reference image respectively, and target images whose confidence exceeds a preset confidence threshold are output from the detection result, with T and R serving respectively as the test image and reference image input to change detection.
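A minimal sketch of the GPS coarse-positioning step, assuming keyframe_gps is a (K, 2) array of the per-key-frame fixes recorded during stitching; squared Euclidean distance on raw latitude/longitude is used purely for illustration.

import numpy as np

# A minimal sketch: return the indices of the k key frames whose recorded GPS
# fixes lie nearest the detection frame's fix. coarse_locate is hypothetical.
def coarse_locate(frame_gps, keyframe_gps, k=4):
    d2 = np.sum((np.asarray(keyframe_gps, float)
                 - np.asarray(frame_gps, float)) ** 2, axis=1)
    return np.argsort(d2)[:k]

# e.g. coarse_locate((30.12, 104.07), kf_gps) -> indices of 4 nearest frames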
Other parts of this embodiment are the same as any of embodiments 1 to 5, and thus are not described again.
Example 7:
this embodiment is further optimized on the basis of any one of Embodiments 1 to 6. As shown in Fig. 3, image change detection execution mainly comprises image preprocessing and low-pass filtering to generate the change map. In image preprocessing, the panorama and the detection frame are input; the input test image T and reference image R are blurred with a 2-dimensional Gaussian convolution kernel; the detection frame is coarsely positioned with the GPS information, A-KAZE features are extracted and the transformation matrix calculated, and the detection frame is precisely registered with the reference image; the registered detection frame and reference image are preprocessed, and the initial change image D is generated by low-pass filtering. For RGB images, the three channel images of the reference image R and the detection image T are each Gaussian-filtered to remove details (such as water surface ripples and waving leaves) and the influence of noise on the change detection result; the contrast of R and T is then increased by histogram equalization, making details clearer while reducing the influence of illumination differences between the two images; the change is then verified and processed based on RGB-LBP features and the detection result;
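A minimal sketch of the preprocessing pair just described, applied here to a single gray channel for brevity (the source filters the three RGB channels separately); the 5 × 5 kernel and σ = 1.5 are illustrative values, not taken from the source.

import cv2

# A minimal sketch: Gaussian blurring to suppress fine detail and noise,
# then histogram equalization to raise contrast and damp illumination
# differences between the two images.
def preprocess(gray_img):
    blurred = cv2.GaussianBlur(gray_img, (5, 5), 1.5)   # 2-D Gaussian kernel
    return cv2.equalizeHist(blurred)                    # contrast stretch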
to overcome the influence of illumination, noise, parallax and registration error, the online detection model generates the initial change image during image change detection execution as follows: first, the reference image R and detection image T are converted from RGB to gray scale, giving the gray reference image R_gray and gray detection image T_gray; then a difference image is generated from the gray values at corresponding positions of R_gray and T_gray, and each pixel position of the initial change image D is decided from the difference values, D(i, j) = 1 denoting change at pixel position (i, j), at which the RGB-LBP features of the detection frame and reference image are computed. Using N × N neighborhood windows, the difference map D_R from R_gray to T_gray and the difference map D_T from T_gray to R_gray are computed by low-pass filtering; the calculation formulas (reconstructed here from the variable definitions stated below, the originals appearing only as images) are:
D_R(i, j) = min over |Δi| ≤ (N−1)/2, |Δj| ≤ (N−1)/2 of |R_gray(i, j) − T_gray(i+Δi, j+Δj)|
D_T(i, j) = min over |Δi| ≤ (N−1)/2, |Δj| ≤ (N−1)/2 of |T_gray(i, j) − R_gray(i+Δi, j+Δj)|
D(i, j) = 1 if min(D_R(i, j), D_T(i, j)) > δ_diff, and 0 otherwise;
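The sketch below implements the neighbourhood-tolerant differencing as reconstructed above (my reading of the low-pass formulas, not a verbatim transcription): each pixel is compared against every pixel of the other image inside an N × N window and the smallest absolute difference is kept, which absorbs small registration errors and parallax; N = 9 and δ_diff = 25 lie within the stated ranges.

import cv2
import numpy as np

# A minimal sketch of the neighbourhood-tolerant difference map D_A:
# D_A(i,j) = min over the N x N window of |A(i,j) - B(i+di, j+dj)|.
def tolerant_diff(A, B, N=9):
    r = N // 2
    h, w = A.shape
    pad = cv2.copyMakeBorder(B, r, r, r, r, cv2.BORDER_REPLICATE)
    out = np.full((h, w), 255, np.uint8)
    for di in range(-r, r + 1):               # scan all window offsets
        for dj in range(-r, r + 1):
            shifted = pad[r + di:r + di + h, r + dj:r + dj + w]
            out = np.minimum(out, cv2.absdiff(A, shifted))
    return out

def initial_change_map(R_gray, T_gray, N=9, delta_diff=25):
    D_R = tolerant_diff(R_gray, T_gray, N)
    D_T = tolerant_diff(T_gray, R_gray, N)
    # 1 = changed, 0 = unchanged
    return (np.minimum(D_R, D_T) > delta_diff).astype(np.uint8)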
based on the difference maps D_R and D_T and the segmentation threshold δ_diff, the initial change image D is calculated, and the pixel positions whose value in D is 1 are then verified by RGB-LBP feature comparison: a position (i, j) with D(i, j) = 1 is confirmed by the RGB-LBP comparison and is set as changed only if both checks detect change, and otherwise not. For each of the 3 color channels of the reference image R and detection image T, the 8-bit binary LBP code of every point in the 15 × 15 neighborhood centered at (i, j) is calculated, and the resulting LBP codes are concatenated by position and channel into the LBP features S_R(i, j) and S_T(i, j) of R and T at (i, j); the 8 neighboring positions in the 3 × 3 neighborhood centered on a point are encoded 0/1 clockwise starting from the upper-left corner, a point being coded 0 if its gray value is lower than the center and 1 otherwise. Then the Hamming distance d_RT(i, j) between S_R and S_T at position (i, j) is calculated; finally, whether pixel position (i, j) has changed is decided from the Hamming distance: if d_RT(i, j) > δ_h × |S_R(i, j)|, position (i, j) has changed and the value of D(i, j) in the initial change image remains 1; otherwise position (i, j) is unchanged and D(i, j) is modified from 1 to 0.
Here 0 denotes unchanged and 1 denotes changed; N ∈ {7, 9, 11}; Δi and Δj denote the offsets of position coordinates i and j within the N neighborhood; δ_diff ∈ [0, 50] is chosen according to the illumination difference between the images; |S_R| denotes the length of the binary feature string; and δ_h is the decision threshold, typically 0.3.
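A minimal sketch of the RGB-LBP verification at one pixel position, following the description above: an 8-bit LBP code is computed clockwise from the upper-left neighbour for every point of the 15 × 15 patch in each RGB channel, the codes are concatenated into one binary string per image, and the Hamming distance is tested against δ_h × |S_R|. The patch indexing assumes (i, j) lies at least 8 pixels from the image border.

import numpy as np

# A minimal sketch of the per-position RGB-LBP change check.
def lbp_patch_bits(img, i, j, half=7):
    # 15x15 patch centered at (i, j), plus a 1-pixel rim for the neighbours
    patch = img[i - half - 1:i + half + 2, j - half - 1:j + half + 2].astype(np.int32)
    # clockwise 8-neighbour offsets starting at the upper-left corner
    nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    bits = []
    for y in range(1, patch.shape[0] - 1):
        for x in range(1, patch.shape[1] - 1):
            c = patch[y, x]
            # 0 if the neighbour's gray value is lower than the center, else 1
            bits += [0 if patch[y + dy, x + dx] < c else 1 for dy, dx in nbrs]
    return np.array(bits, np.uint8)

def lbp_changed(R_rgb, T_rgb, i, j, delta_h=0.3):
    s_R = np.concatenate([lbp_patch_bits(R_rgb[..., c], i, j) for c in range(3)])
    s_T = np.concatenate([lbp_patch_bits(T_rgb[..., c], i, j) for c in range(3)])
    d = int(np.sum(s_R != s_T))              # Hamming distance d_RT(i, j)
    return d > delta_h * s_R.size            # True = position confirmed changed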
Other parts of this embodiment are the same as any of embodiments 1 to 6, and thus are not described again.
Example 8:
this embodiment is further optimized on the basis of any one of Embodiments 1 to 7. In changed-image post-processing, to eliminate false alarms effectively, the verified change image must be post-processed; the operation flow is as follows: first, graying, gray correction and smoothing filtering are performed, step edge points and line edge points are screened and connected, false edges are removed and line information is extracted; isolated change positions are removed with a morphological opening operation, and the corresponding areas in the verified change map D are set to unchanged; then the pixel area of each change region is calculated, the minimum change region area δ_a is determined from the image resolution and the size of the smallest target of interest, and regions in the verified change image D with area smaller than δ_a are set to unchanged; after the RGB-LBP features of the detection frame and the reference image at position (i, j) are calculated, the feature similarity is computed: if it is greater than the threshold, D(i, j) is retained, and if it is smaller, D(i, j) is corrected and the region is set to unchanged; in image post-processing, for each change region A_p in the verified change image D, the minimum bounding rectangle region B_p is found, the corresponding image regions T_Bp and R_Bp are extracted from the gray detection image T_gray and the gray reference image R_gray, the gray level histogram features h_T of region T_Bp and h_R of region R_Bp are calculated, and the distance between h_T and h_R is computed as follows (the formula below is a reconstruction from the stated variable definitions, the original appearing only as an image):
d(h_T, h_R) = sqrt( 1 − (1 / sqrt( h̄_T · h̄_R · β² )) · Σ_{q=1..β} sqrt( h_T(q) · h_R(q) ) )
where β is the feature dimension of the gray level histogram, h̄_T and h̄_R denote the means of the features, and q indexes the q-th dimension of the histogram. When d(h_T, h_R) is less than 0.35, region A_p in D is set to unchanged; otherwise it remains marked as changed.
For convenience of display, pixels at unchanged positions in the change map D are set to 0 (black) and pixels at changed positions are set to 255 (white). This completes the change detection of the unmanned aerial vehicle video.
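A minimal sketch of this post-processing chain using OpenCV: morphological opening drops isolated change pixels, an area screen removes regions smaller than δ_a, and each surviving region's gray histograms are compared; the Bhattacharyya comparison here reflects my reading of the distance formula above, and the 32-bin histograms and δ_a = 50 are illustrative.

import cv2
import numpy as np

# A minimal sketch of change-map post-processing: opening, area screening,
# then per-region gray-histogram comparison against the 0.35 threshold.
def postprocess(D, R_gray, T_gray, delta_a=50):
    D = cv2.morphologyEx(D, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    n, labels, stats, _ = cv2.connectedComponentsWithStats(D)
    out = np.zeros_like(D)
    for p in range(1, n):                    # label 0 is the background
        x, y, w, h, area = stats[p]
        if area < delta_a:
            continue                         # too small: set unchanged
        hr = cv2.calcHist([R_gray[y:y + h, x:x + w]], [0], None, [32], [0, 256])
        ht = cv2.calcHist([T_gray[y:y + h, x:x + w]], [0], None, [32], [0, 256])
        if cv2.compareHist(hr, ht, cv2.HISTCMP_BHATTACHARYYA) >= 0.35:
            out[labels == p] = 255           # keep region marked as changed
    return out   # 255 = changed (white), 0 = unchanged (black)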
Other parts of this embodiment are the same as any of embodiments 1 to 7, and thus are not described again.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. An unmanned aerial vehicle aerial video image intelligent detection method is characterized by comprising the following steps:
the visual receiving system defines coordinate information in the video data acquisition process to obtain a low-altitude aerial photograph video image of the target area of the unmanned aerial vehicle;
the ground station transmits an online detection model based on sample parameters, and transmits the received multi-time-phase aerial photography reference video to the flight control computer through wireless data transmission;
the flight control computer image processor performs fuzzy linear identification data analysis on the visible light reconnaissance images and infrared reconnaissance images of the detection video sample feature set, creates a pre-training model to detect multiple aerial video image objects for unmanned aerial vehicle video data preprocessing, calculates the pixels in each connected domain of the preprocessed aerial video image, takes the optical flow as the instantaneous motion field of gray pixel points on the image, and calibrates the video key frames and the key frame Global Positioning System (GPS) interpolation;
extracting, by the pre-training model, the overlapping image information between the feature points of the target to be detected in the reference video and adjacent key frames; selecting optical flow points at equal intervals in the target area image and calculating their motion vectors; classifying, comparing and matching images of the same scene from different angles and measuring the angles between objects in each image; stitching the reference video key frames frame by frame in sequence into a panoramic image using A-KAZE feature point matching, image affine transformation and image fusion; after stitching and fusion are completed, calculating the aerial photographing position and yaw from the correspondence between the single-channel image coordinate system and the world coordinate system, generating the panoramic reference image by stitching according to the GIS data associated with each image, and creating a view of the whole landscape; in the process of stitching the reference video into the panoramic reference image, extracting key frames of the component histogram in the reference video, extracting A-KAZE feature points in the overlapping areas of adjacent key frames, performing feature point matching to calculate the image transformation matrix, and transforming the extracted key frames into the panorama space and fusing them into the panoramic reference image;
in the process of registering the detection video with the panoramic reference image, registering detection video frames against the panoramic reference image based on the online detection model transmitted with sample parameters; for any detection frame extracted from the detection video, finding the frames needing attention in the detection video by manually selecting the target area to be detected or by an automatic extraction method; quickly realizing coarse positioning of the detection frame in the reference panorama using the GPS information of the frame to be detected and the full reference scene image; automatically adjusting the tracking window size from the coarse positioning result; registering and fusing the extracted visible light images, multi-source images and homologous images of the infrared image change area under the same coordinate system; and performing precise registration of the detection frame with the panoramic reference image using GPS initial positioning and an image registration method based on A-KAZE feature point matching, realizing fast and accurate alignment of the frame to be detected with the panoramic reference image;
in the process of executing image change detection, automatically monitoring the aerial video images with the online detection model and detecting image change regions; denoising and histogram-equalizing the registered images to remove the influence of noise, illumination and irrelevant change; effectively removing the influence of parallax and registration error from the generated change images by low-pass filtering and generating the initial change image; performing iteration with the MeanShift algorithm, moving the center of the search window to the iteration maximum and adjusting the search window size; sampling upward with a sliding window to search for small, dense objects; based on image morphological processing, image local binary pattern RGB-LBP feature comparison, and area and gray level histogram comparison, training the images on the transfer learning principle, verifying the change positions in the change map with deep learning software, correcting the initial change image, constructing a changed-target region data set, training and updating the deep network weights, and finally generating the aerial video image change map with higher accuracy.
2. The intelligent detection method for the aerial video image of the unmanned aerial vehicle according to claim 1, wherein the method for preprocessing the video data of the unmanned aerial vehicle comprises the following steps:
shooting sensor calibration, video key frame extraction and GPS interpolation, and generating a panoramic reference image by reference video stitching;
extracting key frames from a reference video, performing image matching based on the adjacent relation of the key frames, screening out a position coordinate set of an aerial video image connected domain, and mapping the position coordinate set to a standard coordinate space to generate a panoramic image required by change detection;
detecting registration or alignment of the aerial video frame with the panoramic reference image;
for video change detection, firstly, finding out frames needing to be processed in a detected video through a manual selection or automatic extraction method;
then, finding the same coverage area in the panoramic reference image and the image to be detected and registering the two images; quickly realizing coarse positioning of the detection frame in the panoramic reference image using the GPS information; completing fusion of the infrared image target change area and the visible light image target change area with an adaptive-weight target area fusion algorithm; accurately registering the detection frame with the reference image based on an image feature point matching method; and, after the detection video frame is accurately registered with the reference video panorama, generating the change map by low-pass filtering to remove the influence of parallax and registration error;
thirdly, calculating the change information of the position of each aerial video frame by using an RGB-LBP characteristic comparison method;
and finally, verifying the change information of each position by using morphological operation and a comparison means of the area and gray level histogram, and outputting a final change image.
3. The intelligent detection method for the unmanned aerial vehicle aerial video image according to claim 1, wherein the method for verifying the change position in the change map by using the deep learning software comprises the following steps:
corresponding points are set manually in the deep learning software; based on least squares adjustment theory, area network aerial triangulation is performed on the acquired calibration field data and high-precision control point data using a beam-method area network adjustment model, and the required geometric calibration parameters of the shooting sensor are solved, namely the interior orientation elements, the radial distortion coefficients, the tangential distortion coefficients, the charge-coupled device (CCD) non-square scale coefficient and the CCD non-orthogonality distortion coefficient.
4. The intelligent detection method for the unmanned aerial vehicle aerial video image according to claim 1, wherein the calculation method for video key frame extraction and key frame Global Positioning System GPS interpolation comprises:
in video key frame extraction and GPS interpolation, the time interval t for automatically extracting key frames at a given overlap degree is derived from the unmanned aerial vehicle track data; the formula for t appears only as an image in the source, as do the formulas for the width X_n at the lower (near) edge of the frame, the width X_f at the upper (far) edge of the frame, the overlap D_x of the shooting sensor in the x direction after t seconds (recorded to ensure the overlap in the x direction), and the overlap D_y in the y direction; the formula for the frame height Y is:
Y = H·[cot(tan⁻¹(2h/f) + θ) + cot(tan⁻¹(2h/f) − θ)];
in video key frame GPS interpolation, the position information corresponding to each selected key frame is recorded, provided by the GPS navigator carried by the unmanned aerial vehicle; if the GPS information is discontinuous, Newton interpolation is used so that the GPS information corresponds one-to-one with the extracted key frames,
wherein H is the height of the unmanned aerial vehicle at moment t, υ is its speed, ω is the shooting sensor width, h is its height, f is the focal length, θ is the angle between the obliquely shooting camera and the horizontal plane, and n is the number of block sub-areas into which the corresponding area is divided.
5. The intelligent detection method for the unmanned aerial vehicle aerial video image according to claim 1, wherein the process of stitching the reference video images to generate the panoramic reference image comprises:
generating a panorama by reference to video stitching, wherein the panorama comprises A-KAZE characteristic point extraction, characteristic point matching, image stitching and panorama generation;
in the A-KAZE feature point extraction, respectively extracting A-KAZE image features from two overlapped adjacent key frames;
firstly, constructing an image pyramid by utilizing a nonlinear diffusion filtering and a rapid explicit diffusion FED algorithm;
secondly, searching a determinant extreme value of a Hessian matrix of the 3 multiplied by 3 neighborhood after scale normalization in a nonlinear scale space to obtain an image characteristic point coordinate;
thirdly, determining the principal direction of the feature point based on the first order differential values of all adjacent points of the circular region of the feature point;
finally, rotating the feature point neighborhood image to the main direction, and generating an image feature vector by adopting an improved local differential binary descriptor M-LDB;
in the A-KAZE feature point matching, feature points extracted from two overlapped key frames are matched;
firstly, defining the similarity between two A-KAZE feature descriptors by the Hamming distance;
then, searching an initial matching point of the feature point by using a bidirectional k nearest neighbor classification (KNN) algorithm;
finally, screening the matching point pairs by adopting a random sample consensus (RANSAC) algorithm to remove mismatching pairs;
when image stitching and panorama generation are performed, the received reference video is preprocessed, video key frames are extracted based on random sampling, and the GPS information of each key frame is recorded; the reference video key frame set is F = {f_1, f_2, …, f_K}, where K is the total number of key frames extracted from the reference video and k is the index of the current key frame; key frame f_1 is set as the panorama space, and key frames f_2 through f_K are transformed to the panorama space one by one during video stitching.
6. The method of claim 1, wherein the process of stitching the reference video images to generate the panoramic reference image further comprises:
when image stitching and panorama generation are performed, the received reference video is preprocessed, video key frames are extracted based on random sampling, and the GPS information of each key frame is recorded; the reference video key frame set is F = {f_1, f_2, …, f_K}; key frame f_1 is set as the panorama space, and key frames f_2 through f_K are transformed to the panorama space one by one; an affine transformation model M, able to accommodate translation, rotation and scaling, is selected as the image coordinate transformation matrix, and the image coordinate transformation is expressed as (reconstructed here in standard affine form, the original appearing only as an image):
x = m_0·x′ + m_1·y′ + m_2, y = m_3·x′ + m_4·y′ + m_5
wherein K is the total number of key frames extracted from the reference video, k is the index of the current key frame, (x, y) and (x′, y′) respectively denote the coordinates of a pixel in the panorama and in the image to be stitched, and m_0–m_5 are the affine transformation parameters.
7. The intelligent detection method for the aerial video images of the unmanned aerial vehicle according to claim 1, wherein the key frame splicing process comprises:
first, all pixel nodes are divided into a target change area and a non-change area, and the visible light image target change area and the infrared image target change area are extracted; for the key frame f_2 to be stitched, the A-KAZE feature points of the overlapping area of key frames f_1 and f_2 are extracted, the matching point sets match_1 and match_2 of no fewer than 3 pairs of matching points are calculated, and the image transformation matrix M_{1,2} from frame f_2 to the panorama space of frame f_1 is obtained by least squares; then whether k > 2 is judged, and for each key frame f_k to be stitched with k > 2, the A-KAZE feature points of the overlapping area of key frames f_{k−1} and f_k are extracted, the matching point sets match_{k−1} and match_k of the overlapping area of frames k−1 and k are calculated, the transformation matrix M_{k−1,k} from frame k to frame k−1 is obtained by least squares, match_{k−1} is coordinate-transformed by M_{1,k−1} to obtain the matching point set of key frame f_{k−1} in the panorama space, and the transformation matrix M_{1,k} from frame k to the panorama space is obtained by composing M_{1,k−1} with M_{k−1,k}; otherwise, based on the overlapping-area feature point sets match_1 and match_2 of frames 1 and 2, the transformation matrix M_{1,2} from frame 2 to frame 1 is calculated; finally, key frame f_k is transformed to the panorama space using the image transformation matrix M_{1,k} and bilinear interpolation, stitching is completed with an image fusion technique, and the final panorama is generated.
8. The intelligent detection method for the aerial video image of the unmanned aerial vehicle according to claim 1, wherein the method for performing accurate registration of the detection frame with the panoramic reference image comprises:
GPS-based fast coarse positioning and A-KAZE-based precise registration: in GPS-based fast coarse positioning of the detection frame, the received detection video is preprocessed, the image frames on which change detection is executed and their GPS information are extracted, the GPS information of the detection frame is compared with the GPS information recorded for each key frame in the panoramic reference image, and the 4 nearest adjacent key frame areas are found in the panorama and used as the initial reference image area for change detection; in A-KAZE-based precise registration, A-KAZE feature points are extracted and matched and the detection image is transformed to the reference image space, completing precise registration of the detection image with the coarsely positioned reference image area; then an image area T and an image area R with the same position and size are extracted from the registered detection image and the panoramic reference image respectively, and target images whose confidence exceeds a preset confidence threshold are output from the detection result, T and R serving respectively as the test image and reference image input to change detection.
9. The intelligent detection method for the aerial video image of the unmanned aerial vehicle according to claim 1, wherein, in the process of executing image change detection, the online detection model performs the following:
generating the initial change image: first, the reference image R and detection image T are converted from RGB to gray scale, giving the gray reference image R_gray and gray detection image T_gray; then a difference image is generated from the gray values at corresponding positions, and each pixel position of the initial change image D is decided from the difference values, D(i, j) = 1 denoting change at pixel position (i, j), at which the RGB-LBP features of the detection frame and reference image are computed; using N × N neighborhood windows, the difference map D_R from R_gray to T_gray and the difference map D_T from T_gray to R_gray are computed by low-pass filtering; the calculation formulas (reconstructed here from the variable definitions stated below, the originals appearing only as images) are:
D_R(i, j) = min over |Δi| ≤ (N−1)/2, |Δj| ≤ (N−1)/2 of |R_gray(i, j) − T_gray(i+Δi, j+Δj)|
D_T(i, j) = min over |Δi| ≤ (N−1)/2, |Δj| ≤ (N−1)/2 of |T_gray(i, j) − R_gray(i+Δi, j+Δj)|
based on the difference maps D_R and D_T and the segmentation threshold δ_diff, the initial change image D is calculated, and the pixel positions whose value in D is 1 are verified by RGB-LBP feature comparison: a position (i, j) with value 1 is confirmed by the RGB-LBP comparison and set as changed only if both checks detect change, and otherwise not; for each of the 3 color channels of the reference image R and detection image T, the 8-bit binary LBP code of every point in the 15 × 15 neighborhood centered at (i, j) is calculated, and the LBP codes are concatenated by position and channel into the LBP features S_R(i, j) and S_T(i, j) of R and T at (i, j); the 8 neighboring positions in the 3 × 3 neighborhood centered on a point are encoded 0/1 clockwise starting from the upper-left corner, a point being coded 0 if its gray value is lower than the center and 1 otherwise; then the Hamming distance d_RT(i, j) between S_R and S_T at position (i, j) is calculated; finally, whether pixel position (i, j) has changed is decided from the Hamming distance: if d_RT(i, j) > δ_h × |S_R(i, j)|, position (i, j) has changed and the value of D(i, j) in the initial change image remains 1; otherwise position (i, j) is unchanged and D(i, j) is modified from 1 to 0,
wherein 0 denotes unchanged, 1 denotes changed, N ∈ {7, 9, 11}, Δi and Δj denote the offsets of position coordinates i and j within the N neighborhood, δ_diff ∈ [0, 50] is chosen according to the illumination difference between the images, |S_R| denotes the length of the binary feature string, and δ_h is the decision threshold.
10. The intelligent detection method for the aerial video images of the unmanned aerial vehicle according to any one of claims 1 to 9, comprising:
in changed-image post-processing, to eliminate false alarms effectively, the verified change image is post-processed as follows: first, graying, gray correction and smoothing filtering are performed, step edge points and line edge points are screened and connected, false edges are eliminated and line information is extracted; isolated change positions are removed with a morphological opening operation, and the corresponding areas in the verified change map D are set to unchanged; then the pixel area of each change region is calculated, the minimum change region area δ_a is determined from the image resolution and the size of the smallest target of interest, and regions in the verified change image D with area smaller than δ_a are set to unchanged; after the RGB-LBP features of the detection frame and the reference image at position (i, j) are calculated, the feature similarity is computed, D(i, j) being retained if the similarity is greater than the threshold and corrected, with the region set to unchanged, if it is smaller; in image post-processing, for each change region A_p in the verified change image D, the minimum bounding rectangle region B_p is found, the corresponding image regions T_Bp and R_Bp are extracted from the gray detection image T_gray and the gray reference image R_gray, the gray level histogram features h_T of region T_Bp and h_R of region R_Bp are calculated, and the distance between h_T and h_R is computed as follows (the formula below is a reconstruction from the stated variable definitions, the original appearing only as an image):
d(h_T, h_R) = sqrt( 1 − (1 / sqrt( h̄_T · h̄_R · β² )) · Σ_{q=1..β} sqrt( h_T(q) · h_R(q) ) )
wherein β is the feature dimension of the gray level histogram, h̄_T and h̄_R represent the means of the features, and q represents the q-th dimension of the histogram; when d(h_T, h_R) is less than 0.35, the change region A_p in D is set to unchanged, otherwise it remains marked as changed.
CN202211010524.XA 2022-08-23 2022-08-23 Intelligent detection method for aerial video images of unmanned aerial vehicle Active CN115439424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211010524.XA CN115439424B (en) 2022-08-23 2022-08-23 Intelligent detection method for aerial video images of unmanned aerial vehicle

Publications (2)

Publication Number Publication Date
CN115439424A true CN115439424A (en) 2022-12-06
CN115439424B CN115439424B (en) 2023-09-29

Family

ID=84244528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211010524.XA Active CN115439424B (en) 2022-08-23 2022-08-23 Intelligent detection method for aerial video images of unmanned aerial vehicle

Country Status (1)

Country Link
CN (1) CN115439424B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201115A (en) * 2011-04-07 2011-09-28 湖南天幕智能科技有限公司 Real-time panoramic image stitching method of aerial videos shot by unmanned plane
CN111080529A (en) * 2019-12-23 2020-04-28 大连理工大学 Unmanned aerial vehicle aerial image splicing method for enhancing robustness
WO2022104678A1 (en) * 2020-11-20 2022-05-27 深圳市大疆创新科技有限公司 Video encoding and decoding methods and apparatuses, mobile platform and storage medium

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115755978A (en) * 2022-12-08 2023-03-07 贵州省山地资源研究所 Mining area drainage ditch rapid and intelligent patrol method based on multi-rotor unmanned aerial vehicle
CN115830087B (en) * 2022-12-09 2024-02-20 陕西航天技术应用研究院有限公司 Batch rapid registration method for translational motion continuous frame image set
CN115830087A (en) * 2022-12-09 2023-03-21 陕西航天技术应用研究院有限公司 Batch rapid registration technology for translational motion continuous frame image sets
CN116152167A (en) * 2022-12-13 2023-05-23 珠海视熙科技有限公司 Sliding detection method, device, medium and equipment
CN116152167B (en) * 2022-12-13 2024-04-05 珠海视熙科技有限公司 Sliding detection method, device, medium and equipment
CN115908475A (en) * 2023-03-09 2023-04-04 四川腾盾科技有限公司 Method and system for realizing image pre-tracking function of airborne photoelectric reconnaissance pod
CN115908475B (en) * 2023-03-09 2023-05-19 四川腾盾科技有限公司 Implementation method and system for airborne photoelectric reconnaissance pod image pre-tracking function
CN116363435A (en) * 2023-04-03 2023-06-30 盐城工学院 Remote sensing image target detection system and method based on deep learning
CN116363435B (en) * 2023-04-03 2023-10-27 盐城工学院 Remote sensing image target detection system and method based on deep learning
CN116188275A (en) * 2023-04-28 2023-05-30 杭州未名信科科技有限公司 Single-tower crane panoramic image stitching method and system
CN116188275B (en) * 2023-04-28 2023-10-20 杭州未名信科科技有限公司 Single-tower crane panoramic image stitching method and system
CN117031258A (en) * 2023-06-27 2023-11-10 三门三友科技股份有限公司 Method for realizing fault detection system of electrolytic circuit based on temperature and monitoring method thereof
CN117031258B (en) * 2023-06-27 2024-06-07 三门三友科技股份有限公司 Method for realizing fault detection system of electrolytic circuit based on temperature and monitoring method thereof
CN116883251B (en) * 2023-09-08 2023-11-17 宁波市阿拉图数字科技有限公司 Image orientation splicing and three-dimensional modeling method based on unmanned aerial vehicle video
CN116883251A (en) * 2023-09-08 2023-10-13 宁波市阿拉图数字科技有限公司 Image orientation splicing and three-dimensional modeling method based on unmanned aerial vehicle video
CN117372967A (en) * 2023-12-06 2024-01-09 广东申创光电科技有限公司 Remote monitoring method, device, equipment and medium based on intelligent street lamp of Internet of things
CN117372967B (en) * 2023-12-06 2024-03-26 广东申创光电科技有限公司 Remote monitoring method, device, equipment and medium based on intelligent street lamp of Internet of things
CN117474959A (en) * 2023-12-19 2024-01-30 北京智汇云舟科技有限公司 Target object motion trail processing method and system based on video data
CN117474959B (en) * 2023-12-19 2024-03-08 北京智汇云舟科技有限公司 Target object motion trail processing method and system based on video data
CN117809379A (en) * 2024-02-29 2024-04-02 深圳市翔飞科技股份有限公司 Intelligent humanoid recognition alarm system and method based on monitoring camera
CN117809379B (en) * 2024-02-29 2024-05-03 深圳市翔飞科技股份有限公司 Intelligent humanoid recognition alarm system and method based on monitoring camera
CN117830301A (en) * 2024-03-04 2024-04-05 青岛正大正电力环保设备有限公司 Slag dragging region detection method based on infrared and visible light fusion characteristics
CN117830301B (en) * 2024-03-04 2024-05-14 青岛正大正电力环保设备有限公司 Slag dragging region detection method based on infrared and visible light fusion characteristics

Also Published As

Publication number Publication date
CN115439424B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
CN115439424B (en) Intelligent detection method for aerial video images of unmanned aerial vehicle
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
KR101105795B1 (en) Automatic processing of aerial images
EP2917874B1 (en) Cloud feature detection
CN109598794B (en) Construction method of three-dimensional GIS dynamic model
CN114973028B (en) Aerial video image real-time change detection method and system
CN106529538A (en) Method and device for positioning aircraft
CN108320304A (en) A kind of automatic edit methods and system of unmanned plane video media
CN113066050B (en) Method for resolving course attitude of airdrop cargo bed based on vision
CN110245566B (en) Infrared target remote tracking method based on background features
CN110941996A (en) Target and track augmented reality method and system based on generation of countermeasure network
CN114004977A (en) Aerial photography data target positioning method and system based on deep learning
CN111899345B (en) Three-dimensional reconstruction method based on 2D visual image
CN110009675A (en) Generate method, apparatus, medium and the equipment of disparity map
CN115240089A (en) Vehicle detection method of aerial remote sensing image
He et al. Ground and aerial collaborative mapping in urban environments
Xiao et al. Geo-spatial aerial video processing for scene understanding and object tracking
CN116862832A (en) Three-dimensional live-action model-based operator positioning method
CN113920254B (en) Monocular RGB (Red Green blue) -based indoor three-dimensional reconstruction method and system thereof
CN109883400A (en) Fixed station Automatic Targets and space-location method based on YOLO-SITCOL
CN115690610A (en) Unmanned aerial vehicle navigation method based on image matching
CN112541403B (en) Indoor personnel falling detection method by utilizing infrared camera
CN114120236A (en) Method for identifying and positioning low-altitude target
Li et al. Algorithm for automatic image dodging of unmanned aerial vehicle images using two-dimensional radiometric spatial attributes
Motayyeb et al. Effect of keyframes extraction from thermal infrared video stream to generate dense point cloud of the building's facade

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant