CN117576597B

CN117576597B - Visual identification method and system based on unmanned aerial vehicle driving

Info

Publication number: CN117576597B
Application number: CN202410051695.XA
Authority: CN
Inventors: 崔飞易; 刘清秀; 李建华
Original assignee: Shenzhen Jinfeijie Information Technology Service Co ltd
Current assignee: Shenzhen Jinfeijie Information Technology Service Co ltd
Priority date: 2024-01-15
Filing date: 2024-01-15
Publication date: 2024-04-12
Anticipated expiration: 2044-01-15
Also published as: CN117576597A

Abstract

The invention relates to the technical field of visual identification, and discloses a visual identification method and a visual identification system based on unmanned aerial vehicle driving. The system comprises: collecting sample image data to establish a sample set, performing feature extraction through a neural network, setting an unmanned aerial vehicle identification area, operating the unmanned aerial vehicle to collect image data in the identification area in real time, preprocessing the image data collected by the unmanned aerial vehicle in real time, and performing feature extraction on the image data collected in real time through the neural network; and comparing the characteristic data in the real-time collected image data with the characteristic data in each classified image data in the sample set, completing the identification of the real-time collected image data based on the comparison result, setting an evaluation index, and adjusting the visual identification precision of unmanned aerial vehicle driving based on the evaluation index. The visual recognition system based on unmanned aerial vehicle driving can perform real-time recognition on image data in a real-time acquisition recognition area of the unmanned aerial vehicle.

Description

Visual identification method and system based on unmanned aerial vehicle driving

Technical Field

The invention relates to the technical field of visual identification, in particular to a visual identification method and system based on unmanned aerial vehicle driving.

Background

Along with the gradual maturity of the target detection technology, the method is widely applied to traditional fixed monitoring equipment, but the traditional monitoring equipment has the limitations of poor flexibility and small monitoring range, is difficult to realize the identification of scattered and movable multi-targets, and is generally only used for closed indoor scenes such as supermarkets, shopping malls and the like. The unmanned plane has the advantages of easy operation, good flexibility, strong maneuverability, wide monitoring range and the like, and can make up the defects of the traditional monitoring equipment. However, in a task scene where the unmanned aerial vehicle recognizes a plurality of moving targets, besides the traditional target detection difficulties such as shielding, visual angle change, light condition change and the like, adverse effects on image quality caused by target scale change and unmanned aerial vehicle and target maneuver exist. And because the high flexibility of the unmanned aerial vehicle determines that the carrying capacity of the unmanned aerial vehicle is very limited, the computing resources are relatively deficient, the real-time effect is difficult to achieve when the detection model is deployed on the platform, and the autonomous propulsion on the unmanned aerial vehicle is hindered.

In the prior art CN115880589a, recognition and evaluation are performed on the target acquired by the unmanned aerial vehicle only through a neural network training mode, so that adverse effects of maneuvering of the unmanned aerial vehicle and the target on image quality are ignored, and the method has a great limitation.

Disclosure of Invention

(one) solving the technical problems

Aiming at the defects of the prior art, the invention provides a visual recognition system based on unmanned aerial vehicle driving, which has the advantages of accurate recognition and the like, and solves the problem of adverse effect of unmanned aerial vehicle and target maneuver on image quality.

In order to solve the technical problem of adverse effect of unmanned aerial vehicle and target maneuver on image quality, the invention provides the following technical scheme:

the embodiment discloses a visual identification method based on unmanned aerial vehicle driving, which specifically comprises the following steps:

s1, establishing a sample set based on collected sample image data, and inputting the sample set into a neural network for feature extraction;

s2, setting an unmanned aerial vehicle identification area, and operating the unmanned aerial vehicle to collect image data in the identification area in real time;

s3, preprocessing image data acquired by the unmanned aerial vehicle in real time;

s4, inputting the preprocessed image data into a neural network, and extracting features of the image data acquired in real time based on the neural network;

s5, comparing the characteristic data in the real-time collected image data with the characteristic data in each classified image data in the sample set, and completing the identification of the real-time collected image data based on the comparison result;

s6, setting an evaluation index, and adjusting the visual recognition accuracy of unmanned aerial vehicle driving based on the evaluation index;

preferably, the establishing a sample set based on the collected sample image data, and inputting the sample set into the neural network for feature extraction includes:

s11, classifying the image data in the sample set according to the content;

s12, sequentially inputting the classified image data into a neural network for feature extraction, and extracting feature data in each classified image data;

and S13, storing the feature data of the extracted classified image data.

Preferably, the preprocessing the image data collected by the unmanned aerial vehicle in real time includes:

s31, filtering processing is carried out on image data acquired in real time;

s32, performing contrast enhancement on the filtered image data.

Preferably, the filtering processing of the image data acquired in real time includes:

setting p as a central pixel based on the picture center, and setting a neighborhood set S of the central pixel, wherein q belongs to the field set S of the central pixel;

wherein e is a natural constant;I _p represented as p= (x) in image I ₁ ,y ₁ ) Gray value of dot, (x) ₁ ,y ₁ ) The coordinates of the pixel point are represented,I _q represented as q= (u) in image I ₁ ,v ₁ ) Gray value of the dot; σs represents the standard deviation of the spatial distance in the gaussian function, σr represents the standard deviation of the pixel value in the gaussian function; gs denotes a spatial distance weight, gr denotes a pixel value weight;BF _p representing the gray value, W, of p-point in the filtered image _q The sum of weights representing the pixel values of the q points.

Preferably, the contrast enhancement of the filtered image data includes:

s321, establishing a linear mapping relation of the image data based on brightness in the filtered image data;

wherein L is _in1 (x, y) represents the pixel value before contrast enhancement of the local pixel block in the image data, L _out1 (x, y) represents pixel values of a local pixel block in the contrast-enhanced image data,Qis a contrast gain parameter;

setting a brightness threshold in the image data;

setting up

Wherein L is _avg1 Representing a luminance average of a local pixel block in the image data;

when the brightness in the image is lower than the set threshold value, 0 is satisfied<Q<1, when the brightness in the image is higher than or equal to the set threshold value, the following conditions are satisfiedQ>1；

Setting a sliding window and carrying out contrast enhancement on pixels based on the average value and variance of brightness in the window;

setting an n multiplied by n sliding window to slide on the image, and calculating the average value and variance of brightness in the window to carry out contrast enhancement on pixels;

wherein L is _avg2 Representing the luminance average value, L, of a local pixel block within a sliding window _in2 (x, y) represents the pixel value before contrast enhancement of the local pixel block in the sliding window, L _out2 Representing pixel values of local pixel blocks in the sliding window after contrast enhancement;

the mean and variance of the brightness in the sliding window are respectively:

the enhancement coefficient η is:

where k is a set constant and η is an enhancement coefficient.

Preferably, the inputting the preprocessed image data into the neural network, and performing feature extraction on the real-time acquired image data based on the neural network includes:

s41, dividing an input image into L small-area image data blocks;

s42, after receiving an input image data block, the convolution layer moves on the input image data block through a convolution kernel according to a set step length, and performs multiply accumulation on a corresponding area of each step and a characteristic value of the area, so that characteristic extraction of each image data block is realized;

the convolution calculation formula is as follows:

wherein,representing input features->The weights of the corresponding convolution kernels are represented,bthe value of the offset is indicated and,frepresenting an output characteristic;

s43, in the neural network, the output of the upper layer is used as the input of the lower layer, and the convolutional neural network is formed by continuously stacking;

the data must be subjected to the process of activating the function during the process of inputting the data to the lower layer;

setting the upper layer output as the input of the lower layer with the input value of(λ=1, 2, … …, L), each output value +.>The corresponding input weight is +.>B is an offset, and the output result obtained after the input values are input into the neural network is:

wherein,y is an output result for the corresponding activation function;

s44, convolving the characteristics of the image data blocks in the continuously stacked convolution and pooling processes;

s45, transmitting the characteristics of the convolved image data block into a full connection layer;

s46, unfolding and combining the feature data through the full connection layer to obtain a feature array, and storing the feature array.

Preferably, the comparing the feature data in the real-time collected image data with the feature data in each classified image data in the sample set, and the identifying the real-time collected image data based on the comparison result includes:

s51, comparing based on the photographed multi-frame images;

transmitting the shot images to a neural network for feature extraction, identifying objects, storing feature points, and obtaining the coordinate positions of target objects in each frame of images; extracting the characteristics of another frame of image, obtaining a new coordinate position, comparing the new coordinate position with the characteristic point of the previous frame, and judging whether the image is the same object according to the fact that the contact ratio is higher than 80%;

s52, calculating the similarity between the detection target and the target to be detected after the feature extraction;

judging the similarity between the detection target and the target to be detected by calculating the distance between the features;

cosine distance:

wherein,for vector->Sum vector->Cosine similarity between D _cos Is the cosine distance, (x) _j ,y _j ) The coordinates of the J-th point are represented, and J represents the number of coordinate points;

inquiring h targets to be detected in d detection targets, and setting the characteristic form of each target as a ϴ -dimensional vector;

forming a detection feature matrix G by using feature vectors of all detection targets, forming a feature matrix H to be detected by using the feature vectors of all targets to be detected, and multiplying the detection feature matrix G by the feature matrix H to be detected to obtain a cosine similarity matrix C;

for an object to be detectedThe detection targets are arranged in a descending order according to the cosine similarity between the target features to be detected and each detection target feature, and c detection targets with top rank are reserved; setting z _α Feature similarity, t, representing rank alpha _β Representing the beta detection target;

setting an upper limit of a similarity threshold and a lower limit of the similarity threshold which are successfully matched;

feature similarity z when first ranked ₁ Less than the lower limit of the similarity threshold, failing in matching, and setting a target to be detectedIs a new generation target;

feature similarity z when first ranked ₁ Is larger than the upper limit of the similarity threshold, and the successful matching indicates the detection target t ₁ The similarity with the target to be detected is high enough, and the target to be detected is directly judged as a detection target t ₁ ；

When z ₁ The method comprises the steps of being smaller than the upper limit of a similarity threshold and larger than the lower limit of the similarity threshold, matching successfully, counting detection targets with similarity above the lower limit of the similarity threshold in ranking results, setting the detection targets above the lower limit of the similarity threshold as effective matching, calculating the proportion of the category of the detection targets in the effective matching, and judging the target to be detected as the detection target with the highest proportion.

Preferably, the setting the evaluation index, and adjusting the visual recognition accuracy of the unmanned aerial vehicle driving based on the evaluation index includes:

calculating the accuracy of visual recognition under the current resolution based on the photographed multi-frame images;

wherein TP represents the number of identified matches with the actual result; FP represents identifying a number inconsistent with the actual result; FN represents the number of missed samples; p is the accuracy rate, R is the recall rate;

further, for a single category, setting a change relation curve of the precision rate and the recall rate as a P-R curve; the area surrounded by the curve and the horizontal axis is the average accuracy AP, and the calculation formula is as follows:

setting an average accuracy average mAP as an average value of accuracy APs of a plurality of categories, wherein a calculation formula is as follows:

wherein m is the number of target types, mAP is the average value of the accuracy AP of a plurality of types;

setting an average accuracy average threshold value of visual identification; and when the calculated average accuracy average value is lower than a set average accuracy average value threshold value of visual identification, improving the resolution of the image shot by the unmanned aerial vehicle.

The embodiment also discloses a visual identification system based on unmanned aerial vehicle driving, specifically includes: the device comprises an image acquisition module, an image processing module, a display module and an image recognition module;

the image acquisition module is used for shooting image data in the current planning area in real time and transmitting the image data to the image processing module in real time;

the image processing module is used for processing the image data transmitted by the image acquisition module and transmitting the processed image to the image recognition module;

the image recognition module is used for recognizing the processed image data;

the display module is used for displaying in the display module according to the identification result of the image processing module.

Compared with the prior art, the invention provides a visual recognition system based on unmanned aerial vehicle driving, which has the following beneficial effects:

1. according to the invention, the sample set is established by collecting sample image data, and the identification of the sample set is completed in a mode of extracting characteristics through a neural network, so that the identification efficiency of the image data is improved.

2. According to the method, the unmanned aerial vehicle identification area is set, the image data acquired in real time in the identification area is preprocessed, the influence of noise points in the image on the image identification efficiency is reduced, the difference between the object and the background is highlighted through the contrast enhancement mode, the image identification efficiency is improved, and meanwhile the accuracy of image identification is guaranteed.

3. According to the invention, whether the image data collected in real time is matched with the image data in the sample set or not is judged by carrying out feature extraction on the image data collected in real time and comparing the image data with the feature data in the sample set extracted by the neural network, so that the accuracy of image identification is ensured.

4. According to the invention, through a multi-frame shooting mode, the feature extraction is carried out on a plurality of pieces of continuously shot image data, and a feature point comparison mode is carried out to judge whether the images are the same identification target, so that the accuracy and the reliability of identification are improved.

5. The invention comprehensively judges whether the detection target is matched with the target to be detected or not by judging the similarity between the detection target and the target to be detected, thereby improving the image recognition efficiency.

Drawings

Fig. 1 is a schematic diagram of a visual identification flow of unmanned aerial vehicle driving.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

s11, classifying the image data in the sample set according to the content;

s13, storing the feature data of the extracted classified image data;

preprocessing image data acquired by the unmanned aerial vehicle in real time comprises the following steps:

s31, filtering processing is carried out on image data acquired in real time;

wherein e is naturalA constant;I _p represented as p= (x) in image I ₁ ,y ₁ ) Gray value of dot, (x) ₁ ,y ₁ ) The coordinates of the pixel point are represented,I _q represented as q= (u) in image I ₁ ,v ₁ ) Gray value of the dot; σs represents the standard deviation of the spatial distance in the gaussian function, σr represents the standard deviation of the pixel value in the gaussian function; g _s Representing the spatial distance weight, G _r Representing pixel value weights;BF _p representing the gray value, W, of p-point in the filtered image _q A weight sum representing the pixel value of the q point;

s32, carrying out contrast enhancement on the filtered image data;

setting a brightness threshold in the image data;

setting up

the mean and variance of the brightness in the sliding window are respectively:

the enhancement coefficient η is:

where k is a set constant and η is an enhancement coefficient.

s41, dividing an input image into L small-area image data blocks;

the convolution calculation formula is as follows:

wherein,y is an output result for the corresponding activation function;

s46, unfolding and combining the characteristic data through the full connection layer to obtain a characteristic array, and storing the characteristic array;

s51, comparing based on the photographed multi-frame images;

transmitting the shot image to a neural network for feature extraction, identifying an object, storing feature points, and obtaining the coordinate position of a target object in a frame of image; extracting the characteristics of another frame of image, obtaining a new coordinate position, comparing the new coordinate position with the characteristic point of the previous frame, and judging whether the image is the same object according to the fact that the contact ratio is higher than 80%;

further, the similarity between the detection target and the target to be detected is judged by calculating the distance between the features;

cosine distance:

further, for inquiring h targets to be detected in d targets to be detected, setting the characteristic form of each target as a ϴ -dimensional vector;

further, for an object to be detectedThe detection targets are arranged in a descending order according to the cosine similarity between the target features to be detected and each detection target feature, and c detection targets with top rank are reserved; setting z _α Feature similarity, t, representing rank alpha _β Representing the beta detection target;

further, setting an upper limit of a similarity threshold and a lower limit of the similarity threshold which are successfully matched;

further, when the feature similarity z is ranked first ₁ Less than the lower limit of the similarity threshold, failing in matching, and setting a target to be detectedIs a new generation target;

When z ₁ The method comprises the steps of being smaller than the upper limit of a similarity threshold and larger than the lower limit of the similarity threshold, matching successfully, counting detection targets with similarity above the lower limit of the similarity threshold in ranking results, setting the detection targets above the lower limit of the similarity threshold as effective matching, calculating the proportion of the category of the detection targets in the effective matching, and judging the target to be detected as the detection target with the highest proportion;

wherein TP represents the number of identified matches with the actual result; FP represents identifying a number inconsistent with the actual result; FN represents the number of missed samples; p is the precision rate, R is the recall rate;

further, for a single category, setting a change relation curve of the precision rate and the recall rate as a P-R curve; the area surrounded by the curve and the horizontal axis is the average accuracy rate AP, and the calculation formula is as follows:

further, setting an average accuracy rate average mAP as an average value of accuracy rates AP of a plurality of categories, wherein a calculation formula is as follows:

wherein m is the number of target types, mAP is the average value of the accuracy rates AP of a plurality of types;

setting an average accuracy average threshold value of visual identification; when the calculated average accuracy average value is lower than a set average accuracy average value threshold value of visual identification, the resolution of the image shot by the unmanned aerial vehicle is improved;

the image recognition module is used for recognizing the processed image data;

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. The visual identification method based on unmanned aerial vehicle driving is characterized by comprising the following steps of:

the preprocessing of the image data acquired by the unmanned aerial vehicle in real time comprises the following steps:

s31, filtering processing is carried out on image data acquired in real time;

s32, carrying out contrast enhancement on the filtered image data;

the contrast enhancement of the filtered image data includes:

；

setting a brightness threshold in the image data;

setting up；

；

the mean and variance of the brightness in the sliding window are respectively:

；

the enhancement coefficient η is:

；

where k is a set constant and η is an enhancement coefficient.

2. The unmanned aerial vehicle driving-based visual recognition method of claim 1, wherein the establishing a sample set based on the collected sample image data and inputting a neural network based on the sample set for feature extraction comprises:

s11, classifying the image data in the sample set according to the content;

and S13, storing the feature data of the extracted classified image data.

3. The unmanned aerial vehicle driving-based visual recognition method of claim 1, wherein the filtering the image data acquired in real time comprises:

；

wherein e is a natural constant;I _p representation ofIs p= (x) in image I ₁ ,y ₁ ) Gray value of dot, (x) ₁ ,y ₁ ) The coordinates of the pixel point are represented,I _q represented as q= (u) in image I ₁ ,v ₁ ) Gray value of the dot; sigma (sigma) _s Representing the standard deviation, sigma, of the spatial distance in the gaussian function _r Representing the standard deviation of pixel values in the gaussian function; g _s Representing the spatial distance weight, G _r Representing pixel value weights;BF _p representing the gray value, W, of p-point in the filtered image _q The sum of weights representing the pixel values of the q points.

4. The unmanned aerial vehicle driving-based visual recognition method of claim 1, wherein inputting the preprocessed image data into the neural network and performing feature extraction on the real-time collected image data based on the neural network comprises:

s41, dividing an input image into L small-area image data blocks;

the convolution calculation formula is as follows:

；

wherein,y is an output result for the corresponding activation function;

5. The visual recognition method based on unmanned aerial vehicle driving according to claim 1, wherein the comparing the feature data in the real-time collected image data with the feature data in each classified image data in the sample set, and the recognizing the real-time collected image data based on the comparison result comprises:

s51, comparing based on the photographed multi-frame images;

the detection target is image data in a sample set;

cosine distance:

；

for one ofTarget to be detectedThe detection targets are arranged in a descending order according to the cosine similarity between the target features to be detected and each detection target feature, and c detection targets with top rank are reserved; setting z _α Feature similarity, t, representing rank alpha _β Representing the beta detection target;

6. The method for visual recognition based on unmanned aerial vehicle driving according to claim 1, wherein the setting the evaluation index, and adjusting the visual recognition accuracy of unmanned aerial vehicle driving based on the evaluation index comprises:

；

for a single category, setting a change relation curve of the accuracy rate and the recall rate as a P-R curve; the area surrounded by the curve and the horizontal axis is the average accuracy AP, and the calculation formula is as follows:

；

7. An unmanned aerial vehicle driving-based visual recognition system for implementing the unmanned aerial vehicle driving-based visual recognition method of any one of claims 1 to 6, comprising an image acquisition module, an image processing module, a display module, and an image recognition module;

the image recognition module is used for recognizing the processed image data;