CN110992378B - Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot - Google Patents


Info

Publication number
CN110992378B
CN110992378B (application CN201911220924.1A)
Authority
CN
China
Prior art keywords
target
frame
image
convolution
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911220924.1A
Other languages
Chinese (zh)
Other versions
CN110992378A (en)
Inventor
谭建豪
谭姗姗
殷旺
刘力铭
王耀南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN201911220924.1A priority Critical patent/CN110992378B/en
Publication of CN110992378A publication Critical patent/CN110992378A/en
Application granted granted Critical
Publication of CN110992378B publication Critical patent/CN110992378B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention belongs to the technical field of unmanned aerial vehicles and discloses a dynamically updated visual tracking aerial photographing method and system based on a rotor flying robot. HOG features combined with an SVM are used to detect the target in the picture; the AlexNet network structure is then improved by considering three important influencing factors of the twin network (receptive field size, total network stride, and feature padding), a smoothing matrix and a background suppression matrix are added, and the features of the previous frames are used effectively. Multi-layer features are fused elementwise to learn target appearance change and background suppression online, and training is performed on continuous video sequences. The invention balances precision and real-time tracking with the dynamic twin network, learns target appearance changes quickly through the dynamically updated network, makes full use of the spatio-temporal information of the target, and effectively alleviates problems such as drift and target occlusion. A deeper network is selected to acquire target features, and appearance learning and background suppression are used for dynamic tracking, which effectively increases robustness.

Description

Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot
Technical Field
The invention belongs to the technical field of unmanned aerial vehicles, and in particular relates to a dynamically updated visual tracking aerial photographing method and system based on a rotor flying robot.
Background
Currently, the closest prior art: an unmanned aerial vehicle (Unmanned Aerial Vehicle, UAV) is an unmanned aircraft operated by a radio remote control device or by programmed control means, capable of autonomously completing flight tasks without human intervention. In the military field, because of its small size, strong maneuverability, ease of control and similar characteristics, the rotor flying robot can operate in extreme environments and is widely applied to anti-terrorism and explosion prevention, traffic monitoring, and earthquake relief. In the civil field, unmanned aerial vehicles can be used for high-altitude photography, pedestrian detection and other applications. When performing a specific task, a rotor flying robot typically needs to track a specific target in flight and transmit information about the target to a ground station in real time. Accordingly, vision-based tracking flight of rotor flying robots is gaining widespread attention and is a current research focus.
The tracking flight of a rotor flying robot means that a camera is carried on a rotor flying robot flying at low altitude, an image frame sequence of a ground moving target is obtained in real time, and the image coordinates of the target are calculated and used as the input of visual servo control to obtain the speed required by the aircraft; the position and attitude of the rotor flying robot are then controlled automatically so that the tracked ground moving target is kept near the center of the camera's field of view. The traditional twin network tracking method has good real-time performance, but when the target is lost due to occlusion and the influence of a complex background or illumination is added, taking the first frame as the standard reference still leads to situations in which the target cannot be tracked correctly. The present invention is aimed at situations in which the target is lost during aerial photographing by the rotor flying robot due to occlusion, target appearance change, tracker drift, background interference and the like.
In summary, the problems of the prior art are: (1) During aerial photographing, the existing rotor flying robot is prone to drift, target loss and similar problems due to occlusion, illumination, background interference and the like.
(2) In the prior art, trackers basically extract features using an AlexNet network; deeper features of the target can be extracted with the deeper CIResNet network, so that the tracker locks onto the target in the search area and the influence of complex backgrounds is reduced.
(3) Although the existing twin network tracker operates at a high frame rate, there is no updating component in its framework, which means that the tracker cannot quickly cope with severe changes of the target or background, and this may cause tracking drift in some cases.
The difficulty of solving the technical problems is as follows: the method of identifying the location of the target in the search area using color features and contour features may fail when the appearance of the target changes drastically during tracking.
The operation time is increased if every frame is re-detected or a threshold is used to determine whether the tracking is lost during the tracking process.
More feature information can be obtained by using the CIResNet network for feature extraction, but the tracker frame rate is slightly reduced because the CIResNet network is deeper than the AlexNet network.
Meaning of solving the technical problems: the tracking precision can be improved by using deeper network extraction features, and the overall performance of the tracker can be improved.
The dynamic updating part increases the robustness of the tracker, and the tracker does not learn the characteristic information of the first frame any more, but continuously learns the tracking result of the previous frame, so that the tracker adapts to the change of the target.
The CIResNet network can effectively extract more sample features, the tracker can learn more feature information of the target, and the capability of adapting to complex backgrounds is improved.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a dynamic updating visual tracking aerial photographing method and system based on a rotor flying robot.
The invention discloses a dynamic updating visual tracking aerial photographing method based on a rotor flying robot, which comprises the following steps of:
firstly, performing target detection on an input image by using an HOG feature extraction algorithm and a Support Vector Machine (SVM) algorithm;
and step two, transmitting target frame information obtained by target detection to a visual tracking part, and tracking the target in real time by adopting a dynamic update twin network based on a CIResNet network.
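As an overview, the two steps combine into a detect-then-track loop; the Python sketch below is an illustrative skeleton only, with the detector and tracker reduced to placeholders (their details are elaborated in the sections that follow) and a hypothetical video file name; it is not the patent's code.

```python
import cv2

def detect_with_hog_svm(frame):
    """Step-one placeholder: HOG + SVM detection. Here it simply returns a fixed
    box (x, y, w, h); see the HOG/SVM sketch further below for a real detector."""
    h, w = frame.shape[:2]
    return (w // 2 - 32, h // 2 - 64, 64, 128)

class DynamicSiameseTracker:
    """Step-two placeholder for the CIResNet-based dynamically updated twin-network
    tracker; update() just echoes the last box instead of really tracking."""
    def __init__(self, frame, box):
        self.box = box
    def update(self, frame):
        return self.box

def run_pipeline(video_path):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return
    box = detect_with_hog_svm(frame)                 # step one: detect the target
    tracker = DynamicSiameseTracker(frame, box)      # step two: initialise the tracker
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        box = tracker.update(frame)                  # real-time tracking of the target
        # box would be passed on as the input of the visual servo control
    cap.release()

# run_pipeline("aerial_sequence.mp4")   # hypothetical file name
```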
Further, in the first step, the target detection method includes:
(1) Dividing the image into a plurality of connected areas which are 8×8 pixel cell units;
(2) Collecting the gradient amplitude and gradient direction of each pixel point in a cell unit, dividing the gradient direction range [-90°, 90°] into 9 intervals (bins) on average, and using the gradient amplitude as a weight;
(3) Carrying out histogram statistics on the gradient amplitude of each pixel in the unit in each direction bin interval to obtain a one-dimensional gradient direction histogram;
(4) Performing contrast normalization on the histogram on the space block;
(5) Extracting HOG descriptors through a detection window, and combining HOG descriptors of all blocks in the detection window to form a final feature vector;
(6) Inputting the feature vector into a linear SVM, and performing target detection by using an SVM classifier;
(7) Dividing the detection window into overlapped blocks, calculating HOG descriptors for the blocks, and putting the formed feature vectors into a linear SVM to perform target/non-target classification;
(8) Scanning all positions and scales of the whole image by a detection window, and performing non-maximum suppression on an output pyramid to detect a target;
The method for performing contrast normalization on the histogram in step (4) is as follows:
the density of each histogram in the block is first calculated, and then each cell unit in the block is normalized according to this density.
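A minimal sketch of this detection flow is given below; it is illustrative only and relies on OpenCV's built-in HOGDescriptor (8×8 cells and 9 orientation bins by default), with the library's pre-trained people detector standing in for a task-specific linear SVM; the window stride, scale factor and NMS threshold are assumptions chosen for the example.

```python
import cv2
import numpy as np

# Illustrative HOG + linear-SVM detector; a task-specific SVM would normally be
# trained on samples of the target class instead of the default people detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_target(frame, score_thr=0.0, nms_thr=0.5):
    """Scan all positions/scales of the frame and return (x, y, w, h) boxes
    surviving non-maximum suppression over the detection pyramid."""
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                          padding=(8, 8), scale=1.05)
    if len(rects) == 0:
        return []
    boxes = [[int(x), int(y), int(w), int(h)] for (x, y, w, h) in rects]
    scores = [float(s) for s in np.asarray(weights).ravel()]
    keep = cv2.dnn.NMSBoxes(boxes, scores, score_thr, nms_thr)
    return [boxes[i] for i in np.asarray(keep).ravel()]

# Example usage:
# frame = cv2.imread("frame.jpg")
# print(detect_target(frame))
```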
Further, in the first step, the HOG feature extraction method specifically includes:
(1) normalizing the whole image, and normalizing the color space of the input image by adopting a Gamma correction method; the Gamma correction formula is as follows:
f(I) = I^γ

wherein I is the image pixel value and γ is the Gamma correction coefficient;
(2) calculating the gradients in the horizontal and vertical coordinate directions of the image, and calculating the gradient direction value of each pixel position from these gradients; the derivative operation captures contours and some texture information and further weakens the influence of illumination;

Gx(x,y) = H(x+1,y) - H(x-1,y);

Gy(x,y) = H(x,y+1) - H(x,y-1);

wherein Gx(x,y) and Gy(x,y) respectively represent the horizontal gradient and the vertical gradient at pixel point (x,y) in the input image;

G(x,y) = √(Gx(x,y)² + Gy(x,y)²);

α(x,y) = arctan(Gy(x,y)/Gx(x,y));

wherein G(x,y), H(x,y) and α(x,y) respectively represent the gradient amplitude, the pixel value and the gradient direction at pixel point (x,y);
(3) histogram calculation: dividing the image into small cell units to provide an encoding for the local image region;
(4) combining the cell units into a large block, normalizing the gradient histogram within the block;
(5) and collecting HOG characteristics of all overlapped blocks in the detection window, and combining the HOG characteristics into a final characteristic vector for classification.
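The extraction steps can be prototyped directly in NumPy; the sketch below is an illustration (not the patent's implementation) that follows f(I) = I^γ and the centred differences Gx, Gy above, with an assumed γ = 0.5, 8×8-pixel cells, 9 bins and 2×2-cell blocks.

```python
import numpy as np

def hog_cells(img, gamma=0.5, cell=8, bins=9):
    """Gamma-correct a grayscale image, compute gradients and per-cell orientation
    histograms weighted by gradient amplitude (steps (1)-(3))."""
    img = (img.astype(np.float64) / 255.0) ** gamma        # f(I) = I^gamma
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]                 # H(x+1,y) - H(x-1,y)
    gy[1:-1, :] = img[2:, :] - img[:-2, :]                 # H(x,y+1) - H(x,y-1)
    mag = np.hypot(gx, gy)                                 # gradient amplitude G(x,y)
    ang = np.rad2deg(np.arctan(gy / (gx + 1e-12)))         # direction alpha in [-90°, 90°]
    bin_idx = np.minimum(((ang + 90.0) / (180.0 / bins)).astype(int), bins - 1)
    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            hist[i, j] = np.bincount(b.ravel(), weights=m.ravel(), minlength=bins)
    return hist

def block_normalize(hist, block=2, eps=1e-5):
    """L2-normalize overlapping 2x2-cell blocks and concatenate them into the
    final feature vector (steps (4)-(5))."""
    ch, cw, _ = hist.shape
    feats = []
    for i in range(ch - block + 1):
        for j in range(cw - block + 1):
            v = hist[i:i+block, j:j+block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats)

# Example usage with a random 64x128 "image":
# feat = block_normalize(hog_cells(np.random.randint(0, 256, (128, 64))))
```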
Further, the step two of tracking the target in real time includes:
(1) Acquiring the first frame of the video sequence as the template frame O_1, acquiring the search region Z_t from the current frame, and obtaining f^l(O_1) and f^l(Z_t) respectively through the CIResNet-16 network;
(2) The network adds a transformation matrix V and a transformation matrix W, both of which can be computed rapidly in the frequency domain by FFT. The transformation matrix V is obtained from the tracking result of frame t-1 and the target of the first frame; it acts on the convolution features of the target template and learns the change of the target, so that the template convolution features at time t are approximately equal to those at time t-1 and the change of the current frame relative to the previous frames is smoothed;
the transformation matrix W is obtained from the tracking result of frame t-1 and acts on the convolution features of the candidate region at time t; it learns background suppression to eliminate the influence caused by irrelevant background features in the target region;
For the transformation matrix V and the transformation matrix W, training is performed with regularized linear regression; f^l(O_1) and f^l(Z_t) are transformed through the matrices into V^l_{t-1} ⊛ f^l(O_1) and W^l_{t-1} ⊛ f^l(Z_t) respectively, where "⊛" denotes the circular convolution operation, V^l_{t-1} ⊛ f^l(O_1) represents the change of the target appearance and gives the currently updated target template, and W^l_{t-1} ⊛ f^l(Z_t) represents the background suppression transformation and gives a search template better suited to the current frame; the final model is as follows:

S^l_t = corr( V^l_{t-1} ⊛ f^l(O_1), W^l_{t-1} ⊛ f^l(Z_t) )

On the basis of the twin network, the final model adds the smoothing matrix V and the background suppression matrix W: the smoothing matrix V learns the appearance change of the previous frames, and the background suppression matrix W eliminates clutter factors in the background.
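To make the structure of this model concrete, the following single-channel NumPy sketch evaluates S^l_t = corr(V ⊛ f^l(O_1), W ⊛ f^l(Z_t)) with FFT-based circular convolution; it is an illustration only, with stand-in feature maps of assumed sizes and identity transforms in place of the learned V and W, not the patent's implementation.

```python
import numpy as np

def circ_conv2(v, f):
    """2-D circular convolution v (*) f, computed via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(v) * np.fft.fft2(f)))

def corr(template, search):
    """Dense correlation of the transformed template over the transformed search
    features; plays the role of corr(., .) in the model above."""
    th, tw = template.shape
    out = np.zeros((search.shape[0] - th + 1, search.shape[1] - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw])
    return out

rng = np.random.default_rng(0)
f_O1 = rng.standard_normal((6, 6))        # template features f^l(O_1), assumed size
f_Zt = rng.standard_normal((22, 22))      # search-region features f^l(Z_t), assumed size
V = np.zeros((6, 6));  V[0, 0] = 1.0      # identity placeholders for V^l_{t-1}
W = np.zeros((22, 22)); W[0, 0] = 1.0     # identity placeholder for W^l_{t-1}

# S^l_t = corr( V^l_{t-1} (*) f^l(O_1),  W^l_{t-1} (*) f^l(Z_t) )
S = corr(circ_conv2(V, f_O1), circ_conv2(W, f_Zt))
print(S.shape)   # (17, 17) response map; its maximum indicates the target position
```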
Further, in the second step, the CIResNet-based dynamically updated twin network includes:
(I) a 7×7 convolution and a cropping operation, which delete the features affected by padding;
(II) after a max-pooling layer with stride 2, the improved CIResNet (CIR) unit is entered; the network of the CIR unit stage has 3 layers in total, where the first layer is a 1×1 convolution with 64 channels, the second layer is a 3×3 convolution with 64 channels, and the third layer is a 1×1 convolution with 256 channels; the feature maps are added after the convolution layers and then enter a crop operation, which removes the features affected by the padding of 1 in the 3×3 convolution;
(III) a CIR-D unit is then entered; the network of the CIR-D unit stage has 12 layers in total, with the first, second and third layers used as a unit block repeated 4 times; the first layer is a 1×1 convolution with 128 channels, the second layer is a 3×3 convolution with 128 channels, and the third layer is a 1×1 convolution with 512 channels;
and (IV) cross-correlation operation: the improved twin network structure takes an image pair as input, comprising an exemplar image Z and a candidate search image X; image Z represents the object of interest, while X represents the search region in a subsequent video frame and is typically larger; both inputs are processed by a ConvNet with parameters θ; two feature maps are generated, and their cross-correlation is:

f(Z, X) = φ_θ(Z) ⋆ φ_θ(X) + b

where b denotes a bias term; the formula searches the image X with Z as the template so that the maximum value in the response map f matches the target position; the network is trained offline on random image pairs (Z, X) obtained from the training videos together with the corresponding ground-truth labels y, and the parameters θ of the ConvNet are obtained by minimizing the following loss over the training set:

θ* = arg min_θ E_{(Z,X,y)} L(y, f(Z, X; θ))

The basic formula of the loss function is:

l(y, v) = log(1 + exp(-yv));

wherein y ∈ {+1, -1} represents the true label and v represents the actual score at a position of the sample search image; from the sigmoid function, the probability of a positive sample is 1/(1 + e^(-v)) and the probability of a negative sample is 1/(1 + e^(v)); the following is readily derived from the formula of cross entropy:

l(y, v) = -[ (1+y)/2 · log(1/(1+e^(-v))) + (1-y)/2 · log(1/(1+e^(v))) ] = log(1 + exp(-yv))
further, in the step (iii), the first block of the CIR-D unit stage is downsampled by the proposed CIR-D unit, and the number of filters is doubled after downsampling the feature map size; CIR-D changes the convolution steps in the bottleneck layer and the shortcut connection layer from 2 to 1, and inserts cutting again after the adding operation so as to delete the characteristics affected by filling; finally, performing spatial downsampling of the feature map with maximum pooling; the spatial size of the output feature map is 7 x 7, each feature receiving information from an area on the input image plane of size 77 x 77 pixels; adding the characteristic diagram after passing through the convolution layer, and then entering a loop operation and a maximum pooling layer; the key idea of these modifications is to ensure that only functions affected by padding are deleted, while the inherent block structure remains unchanged.
Further, in the second step, a CIResNet-based dynamically updated twin network is adopted to track the target in real time; the dynamic update algorithm comprises:
(1) Inputting a picture to obtain a template image O1;
(2) Determining a candidate frame searching region Zt in a frame to be tracked;
(3) Mapping the original images to a specific feature space through feature mapping, obtaining the two depth features f^l(O_1) and f^l(Z_t) respectively;
(4) Learning the change between the previous-frame tracking result and the first-frame template frame according to the RLR:

min_{V^l_{t-1}} || V^l_{t-1} ⊛ f^l_1 - f^l_{t-1} ||² + λ_v || V^l_{t-1} ||²

Fast computation in the frequency domain gives:

V̂^l_{t-1} = ( conj(f̂^l_1) ⊙ f̂^l_{t-1} ) / ( conj(f̂^l_1) ⊙ f̂^l_1 + λ_v )

thereby obtaining the variation V^l_{t-1}; in this context, f^l_1 = f^l(O_1) and f^l_{t-1} = f^l(O_{t-1}), wherein O represents the target, f represents the feature matrix, the superscript denotes the channel (layer) index and the subscript denotes the frame index, i.e. the features of the previous-frame tracking result and of the first-frame target are used; hats denote Fourier transforms, ⊙ denotes elementwise multiplication, and conj(·) denotes the complex conjugate;
(5) Obtaining the suppression quantity Ŵ^l_{t-1} of the current-frame background according to the RLR calculation formula in the frequency domain:

Ŵ^l_{t-1} = ( conj(f̂^l(Z_{t-1})) ⊙ f̂^l(Ḡ_{t-1}) ) / ( conj(f̂^l(Z_{t-1})) ⊙ f̂^l(Z_{t-1}) + λ_w )

wherein G_{t-1} is a map of the same size as the search region of the previous frame, and Ḡ_{t-1} is G_{t-1} multiplied by a Gaussian smoothing centred on the picture centre; the target variation V^l_{t-1} and the background suppression transformation W^l_{t-1} are thus learned online;
(6) Elementwise multi-layer feature fusion:

S_t = Σ_l w^l ⊙ S^l_t

where w^l is an elementwise weight map for the response of layer l;
(7) Joint training: forward propagation is first performed for a given N-frame video sequence {I_t | t=1,…,N}, which is tracked to obtain N response maps {S_t | t=1,…,N}, with {R_t | t=1,…,N} denoting the N target boxes; the per-frame loss is

L_t = (1/|D|) Σ_{u∈D} l( J_t[u], S_t[u] ),   l(y, v) = log(1 + exp(-yv))

where J_t ∈ {+1, -1} is the label map generated from the target box R_t over the response-map positions D;
(8) Gradient propagation and parameter updating are performed using BPTT and SGD to obtain all the gradients of L_t; starting from ∂L_t/∂S^l_t, the gradients with respect to the transformed template feature V^l_{t-1} ⊛ f^l(O_1) and the transformed search feature W^l_{t-1} ⊛ f^l(Z_t) are calculated, and the loss gradient is then propagated efficiently to f^l through the CirConv and RLR layers on the left; the derivations are carried out in the frequency domain, wherein f̂ = E f represents f after the Fourier transform and E is the discrete Fourier transform matrix; for the multi-feature fusion formula, the gradient is converted correspondingly into the per-layer form ∂L_t/∂S^l_t = w^l ⊙ ∂L_t/∂S_t.
The invention further aims to provide a dynamic updating visual tracking aerial photographing system based on the rotor flying robot, which implements the dynamic updating visual tracking aerial photographing method based on the rotor flying robot.
The invention further aims to provide an information data processing terminal for realizing the dynamic updating visual tracking aerial photographing method based on the rotor flying robot.
It is another object of the present invention to provide a computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the dynamically updated visual tracking aerial photographing method based on a rotor flying robot.
In summary, the advantages and positive effects of the invention are: (1) By adopting the deeper CIResNet network, a classification standard is established automatically through sample learning, adaptability to complex backgrounds is enhanced, and more sample features are extracted effectively.
(2) The invention adds a smoothing transformation matrix V to the traditional twin network, which can learn the target appearance change of the previous frames online and effectively use spatio-temporal information; a background suppression matrix W is also added, which effectively controls the influence of background clutter factors.
(3) Instead of taking a single first frame as the standard reference, appearance learning and background suppression are used for dynamic tracking, which can effectively deal with occlusion and similar problems.
(4) Both the accuracy and the overlap rate are increased, and the speed reaches 16 fps, which basically meets the real-time requirement.
Table 1: tracking the comparison of various indexes
Tracking device Accuracy of Overlap ratio Speed (fps)
Ours 0.5512 0.2905 16.
SiamFC 0.5355 0.2889 65
DSiam 0.5414 0.2804 25
DSST 0.5078 0.1678 134
The algorithm was implemented and debugged under the Ubuntu 16.04 operating system; the computer hardware was configured with an Intel Core i7-8700K CPU (3.7 GHz main frequency) and a GeForce RTX 2080 Ti graphics card.
According to the dynamically updated visual tracking aerial photographing method based on the rotor flying robot, the CIResNet network is used to replace the original AlexNet network; the network hierarchy is deeper in comparison, which is beneficial for acquiring the features of the target. Compared with the traditional twin network, the method adds the smoothing transformation matrix V to learn the target appearance changes of the previous frames online and effectively use spatio-temporal information, and adds the background suppression matrix W to effectively control the influence of background clutter factors. The proposed method does not take the first frame alone as the standard reference; instead, it selects a deeper network to acquire target features and uses appearance learning and background suppression for dynamic tracking, which effectively increases robustness.
Drawings
Fig. 1 is a flowchart of a dynamic update visual tracking aerial method based on a rotorcraft robot according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a dynamic update visual tracking aerial method based on a rotorcraft robot according to an embodiment of the present invention.
Fig. 3 is a frame diagram of a detection section provided in an embodiment of the present invention.
Fig. 4 is a frame diagram of a tracking section provided by an embodiment of the present invention.
Fig. 5 is a schematic diagram of a basic description of a ciranet network according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a single-layer network structure according to an embodiment of the present invention.
Fig. 7 is a graph of results on a UAV dataset provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The tracking flight of the rotor flying robot refers to that a camera is carried on the rotor flying robot flying in low altitude, an image frame sequence of a ground moving target is obtained in real time, the image coordinates of the target are calculated and used as the input of visual servo control, the speed required by the aircraft is obtained, and then the position and the gesture of the rotor flying robot are automatically controlled, so that the tracked ground moving target is maintained near the center of the visual field of the camera. The traditional twin network tracking method has good real-time performance, but when the influence of complex background or illumination is added after the target is lost due to target shielding, the situation that the target cannot be tracked correctly still occurs by taking the first frame as a standard reference.
Aiming at the problems in the prior art, the invention provides a dynamically updated visual tracking aerial photographing method based on a rotor flying robot, which uses a CIResNet network to replace the original AlexNet network; compared with AlexNet, the CIResNet network has a deeper hierarchy, which is beneficial for acquiring the features of the target. Compared with the traditional twin network, the method adds the smoothing transformation matrix V to learn the target appearance changes of the previous frames online and effectively use spatio-temporal information, and adds the background suppression matrix W to effectively control the influence of background clutter factors. The proposed method does not take the first frame alone as the standard reference; instead, it selects a deeper network to acquire target features and uses appearance learning and background suppression for dynamic tracking, which effectively increases robustness. The present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1, the method for dynamically updating vision tracking aerial photography based on the rotor flying robot provided by the embodiment of the invention comprises the following steps:
s101: and utilizing HOG (Histogram of Oriented Gradient) characteristics and a Support Vector Machine (SVM) algorithm to perform target detection on the input image.
Even if the gradient and edge position information corresponding to the object in the image is unknown, its appearance and shape can still be described by the distribution of local gradients or edge directions. The HOG feature builds its feature description by calculating and counting gradient direction histograms of the target area, and in principle it maintains good invariance to geometric changes and optical deformation of the image.
Firstly, the image is divided into a number of connected regions, i.e. cells of 8×8 pixels, called cell units; the gradient amplitude and direction of each pixel point in a cell unit are then collected, the gradient direction range [-90°, 90°] is divided into 9 intervals (bins) on average, and histogram statistics of the gradient amplitude of each pixel in the cell are computed for each direction bin to obtain a one-dimensional gradient direction histogram. In order to improve the invariance of the features to illumination and shadow, the histograms need to be contrast-normalized, typically over a larger range. The density of each histogram in the block is first calculated, and then each cell unit in the block is normalized according to this density; the normalized block descriptor is called the HOG descriptor.
Combining HOG descriptors of all blocks in the detection window to form a final feature vector, and then using an SVM classifier to perform target detection. FIG. 3 depicts a feature extraction and object detection flow, where the detection window is divided into overlapping blocks, HOG descriptors are computed for these blocks, and the resulting feature vectors are placed in a linear SVM for object/non-object classification. The detection window scans all positions and scales of the whole image, and performs non-maximum suppression on the output pyramid to detect the target.
S102: and transmitting the target frame information obtained by target detection to a visual tracking part, and tracking the target in real time by adopting a CIResNet-based dynamic update twin network, wherein a tracking framework is shown in fig. 4.
The first frame is acquired from the video sequence as the template frame O_1, the search region Z_t is acquired from the current frame, and f^l(O_1) and f^l(Z_t) are obtained respectively through the CIResNet-16 network.
The final result of a conventional twin network is represented as follows:

S^l_t = corr( f^l(O_1), f^l(Z_t) )    (1)

The result of this formula is a similarity, where corr denotes correlation filtering and can be replaced by other metric functions, t represents time, and l represents the l-th layer.
Unlike conventional Siamese networks, the proposed network adds two transformation matrices. The first transformation matrix V acts on the convolution features of the target template so that the template convolution features at time t are approximately equal to those at time t-1; this matrix is learned from frame t-1 and can be regarded as a smooth deformation of the target. The second transformation matrix W acts on the convolution features of the candidate region at time t in order to emphasize the target region and eliminate irrelevant background features.
For the transformation matrices V and W, the invention trains with regularized linear regression; f^l(O_1) and f^l(Z_t) are transformed through the matrices into V^l_{t-1} ⊛ f^l(O_1) and W^l_{t-1} ⊛ f^l(Z_t) respectively, where "⊛" denotes the circular convolution operation, V^l_{t-1} ⊛ f^l(O_1) represents the change of the target appearance, and W^l_{t-1} ⊛ f^l(Z_t) represents the background suppression transformation. The final model is as follows:

S^l_t = corr( V^l_{t-1} ⊛ f^l(O_1), W^l_{t-1} ⊛ f^l(Z_t) )    (2)
the model is added with two transformation matrixes of smoothing and background suppression on the basis of a twin network, and the smoothing matrix learns the appearance change of the previous frame and can effectively utilize space-time information; the background inhibition matrix eliminates clutter influencing factors in the background, and robustness is enhanced. Meanwhile, the AlexNet network in the traditional twin network is replaced by the CIResNet-16 network, so that the precision is higher.
Fig. 2 is a schematic diagram of a dynamic update visual tracking aerial method based on a rotorcraft robot according to an embodiment of the present invention.
The detailed description of HOG feature extraction in step S101 is:
1) To reduce the influence of illumination factors, the whole image first needs to be normalized. Because local surface exposure contributes a large proportion of the texture intensity of the image, this compression processing can effectively reduce local shadows and illumination variation in the image. Typically the image is converted to grayscale, and the color space of the input image is then normalized using Gamma correction. Gamma correction can be understood as improving the contrast of dark or bright parts of the image, and it can effectively reduce local shadows and illumination variation; the Gamma correction formula is as follows:

f(I) = I^γ    (3)

wherein I is the image pixel value and γ is the Gamma correction coefficient.
2) The gradients in the horizontal and vertical coordinate directions of the image are calculated, and the gradient direction value of each pixel position is calculated from them; the derivative operation captures contours and some texture information and further weakens the influence of illumination;

Gx(x,y) = H(x+1,y) - H(x-1,y)    (4)

Gy(x,y) = H(x,y+1) - H(x,y-1)    (5)

In the above expressions, Gx(x,y) and Gy(x,y) respectively represent the horizontal gradient and the vertical gradient at pixel point (x,y) in the input image.

G(x,y) = √(Gx(x,y)² + Gy(x,y)²)    (6)

α(x,y) = arctan(Gy(x,y)/Gx(x,y))    (7)

G(x,y), H(x,y) and α(x,y) respectively represent the gradient amplitude, the pixel value and the gradient direction of the pixel point at (x,y).
3) Histogram calculation: the image is divided into small cell units (which may be rectangular or circular) in order to provide an encoding for the local image region.
4) The cell units are combined into large blocks (blocks) with the gradient histograms normalized inside the blocks.
5) All overlapping blocks in the detection window are collected for HOG features and combined into a final feature vector for classification.
The detailed description of the modified network CIResNet-16 in step S102 is:
CIResNet-16 is divided into three phases (stride of 8) consisting of 18 weighted convolution layers.
(1) A 7×7 convolution and a cropping operation (crop size 2) remove the features affected by padding.
(2) After the max-pooling layer with stride 2, the improved CIResNet (CIR) unit is entered; as shown in (a) of fig. 5, the network of the CIR unit at this stage has 3 layers: the first layer is a 1×1 convolution with 64 channels, the second layer is a 3×3 convolution with 64 channels, and the third layer is a 1×1 convolution with 256 channels. As depicted in fig. 5, the feature maps after the convolution layers are added and then enter a crop operation, which removes the features affected by the padding of 1 in the 3×3 convolution.
(3) The network then enters the CIR-D (Downsampling CIR) unit, shown in (b) of fig. 5; this stage has 12 layers in total, with the first, second and third layers used as a unit block repeated 4 times. The first layer is a 1×1 convolution with 128 channels; the second layer is a 3×3 convolution with 128 channels; the third layer is a 1×1 convolution with 512 channels.
The first block of this stage (4 blocks in total) performs downsampling through the proposed CIR-D unit, and the number of filters is doubled after the feature-map size is downsampled, to improve feature discriminability. CIR-D changes the stride of the convolutions in the bottleneck layer and the shortcut connection from 2 to 1, and inserts cropping again after the addition operation to delete the features affected by padding. Finally, max-pooling is employed to perform spatial downsampling of the feature map. The spatial size of the output feature map is 7×7, and each feature receives information from a region of 77×77 pixels on the input image plane. As shown in fig. 5, the feature maps after the convolution layers are added and then enter the crop operation and the max-pooling layer. The key idea of these modifications is to ensure that only features affected by padding are deleted, while the inherent block structure remains unchanged.
(4) Cross-correlation operation:
the improved twin network structure takes an image pair as input, and comprises an example image Z and a candidate search image X. Image Z represents an object of interest (e.g., an image block centered on the target object in a first video frame), while X represents a search area in a subsequent video frame, typically larger. Both inputs are processed by ConvNet with parameter θ. This will produce two feature maps that are cross-correlated:
Figure BDA0002300823490000141
where b represents the deviation term, the whole formula corresponds to an exhaustive search of the image X in Z mode, with the aim of matching the maximum value in the response map f with the target position. To achieve this goal, the network is trained offline by means of a pair of random images (Z, X) obtained from training videos and corresponding ground tags y, the parameter θ in ConvNet being obtained by minimizing the following loss parameters in the training set:
Figure BDA0002300823490000142
the basic formula of the loss function is:
l(y,v)=log(1+exp(-yv)) (10)
where y ∈ {+1, -1} represents the true label and v represents the actual score at a position of the sample search image. From the sigmoid function, the probability of a positive sample is 1/(1 + e^(-v)) and the probability of a negative sample is 1/(1 + e^(v)); the following is readily derived from the formula of cross entropy:

l(y, v) = -[ (1+y)/2 · log(1/(1+e^(-v))) + (1-y)/2 · log(1/(1+e^(v))) ] = log(1 + exp(-yv))    (11)
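A compact PyTorch sketch of formulas (8) and (10) is given below as an illustration; the embedding network φ is replaced by random stand-in feature maps, the label map is generated arbitrarily, and the balancing of positive and negative positions is omitted, so it is not the patent's implementation.

```python
import torch
import torch.nn.functional as F

def siamese_response(phi_z, phi_x, b=0.0):
    """f(Z, X) = phi(Z) cross-correlated with phi(X) + b: the exemplar embedding
    is used as the convolution kernel over the search embedding (formula (8))."""
    return F.conv2d(phi_x, phi_z) + b          # (1, 1, Hx-Hz+1, Wx-Wz+1) score map

def logistic_loss(v, y):
    """Mean of l(y, v) = log(1 + exp(-y v)) over all response-map positions
    (formula (10)); softplus(-y v) equals log(1 + exp(-y v))."""
    return F.softplus(-y * v).mean()

# Stand-in embeddings (a real tracker would obtain them from CIResNet-16):
phi_z = torch.randn(1, 256, 6, 6)      # exemplar features phi(Z)
phi_x = torch.randn(1, 256, 22, 22)    # search features phi(X)
v = siamese_response(phi_z, phi_x)     # (1, 1, 17, 17)
y = torch.where(torch.rand_like(v) > 0.9, torch.tensor(1.0), torch.tensor(-1.0))
print(v.shape, logistic_loss(v, y).item())
```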
the step of the dynamic update algorithm in step S102 is:
(1) Inputting a picture to obtain a template image O1;
(2) Determining a candidate frame searching region Zt in a frame to be tracked;
(3) Mapping the original images to a specific feature space through feature mapping, obtaining the two depth features f^l(O_1) and f^l(Z_t) respectively;
(4) Learning the change between the previous-frame tracking result and the first-frame template frame according to Regularized Linear Regression (RLR):

min_{V^l_{t-1}} || V^l_{t-1} ⊛ f^l_1 - f^l_{t-1} ||² + λ_v || V^l_{t-1} ||²

Fast computation in the frequency domain gives:

V̂^l_{t-1} = ( conj(f̂^l_1) ⊙ f̂^l_{t-1} ) / ( conj(f̂^l_1) ⊙ f̂^l_1 + λ_v )

thereby obtaining the variation V^l_{t-1}. In this context, f^l_1 = f^l(O_1) and f^l_{t-1} = f^l(O_{t-1}), wherein O represents the object, f represents the feature matrix, the superscript denotes the channel (layer) index and the subscript denotes the frame index, that is, the features of the previous-frame tracking result and of the first-frame target are used; hats denote Fourier transforms, ⊙ denotes elementwise multiplication, and conj(·) denotes the complex conjugate (these frequency-domain computations, together with steps (5) and (6), are illustrated in the sketch following this algorithm).
(5) The suppression quantity Ŵ^l_{t-1} of the current-frame background is obtained according to the RLR calculation formula in the frequency domain:

Ŵ^l_{t-1} = ( conj(f̂^l(Z_{t-1})) ⊙ f̂^l(Ḡ_{t-1}) ) / ( conj(f̂^l(Z_{t-1})) ⊙ f̂^l(Z_{t-1}) + λ_w )

wherein G_{t-1} is a map of the same size as the search region of the previous frame, and Ḡ_{t-1} is G_{t-1} multiplied by a Gaussian smoothing at the centre of the picture, the purpose of which is to emphasize the centre and suppress the edges. Through online learning of the target variation V^l_{t-1} and the background suppression transformation W^l_{t-1}, the improved model gives the static twin network online adaptive capability, which improves tracking precision and real-time speed.
(6) Elementwise multi-layer feature fusion:

S_t = Σ_l w^l ⊙ S^l_t

where w^l is an elementwise weight map for the response of layer l.
the center weight of the shallow layer features is high, the peripheral weight of the deep layer features is high, the center is low, if the target is in the center of the search area, the shallow layer features can better position the target, and if the target is in the periphery of the search area, the deep layer features can also effectively determine the position of the target.
That is, when the target is close to the center of the search area, the deeper layer features help to eliminate background interference, and the shallower layer features help to obtain accurate positioning of the target; if the target is located at the periphery of the search area, only deeper layer features can effectively determine the target location.
(7) Joint training is performed: forward propagation is first carried out for a given N-frame video sequence {I_t | t=1,…,N}, which is tracked to obtain N response maps {S_t | t=1,…,N}, with {R_t | t=1,…,N} denoting the N target boxes; the per-frame loss is

L_t = (1/|D|) Σ_{u∈D} l( J_t[u], S_t[u] ),   l(y, v) = log(1 + exp(-yv))

where J_t ∈ {+1, -1} is the label map generated from the target box R_t over the response-map positions D.
(8) A schematic diagram of the single-layer network structure is shown in fig. 6, where "Eltwise" (elementwise multi-layer fusion) trains a weight matrix whose values represent the weights of different locations of the different feature maps. Gradient propagation and parameter updates are performed using BPTT (backpropagation through time) and SGD (stochastic gradient descent). In order to use BPTT- and SGD-trained networks effectively, all the gradients of L_t must be obtained: as shown in fig. 6, starting from ∂L_t/∂S^l_t, the gradients with respect to the transformed template feature V^l_{t-1} ⊛ f^l(O_1) and the transformed search feature W^l_{t-1} ⊛ f^l(Z_t) are calculated, and the loss gradient is then propagated efficiently to f^l through the "CirConv" and "RLR" layers on the left. The derivations are carried out in the frequency domain, wherein f̂ = E f represents f after the Fourier transform and E is the discrete Fourier transform matrix; the elementwise multi-layer fusion can also be calculated using the procedure described above, the multi-feature fusion formula being converted into the per-layer gradients ∂L_t/∂S^l_t = w^l ⊙ ∂L_t/∂S_t.
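The frequency-domain computations of steps (4) and (5) and the elementwise fusion of step (6) can be prototyped compactly; the NumPy sketch below is a single-channel illustration with assumed sizes, λ values and weight maps, not the patent's exact formulation.

```python
import numpy as np

def rlr_transform_fft(f_src, f_dst, lam=1e-2):
    """Closed-form regularized linear regression (RLR) in the Fourier domain:
    returns the FFT of the transform T minimising ||T (*) f_src - f_dst||^2 + lam ||T||^2,
    where (*) denotes circular convolution."""
    F_src, F_dst = np.fft.fft2(f_src), np.fft.fft2(f_dst)
    return (np.conj(F_src) * F_dst) / (np.conj(F_src) * F_src + lam)

rng = np.random.default_rng(0)
f_O1   = rng.standard_normal((6, 6))       # template features f(O_1)
f_Otm1 = rng.standard_normal((6, 6))       # previous-result features f(O_{t-1})
f_Ztm1 = rng.standard_normal((22, 22))     # previous search-region features f(Z_{t-1})

# Step (4): appearance-variation transform V_{t-1} (template -> previous result).
V_hat = rlr_transform_fft(f_O1, f_Otm1)

# Step (5): background-suppression transform W_{t-1}; the regression target is the
# previous search feature weighted by a centre-peaked Gaussian map (emphasize centre).
yy, xx = np.mgrid[0:22, 0:22]
gauss = np.exp(-(((yy - 10.5) ** 2 + (xx - 10.5) ** 2) / (2 * 5.0 ** 2)))
W_hat = rlr_transform_fft(f_Ztm1, f_Ztm1 * gauss)

# Applying a transform is a circular convolution, i.e. an inverse FFT of a product:
updated_template  = np.real(np.fft.ifft2(V_hat * np.fft.fft2(f_O1)))    # approx. f(O_{t-1})
suppressed_search = np.real(np.fft.ifft2(W_hat * np.fft.fft2(f_Ztm1)))  # centre kept, edges damped

# Step (6): elementwise multi-layer fusion of two per-layer response maps.
S1, S2 = rng.standard_normal((17, 17)), rng.standard_normal((17, 17))
yy, xx = np.mgrid[0:17, 0:17]
w_shallow = np.exp(-((yy - 8) ** 2 + (xx - 8) ** 2) / (2 * 4.0 ** 2))   # high weight at the centre
w_deep = 1.0 - w_shallow                                                # high weight at the periphery
S_fused = w_shallow * S1 + w_deep * S2
print(updated_template.shape, S_fused.shape)
```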
The model has reliable online adaptability; it effectively learns foreground and background changes and suppresses background interference without damaging the real-time response capability, and it achieves an excellent balance of tracking performance in the experiments. In addition, the model is trained jointly and directly on labelled video sequences as a whole, rather than on image pairs, so that the rich spatio-temporal information of the moving target can be captured better. Meanwhile, with joint training all parameters can be learned offline through back-propagation, which facilitates training on data. The specific effect is shown in fig. 7.
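As a rough illustration of such joint training on a video sequence (rather than on independent image pairs), the PyTorch sketch below unrolls a toy tracker over N frames, sums the per-frame losses and lets autograd perform the backpropagation-through-time step before an SGD update; the embedding network, label maps and data are placeholders, and the recurrent V/W updates that make the unrolling genuinely time-dependent are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy embedding standing in for CIResNet-16 (assumption for the example).
phi = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, padding=1))
optimizer = torch.optim.SGD(phi.parameters(), lr=1e-3, momentum=0.9)

def response(template_img, search_img):
    z, x = phi(template_img), phi(search_img)
    return F.conv2d(x, z)                       # cross-correlation score map

def train_on_sequence(frames, template, label_maps):
    """One training step over an N-frame sequence: accumulate per-frame losses,
    then backpropagate through the whole unrolled sequence (BPTT) and apply SGD."""
    optimizer.zero_grad()
    total_loss = 0.0
    for search, labels in zip(frames, label_maps):
        v = response(template, search)
        total_loss = total_loss + F.softplus(-labels * v).mean()  # log(1+exp(-yv))
    total_loss.backward()                       # gradients through all N frames
    optimizer.step()
    return float(total_loss)

# Placeholder data: N = 4 search frames, one template, ±1 label maps.
N = 4
template = torch.randn(1, 3, 32, 32)
frames = [torch.randn(1, 3, 64, 64) for _ in range(N)]
label_maps = [torch.where(torch.rand(1, 1, 33, 33) > 0.9,
                          torch.tensor(1.0), torch.tensor(-1.0)) for _ in range(N)]
print(train_on_sequence(frames, template, label_maps))
```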
In the above embodiments, the invention may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used in whole or in part, the invention is implemented in the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center containing an integration of one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), and the like.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (8)

1. The dynamic updating vision tracking aerial photographing method based on the rotor flying robot is characterized by comprising the following steps of:
firstly, performing target detection on an input image by using an HOG feature extraction algorithm and a Support Vector Machine (SVM) algorithm;
step two, transmitting target frame information obtained by target detection to a visual tracking part, and tracking the target in real time by adopting a dynamic update twin network based on a CIResNet network;
the step two of real-time tracking the target comprises the following steps:
(1) Acquiring the first frame of the video sequence as the template frame O_1, acquiring the search region Z_t from the current frame, and obtaining f^l(O_1) and f^l(Z_t) respectively through the CIResNet-16 network;
(2) The network adds a transformation matrix V and a transformation matrix W, both of which are computed rapidly in the frequency domain by FFT; the transformation matrix V is obtained from the tracking result of frame t-1 and the first-frame target, acts on the convolution features of the target template, and learns the change of the target so that the template convolution features at time t are approximately equal to those at time t-1, smoothing the change of the current frame relative to the previous frames; the transformation matrix W is obtained from the tracking result of frame t-1, acts on the convolution features of the candidate region at time t, and learns background suppression to eliminate the influence caused by irrelevant background features in the target region;
training is performed with regularized linear regression for the transformation matrix V and the transformation matrix W; f^l(O_1) and f^l(Z_t) are transformed through the matrices into V^l_{t-1} ⊛ f^l(O_1) and W^l_{t-1} ⊛ f^l(Z_t) respectively, where "⊛" denotes the circular convolution operation, V^l_{t-1} ⊛ f^l(O_1) represents the change of the target appearance and gives the currently updated target template, and W^l_{t-1} ⊛ f^l(Z_t) represents the background suppression transformation and gives a search template better suited to the current frame; the final model is as follows:

S^l_t = corr( V^l_{t-1} ⊛ f^l(O_1), W^l_{t-1} ⊛ f^l(Z_t) )

on the basis of the twin network, the final model adds the smoothing matrix V and the background suppression matrix W, wherein the smoothing matrix V learns the appearance change of the previous frames and the background suppression matrix W eliminates clutter factors in the background;
in the process of tracking the target in real time by adopting a CIResNet-based dynamic update twin network, a dynamic update algorithm comprises:
(1) Inputting a picture to obtain a template image O1;
(2) Determining a candidate frame searching region Zt in a frame to be tracked;
(3) Mapping the original images to a specific feature space through feature mapping, obtaining the two depth features f^l(O_1) and f^l(Z_t) respectively;
(4) Learning the change between the previous-frame tracking result and the first-frame template frame according to the RLR:

min_{V^l_{t-1}} || V^l_{t-1} ⊛ f^l_1 - f^l_{t-1} ||² + λ_v || V^l_{t-1} ||²

Fast computation in the frequency domain gives:

V̂^l_{t-1} = ( conj(f̂^l_1) ⊙ f̂^l_{t-1} ) / ( conj(f̂^l_1) ⊙ f̂^l_1 + λ_v )

thereby obtaining the variation V^l_{t-1}, wherein f^l_1 = f^l(O_1) and f^l_{t-1} = f^l(O_{t-1}), O represents the target, f represents the feature matrix, the superscript denotes the channel (layer) index and the subscript denotes the frame index, namely the features of the previous-frame tracking result and of the first-frame target are obtained; hats denote Fourier transforms, ⊙ denotes elementwise multiplication, and conj(·) denotes the complex conjugate;
(5) Obtaining the suppression quantity Ŵ^l_{t-1} of the current-frame background according to the RLR calculation formula in the frequency domain:

Ŵ^l_{t-1} = ( conj(f̂^l(Z_{t-1})) ⊙ f̂^l(Ḡ_{t-1}) ) / ( conj(f̂^l(Z_{t-1})) ⊙ f̂^l(Z_{t-1}) + λ_w )

wherein G_{t-1} is a map of the same size as the search region of the previous frame, and Ḡ_{t-1} is G_{t-1} multiplied by a Gaussian smoothing centred on the picture centre; the target variation V^l_{t-1} and the background suppression transformation W^l_{t-1} are thus learned online;
(6) Elementwise multi-layer feature fusion:

S_t = Σ_l w^l ⊙ S^l_t

where w^l is an elementwise weight map for the response of layer l;
(7) Joint training is performed: forward propagation is first carried out for a given N-frame video sequence {I_t | t=1,…,N}, which is tracked to obtain N response maps {S_t | t=1,…,N}, with {R_t | t=1,…,N} denoting the N target boxes; the per-frame loss is

L_t = (1/|D|) Σ_{u∈D} l( J_t[u], S_t[u] ),   l(y, v) = log(1 + exp(-yv))

where J_t ∈ {+1, -1} is the label map generated from the target box R_t over the response-map positions D;
(8) Gradient propagation and parameter updating are performed using BPTT and SGD to obtain all the gradients of L_t; starting from ∂L_t/∂S^l_t, the gradients with respect to the transformed template feature V^l_{t-1} ⊛ f^l(O_1) and the transformed search feature W^l_{t-1} ⊛ f^l(Z_t) are calculated, and the loss gradient is propagated efficiently to f^l through the CirConv and RLR layers on the left; the derivations are carried out in the frequency domain, wherein f̂ = E f represents f after the Fourier transform and E is the discrete Fourier transform matrix; for the multi-feature fusion formula, the gradient is converted correspondingly into the per-layer form ∂L_t/∂S^l_t = w^l ⊙ ∂L_t/∂S_t.
2. The method for dynamically updating visual tracking and aerial photographing based on a rotorcraft robot according to claim 1, wherein in the first step, the target detection method comprises the following steps:
(1) Dividing the image into a plurality of connected areas which are 8×8 pixel cell units;
(2) Collecting the gradient amplitude and gradient direction of each pixel point in a cell unit, dividing the gradient direction range [-90°, 90°] into 9 bin intervals on average, and using the gradient amplitude as a weight;
(3) Carrying out histogram statistics on the gradient amplitude of each pixel in the unit in each direction bin interval to obtain a one-dimensional gradient direction histogram;
(4) Performing contrast normalization on the histogram on the space block;
(5) Extracting HOG descriptors through a detection window, and combining HOG descriptors of all blocks in the detection window to form a final feature vector;
(6) Inputting the feature vector into a linear SVM, and performing target detection by using an SVM classifier;
(7) Dividing the detection window into overlapped blocks, calculating HOG descriptors for the blocks, and putting the formed feature vectors into a linear SVM to perform target/non-target classification;
(8) Scanning all positions and scales of the whole image by a detection window, and performing non-maximum suppression on an output pyramid to detect a target;
the method for performing contrast normalization on the histogram in step (4) is as follows:
the density of each histogram in the block is first calculated, and then each cell unit in the block is normalized according to this density.
3. The method for dynamically updating visual tracking aerial photographs based on a rotorcraft robot of claim 1, wherein in step one, the HOG feature extraction algorithm specifically comprises:
(1) normalizing the whole image, and normalizing the color space of the input image by adopting a Gamma correction method; the Gamma correction formula is as follows:
f(I) = I^γ

wherein I is the image pixel value and γ is the Gamma correction coefficient;

(2) calculating the gradients in the horizontal and vertical coordinate directions of the image, and calculating the gradient direction value of each pixel position from these gradients; the derivative operation captures contours and some texture information and further weakens the influence of illumination;

Gx(x,y) = H(x+1,y) - H(x-1,y);

Gy(x,y) = H(x,y+1) - H(x,y-1);

wherein Gx(x,y) and Gy(x,y) respectively represent the horizontal gradient and the vertical gradient at pixel point (x,y) in the input image;

G(x,y) = √(Gx(x,y)² + Gy(x,y)²);

α(x,y) = arctan(Gy(x,y)/Gx(x,y));

wherein G(x,y), H(x,y) and α(x,y) respectively represent the gradient amplitude, the pixel value and the gradient direction at pixel point (x,y);
(3) histogram calculation: dividing the image into small cell units to provide an encoding for the local image region;
(4) combining the cell units into a large block, normalizing the gradient histogram within the block;
(5) and collecting HOG characteristics of all overlapped blocks in the detection window, and combining the HOG characteristics into a final characteristic vector for classification.
4. The method for dynamically updating visual tracking aerial photograph based on a rotorcraft robot of claim 1, wherein in step two, the dynamically updating twin network based on ciranet comprises:
(I) a 7×7 convolution and a cropping operation, which delete the features affected by padding;
(II) after a max-pooling layer with stride 2, the improved CIResNet (CIR) unit is entered; the network of the CIR unit stage has 3 layers in total, where the first layer is a 1×1 convolution with 64 channels, the second layer is a 3×3 convolution with 64 channels, and the third layer is a 1×1 convolution with 256 channels; the feature maps are added after the convolution layers and then enter a crop operation, which removes the features affected by the padding of 1 in the 3×3 convolution;
(III) a CIR-D unit is then entered; the network of the CIR-D unit stage has 12 layers in total, with the first, second and third layers used as a unit block repeated 4 times; the first layer is a 1×1 convolution with 128 channels, the second layer is a 3×3 convolution with 128 channels, and the third layer is a 1×1 convolution with 512 channels;
and (IV) cross-correlation operation: the improved twin network structure takes an image pair as input, comprising an exemplar image Z and a candidate search image X; image Z represents the object of interest, while X represents the search region in a subsequent video frame and is typically larger; both inputs are processed by a ConvNet with parameters θ; two feature maps are generated, and their cross-correlation is:

f(Z, X) = φ_θ(Z) ⋆ φ_θ(X) + b

where b denotes a bias term; the formula searches the image X with Z as the template so that the maximum value in the response map f matches the target position; the network is trained offline on random image pairs (Z, X) obtained from the training videos together with the corresponding ground-truth labels y, and the parameters θ of the ConvNet are obtained by minimizing the following loss over the training set:

θ* = arg min_θ E_{(Z,X,y)} L(y, f(Z, X; θ))
the basic formula of the loss function is:
l(y,v)=log(1+exp(-yv));
wherein y ∈ {+1, -1} represents the true label and v represents the actual score at a position of the sample search image; from the sigmoid function, the probability of a positive sample is 1/(1 + e^(-v)) and the probability of a negative sample is 1/(1 + e^(v)); the following is readily derived from the formula of cross entropy:

l(y, v) = -[ (1+y)/2 · log(1/(1+e^(-v))) + (1-y)/2 · log(1/(1+e^(v))) ] = log(1 + exp(-yv))
5. The method for dynamically updating visual tracking aerial photographing based on a rotor flying robot according to claim 4, wherein in step (III), the first block of the CIR-D unit stage performs downsampling through the proposed CIR-D unit, and the number of filters is doubled after the feature-map size is downsampled; CIR-D changes the convolution stride in the bottleneck layer and the shortcut connection from 2 to 1, and inserts cropping again after the addition operation so as to delete the features affected by padding; finally, spatial downsampling of the feature map is performed with max-pooling; the spatial size of the output feature map is 7×7, and each feature receives information from a region of 77×77 pixels on the input image plane; the feature maps are added after the convolution layers and then enter the crop operation and the max-pooling layer; the key idea of these modifications is to ensure that only features affected by padding are deleted, while the inherent block structure remains unchanged.
6. A dynamic updating visual tracking aerial photographing system based on a rotor flying robot, which implements the dynamic updating visual tracking aerial photographing method based on a rotor flying robot of claim 1.
7. An information data processing terminal for implementing the dynamic updating visual tracking aerial photographing method based on a rotor flying robot according to any one of claims 1 to 5.
8. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the dynamic updating visual tracking aerial photographing method based on a rotor flying robot according to any one of claims 1 to 5.
CN201911220924.1A 2019-12-03 2019-12-03 Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot Active CN110992378B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911220924.1A CN110992378B (en) 2019-12-03 2019-12-03 Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911220924.1A CN110992378B (en) 2019-12-03 2019-12-03 Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot

Publications (2)

Publication Number Publication Date
CN110992378A (en) 2020-04-10
CN110992378B (en) 2023-05-16

Family

ID=70089566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911220924.1A Active CN110992378B (en) 2019-12-03 2019-12-03 Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot

Country Status (1)

Country Link
CN (1) CN110992378B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610888B (en) * 2021-06-29 2023-11-24 南京信息工程大学 Twin network target tracking method based on Gaussian smoothing
CN114863267B (en) * 2022-03-30 2023-05-23 南京邮电大学 Precise statistical method for number of aerial trees based on multi-track intelligent prediction
CN115984333B (en) * 2023-02-14 2024-01-19 北京拙河科技有限公司 Smooth tracking method and device for airplane target
CN116088580B (en) * 2023-02-15 2023-11-07 北京拙河科技有限公司 Flying object tracking method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490906A (en) * 2019-08-20 2019-11-22 南京邮电大学 A kind of real-time vision method for tracking target based on twin convolutional network and shot and long term memory network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3076377B1 (en) * 2017-12-29 2021-09-24 Bull Sas PREDICTION OF DISPLACEMENT AND TOPOLOGY FOR A NETWORK OF CAMERAS.
CN108898620B (en) * 2018-06-14 2021-06-18 厦门大学 Target tracking method based on multiple twin neural networks and regional neural network
CN109272530B (en) * 2018-08-08 2020-07-21 北京航空航天大学 Target tracking method and device for space-based monitoring scene
CN109993774B (en) * 2019-03-29 2020-12-11 大连理工大学 Online video target tracking method based on depth cross similarity matching
CN110070562A (en) * 2019-04-02 2019-07-30 西北工业大学 A kind of context-sensitive depth targets tracking
CN110443827B (en) * 2019-07-22 2022-12-20 浙江大学 Unmanned aerial vehicle video single-target long-term tracking method based on improved twin network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490906A (en) * 2019-08-20 2019-11-22 南京邮电大学 A kind of real-time vision method for tracking target based on twin convolutional network and shot and long term memory network

Also Published As

Publication number Publication date
CN110992378A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
CN110992378B (en) Dynamic updating vision tracking aerial photographing method and system based on rotor flying robot
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN110222787B (en) Multi-scale target detection method and device, computer equipment and storage medium
CN106960446B (en) Unmanned ship application-oriented water surface target detection and tracking integrated method
CN111460968B (en) Unmanned aerial vehicle identification and tracking method and device based on video
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
CN109685045B (en) Moving target video tracking method and system
WO2019007253A1 (en) Image recognition method, apparatus and device, and readable medium
CN102156995A (en) Video movement foreground dividing method in moving camera
CN109389609B (en) Interactive self-feedback infrared target detection method based on FART neural network
CN112364865B (en) Method for detecting small moving target in complex scene
CN111160365A (en) Unmanned aerial vehicle target tracking method based on combination of detector and tracker
CN109101926A (en) Aerial target detection method based on convolutional neural networks
Hu et al. An infrared target intrusion detection method based on feature fusion and enhancement
Zou et al. Microarray camera image segmentation with Faster-RCNN
CN108345835B (en) Target identification method based on compound eye imitation perception
CN111860488A (en) Method, device, equipment and medium for detecting and identifying bird nest of tower
CN109635649B (en) High-speed detection method and system for unmanned aerial vehicle reconnaissance target
CN104715476A (en) Salient object detection method based on histogram power function fitting
CN113763417B (en) Target tracking method based on twin network and residual error structure
Guangjing et al. Research on static image recognition of sports based on machine learning
Liu et al. [Retracted] Mean Shift Fusion Color Histogram Algorithm for Nonrigid Complex Target Tracking in Sports Video
CN112069997B (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
Songtao et al. Saliency detection of infrared image based on region covariance and global feature
CN111027427B (en) Target gate detection method for small unmanned aerial vehicle racing match

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant