CN109191491B - Target tracking method and system of full convolution twin network based on multi-layer feature fusion - Google Patents

Target tracking method and system of full convolution twin network based on multi-layer feature fusion

Info

Publication number
CN109191491B
Authority
CN
China
Prior art keywords
image
target
frame
score
map
Prior art date
Legal status
Expired - Fee Related
Application number
CN201810878152.XA
Other languages
Chinese (zh)
Other versions
CN109191491A (en
Inventor
邹腊梅
陈婷
李鹏
张松伟
李长峰
熊紫华
李晓光
杨卫东
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date: 2018-08-03
Filing date: 2018-08-03
Publication date: 2020-09-08
Application filed by Huazhong University of Science and Technology
Priority to CN201810878152.XA
Publication of CN109191491A
Application granted
Publication of CN109191491B

Classifications

    • G06T7/223: Analysis of motion using block-matching
    • G06N3/045: Combinations of networks
    • G06T2207/10016: Video; Image sequence
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]


Abstract

The invention discloses a target tracking method and a target tracking system based on a convolution twin network with multi-layer feature fusion. The method comprises the following steps: according to the target position and size in each image, cropping the target template images and search area images of all images in an image sequence training set, the image pairs formed by the target template images and the search area images constituting a training data set; constructing a convolution twin network based on multi-layer feature fusion; training the network on the training data set to obtain a trained convolution twin network based on multi-layer feature fusion; and performing target tracking with the trained network. In the process of tracking the target, the score maps of different layers are fused, so that high-layer semantic features and low-layer detail features are combined to better distinguish interference from similar targets, preventing target drift and target loss during tracking.

Description

Target tracking method and system of full convolution twin network based on multi-layer feature fusion
Technical Field
The invention belongs to the intersection of digital image processing, deep learning and pattern recognition, and particularly relates to a target tracking method and a target tracking system based on a convolution twin network with multi-layer feature fusion.
Background
Target tracking occupies a very important position in computer vision. However, because of the complexity of natural scenes, the sensitivity of targets to illumination changes, the real-time and robustness requirements of tracking, and factors such as occlusion, posture change and scale change, tracking remains a difficult problem. Traditional target tracking methods cannot extract rich features from the target, so the target cannot be strictly distinguished from the background, tracking drift occurs easily, and the target cannot be tracked for a long time. With the rise of deep learning, a general convolutional neural network can effectively extract rich target features, but it has too many parameters: if online tracking is required, the real-time requirement cannot be met, which limits its practical engineering value.
Owing to improvements in hardware performance and the popularization of high-performance computing devices such as GPUs (graphics processing units), real-time tracking is no longer an insurmountable problem, and an effective target appearance model becomes of great importance in the tracking process. The essence of target tracking is similarity measurement. Because of its special structure, the twin convolution network has a natural advantage in similarity measurement, and its convolutional structure can extract rich features for target tracking. A network based purely on twin convolution adopts offline training and online tracking; although the real-time requirement can be met on high-performance computing equipment, the full convolution twin network uses only the semantic information extracted by the high layers of the convolutional network during tracking and cannot distinguish well, in complex scenes, a background similar to the target, which causes tracking drift and target loss.
Disclosure of Invention
In view of the above defects of the prior art, the invention aims to solve the technical problems of tracking drift and target loss caused by interference from similar backgrounds in the prior art.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides a target tracking method based on a multilayer feature fusion convolutional twin network, where the method includes the following steps:
(1) according to the target position and size of the image, cutting out target template images and search area images of all images in an image sequence training set, wherein an image pair formed by the target template images and the search area images forms a training data set;
(2) constructing a convolution twin network based on multilayer feature fusion, wherein the convolution twin network based on multilayer feature fusion comprises two identical branch convolution networks: a first branch convolution network used for obtaining the feature maps of the search area image and a second branch convolution network used for obtaining the feature maps of the target template image; the two branch networks are joined at the feature maps of designated layers, and cross-correlation is performed between the feature maps of the target template image and the corresponding-layer feature maps of the search area image to obtain the corresponding score maps;
(3) training the multilayer feature fusion-based convolution twin network based on the training data set to obtain a well-trained multilayer feature fusion-based convolution twin network;
(4) calculating a score map of each image in the image sequence to be detected by using the trained convolution twin network based on multi-layer feature fusion, and tracking the target based on the score map.
Specifically, the step (1) comprises the following steps. The target template image is cropped as follows: with the target rectangular frame centered on the target area and the center of the target area taken as the target position, p pixels are added on each of the four sides of the target rectangular frame; if the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel; finally the cropped target image block is scaled to 127 × 127. The search area image is cropped as follows: with the target area as the center, 2p pixels are added on each of the four sides of the target rectangular frame; if the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel; finally the cropped search area image block is scaled to 255 × 255. Here p = (w + h)/4, where w is the width of the target rectangular frame in pixels and h is the height of the target rectangular frame in pixels.
Specifically, the step (2) includes: the search area image is input into the first branch convolution network; Conv1 produces the first-layer feature map SFM1, Pool1 and Conv2 then produce the second-layer feature map SFM2, and Pool2, Conv3, Conv4 and Conv5 finally produce the third-layer feature map SFM3. The target template image is input into the second branch convolution network; Conv1 produces the first-layer feature map GFM1, Pool1 and Conv2 then produce the second-layer feature map GFM2, and Pool2, Conv3, Conv4 and Conv5 finally produce the third-layer feature map GFM3. Cross-correlation between the corresponding layers of the target template feature maps and the search area image feature maps gives three score maps SM1, SM2 and SM3:
SMi = GFMi * SFMi
where i = 1, 2, 3 and * denotes the cross-correlation operation.
Specifically, the joint loss function L(y, v) constructed in step (3) is calculated as follows:
L(y, v) = α1 L1(y, v1) + α2 L2(y, v2) + α3 L3(y, v3)
Li(y, vi) = (1/|Di|) Σu∈Di l(y[u], vi[u])
l(y[u], vi[u]) = log(1 + exp(-y[u] · vi[u]))
y[u] = +1 if ki · ||u - ci|| ≤ Ri, and y[u] = -1 otherwise
where Li is the loss function of score map SMi, l(y[u], vi[u]) is the logarithmic loss of each point in score map SMi, αi is the weight of score map SMi with 0 < α1 < α2 < α3 ≤ 1, Di is the set of points of score map SMi, u is a point in the score map, y[u] is the true label of point u, ci is the center point of score map SMi, Ri is the radius for score map SMi, ki is the stride of score map SMi, vi[u] is the value of score map SMi at u, || · || denotes the Euclidean distance, and i = 1, 2, 3.
Specifically, the step (4) includes:
1) cutting out the target template image of the 1st frame image according to the target position and size in the 1st frame image of the image sequence to be detected, inputting it into the second branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the feature map M1 of the target template image, and setting t = 2;
2) cutting out the search area image of the t-th frame image according to the target position and size in the (t-1)-th frame image of the image sequence to be detected, and inputting it into the first branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the search area image feature maps of the t-th frame image;
3) performing cross-correlation between the target template feature maps of the (t-1)-th frame and the corresponding layers of the search area image feature maps of the t-th frame to obtain three score maps of the target in the search area image of the t-th frame, and then fusing the score maps by linear weighting to obtain the final score map of the t-th frame;
4) calculating the target position in the t-th frame image from the final score map of the t-th frame;
5) cutting out the target template image of the t-th frame image according to the target position and size in the t-th frame image, inputting it into the second branch convolution network of the trained multilayer feature fusion convolution twin network, recording the obtained feature map of the target template image as Mt, and updating the target template feature map used for the t-th frame as M̃t = (1 - η) · M̃t-1 + η · Mt (with M̃1 = M1), where η is a smoothing factor;
6) setting t = t + 1 and repeating steps 2)-5) until t = N, at which point the target tracking of the image sequence to be detected ends, where N is the total number of frames of the image sequence to be detected.
In order to achieve the above object, in a second aspect, an embodiment of the present invention provides a target tracking system based on a multilayer feature fusion convolution twin network, where the system includes:
the cutting module is used for cutting out target template images and search area images of all images in the image sequence training set according to the target positions and sizes in the images, wherein the image pairs formed by the target template images and the search area images form a training data set;
the convolution twin network module based on multi-layer feature fusion, which comprises two identical branch convolution networks: a first branch convolution network used for obtaining the feature maps of the search area image and a second branch convolution network used for obtaining the feature maps of the target template image, wherein the two branch networks are joined at the feature maps of designated layers, and cross-correlation is performed between the feature maps of the target template image and the corresponding-layer feature maps of the search area image to obtain the corresponding score maps;
the training module is used for training the multilayer feature fusion-based convolution twin network based on the training data set to obtain a trained multilayer feature fusion-based convolution twin network;
and the target tracking module is used for calculating a score map of an image in the image sequence to be detected by using the trained convolution twin network based on the multilayer feature fusion and tracking the target based on the score map.
Specifically, the target template image is cropped as follows: with the target rectangular frame centered on the target area and the center of the target area taken as the target position, p pixels are added on each of the four sides of the target rectangular frame; if the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel; finally the cropped target image block is scaled to 127 × 127. The search area image is cropped as follows: with the target area as the center, 2p pixels are added on each of the four sides of the target rectangular frame; if the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel; finally the cropped search area image block is scaled to 255 × 255. Here p = (w + h)/4, where w is the width of the target rectangular frame in pixels and h is the height of the target rectangular frame in pixels.
Specifically, in the convolution twin network based on multi-layer feature fusion: the search area image is input into the first branch convolution network; Conv1 produces the first-layer feature map SFM1, Pool1 and Conv2 then produce the second-layer feature map SFM2, and Pool2, Conv3, Conv4 and Conv5 finally produce the third-layer feature map SFM3. The target template image is input into the second branch convolution network; Conv1 produces the first-layer feature map GFM1, Pool1 and Conv2 then produce the second-layer feature map GFM2, and Pool2, Conv3, Conv4 and Conv5 finally produce the third-layer feature map GFM3. Cross-correlation between the corresponding layers of the target template feature maps and the search area image feature maps gives three score maps SM1, SM2 and SM3:
SMi = GFMi * SFMi
where i = 1, 2, 3 and * denotes the cross-correlation operation.
Specifically, the joint loss function L(y, v) constructed in the training module is calculated as follows:
L(y, v) = α1 L1(y, v1) + α2 L2(y, v2) + α3 L3(y, v3)
Li(y, vi) = (1/|Di|) Σu∈Di l(y[u], vi[u])
l(y[u], vi[u]) = log(1 + exp(-y[u] · vi[u]))
y[u] = +1 if ki · ||u - ci|| ≤ Ri, and y[u] = -1 otherwise
where Li is the loss function of score map SMi, l(y[u], vi[u]) is the logarithmic loss of each point in score map SMi, αi is the weight of score map SMi with 0 < α1 < α2 < α3 ≤ 1, Di is the set of points of score map SMi, u is a point in the score map, y[u] is the true label of point u, ci is the center point of score map SMi, Ri is the radius for score map SMi, ki is the stride of score map SMi, vi[u] is the value of score map SMi at u, || · || denotes the Euclidean distance, and i = 1, 2, 3.
Specifically, the target tracking module performs target tracking through the following steps:
1) cutting out the target template image of the 1st frame image according to the target position and size in the 1st frame image of the image sequence to be detected, inputting it into the second branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the feature map M1 of the target template image, and setting t = 2;
2) cutting out the search area image of the t-th frame image according to the target position and size in the (t-1)-th frame image of the image sequence to be detected, and inputting it into the first branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the search area image feature maps of the t-th frame image;
3) performing cross-correlation between the target template feature maps of the (t-1)-th frame and the corresponding layers of the search area image feature maps of the t-th frame to obtain three score maps of the target in the search area image of the t-th frame, and then fusing the score maps by linear weighting to obtain the final score map of the t-th frame;
4) calculating the target position in the t-th frame image from the final score map of the t-th frame;
5) cutting out the target template image of the t-th frame image according to the target position and size in the t-th frame image, inputting it into the second branch convolution network of the trained multilayer feature fusion convolution twin network, recording the obtained feature map of the target template image as Mt, and updating the target template feature map used for the t-th frame as M̃t = (1 - η) · M̃t-1 + η · Mt (with M̃1 = M1), where η is a smoothing factor;
6) setting t = t + 1 and repeating steps 2)-5) until t = N, at which point the target tracking of the image sequence to be detected ends, where N is the total number of frames of the image sequence to be detected.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
(1) In the process of tracking the target, the score maps of different layers are fused; by combining high-layer semantic features and low-layer detail features, interference from similar targets can be better distinguished, which prevents target drift and target loss during tracking.
(2) The invention performs supervised training with the fused score maps obtained by cross-correlating the multi-layer feature maps and designs a new joint loss function; the joint loss function accounts for the different contributions of the score maps of different layers by assigning them different weights, which prevents gradient vanishing and accelerates convergence.
Drawings
FIG. 1 is a flowchart of a target tracking method based on a multilayer feature fusion convolution twin network according to an embodiment of the present invention;
FIG. 2 is an exemplary diagram of a target template image and a search area image provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of a convolutional twin network structure based on multi-layer feature fusion according to an embodiment of the present invention;
FIGS. 4(a), 4(b) and 4(c) are the 36th, 102nd and 136th frame images, respectively, of a first video sequence tracked with the method of the present invention according to an embodiment of the present invention;
FIGS. 5(a), 5(b) and 5(c) are the 14th, 24th and 470th frame images, respectively, of a second video sequence tracked with the method of the present invention according to an embodiment of the present invention;
FIGS. 6(a), 6(b) and 6(c) are the 39th, 61st and 85th frame images, respectively, of a third video sequence tracked with the method of the present invention according to an embodiment of the present invention;
FIGS. 7(a), 7(b) and 7(c) are the 23rd, 239th and 257th frame images, respectively, of a fourth video sequence tracked with the method of the present invention according to an embodiment of the present invention;
FIGS. 8(a), 8(b) and 8(c) are the 14th, 52nd and 98th frame images, respectively, of a fifth video sequence tracked with the method of the present invention according to an embodiment of the present invention;
FIGS. 9(a), 9(b) and 9(c) are the 23rd, 37th and 63rd frame images, respectively, of a sixth video sequence tracked with the method of the present invention according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a flowchart of a target tracking method based on a multilayer feature fusion convolution twin network according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
(1) according to the target position and size of the image, target template images and search area images of all images in the image sequence training set are cut out, and an image pair formed by the target template images and the search area images forms a training data set.
The image sequence training set consists of image pairs, each formed by an image and a label map that marks the target position and size in the corresponding image. The target template image and search area image centered on the target area are cropped from the image using the label map. The training data set of this example contains 40,000 pairs of training images.
The target template image is cropped as follows. The target rectangular frame is centered on the target area, and the center of the target area represents the target position. Adding p pixels on each of the four sides of the target rectangular frame gives a target template image block of size (w + 2p) × (h + 2p), where p = (w + h)/4, w is the width of the target rectangular frame in pixels, and h is the height of the target rectangular frame in pixels. If the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel. Finally, the cropped target image block is scaled to 127 × 127.
The search area image is cropped as follows. With the target area as the center, 2p pixels are added on each of the four sides of the target rectangular frame, giving a search area image block of size (w + 4p) × (h + 4p), where p = (w + h)/4. If the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel. Finally, the cropped search area image block is scaled to 255 × 255.
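As a minimal illustration of this cropping rule (a sketch only, not the patent's implementation; the helper name crop_patch, the use of OpenCV/NumPy and the assumption of a colour H × W × 3 image are mine):

```python
import cv2
import numpy as np

def crop_patch(image, cx, cy, w, h, pad_factor, out_size):
    """Crop a patch centred on (cx, cy), enlarged by pad_factor*p on each side,
    pad with the image mean where the patch leaves the frame, and rescale."""
    p = (w + h) / 4.0
    pad = pad_factor * p                              # p for the template, 2p for the search region
    half_w, half_h = (w + 2 * pad) / 2.0, (h + 2 * pad) / 2.0
    x1, y1 = int(round(cx - half_w)), int(round(cy - half_h))
    x2, y2 = int(round(cx + half_w)), int(round(cy + half_h))

    mean_pix = image.mean(axis=(0, 1))                # per-channel mean used as filler
    H, W = image.shape[:2]
    patch = np.empty((y2 - y1, x2 - x1, image.shape[2]), dtype=image.dtype)
    patch[:] = mean_pix
    sx1, sy1 = max(x1, 0), max(y1, 0)                 # overlap with the actual image
    sx2, sy2 = min(x2, W), min(y2, H)
    patch[sy1 - y1:sy2 - y1, sx1 - x1:sx2 - x1] = image[sy1:sy2, sx1:sx2]
    return cv2.resize(patch, (out_size, out_size))

# template: +p on each side, rescaled to 127 x 127; search region: +2p, rescaled to 255 x 255
# template = crop_patch(frame, cx, cy, w, h, pad_factor=1, out_size=127)
# search   = crop_patch(frame, cx, cy, w, h, pad_factor=2, out_size=255)
```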
Fig. 2 is an exemplary diagram of target template images and search area images according to an embodiment of the present invention. As shown in fig. 2, the first row shows target template images and the second row shows the corresponding search area images.
(2) A convolution twin network based on multi-layer feature fusion is constructed.
Fig. 3 is a schematic diagram of the structure of the convolution twin network based on multi-layer feature fusion according to an embodiment of the present invention. As shown in fig. 3, the convolution twin network based on multi-layer feature fusion includes two identical branch convolution networks: the first branch convolution network is used to obtain the feature maps of the search area image, and the second branch convolution network is used to obtain the feature maps of the target template image.
The two branch networks have the same structure and parameters; each branch comprises, connected in sequence, a first convolutional layer Conv1, a first pooling layer Pool1, a second convolutional layer Conv2, a second pooling layer Pool2, a third convolutional layer Conv3, a fourth convolutional layer Conv4 and a fifth convolutional layer Conv5. The specific parameters are as follows: Conv1 has an 11 × 11 kernel, stride 2 and 48 channels; Pool1 has a 3 × 3 kernel, stride 2 and 48 channels; Conv2 has a 5 × 5 kernel, stride 1 and 128 channels; Pool2 has a 3 × 3 kernel, stride 2 and 128 channels; Conv3, Conv4 and Conv5 all have 3 × 3 kernels and stride 1, with 192 channels for Conv3 and Conv4 and 128 channels for Conv5.
The search area image is input into the first branch convolution network: Conv1 produces the first-layer feature map SFM1 of size 123 × 123 × 48; Pool1 and Conv2 then produce the second-layer feature map SFM2 of size 57 × 57 × 128; finally Pool2, Conv3, Conv4 and Conv5 produce the third-layer feature map SFM3 of size 22 × 22 × 128.
The target template image is input into the second branch convolution network: Conv1 produces the first-layer feature map GFM1 of size 59 × 59 × 48; Pool1 and Conv2 then produce the second-layer feature map GFM2 of size 25 × 25 × 128; finally Pool2, Conv3, Conv4 and Conv5 produce the third-layer feature map GFM3 of size 6 × 6 × 128.
The two branch networks are joined at the feature maps of the designated layers, and cross-correlation is performed between the feature maps of the target template image and the corresponding-layer feature maps of the search area image to obtain the corresponding score maps.
Cross-correlation between the corresponding layers of the target template feature maps and the search area image feature maps gives three score maps SM1, SM2 and SM3, of sizes 65 × 65, 33 × 33 and 17 × 17 respectively, according to the formula SMi = GFMi * SFMi, where i = 1, 2, 3 and * denotes the cross-correlation operation.
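With the layer hyper-parameters given above, one branch of the twin network and the per-layer cross-correlation SMi = GFMi * SFMi can be sketched as follows. PyTorch is an assumption (the patent names no framework), and the activations between layers are likewise assumed, since the text does not specify them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Branch(nn.Module):
    """One branch of the twin network; both branches share structure and weights."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 48, 11, stride=2)
        self.pool1 = nn.MaxPool2d(3, stride=2)
        self.conv2 = nn.Conv2d(48, 128, 5, stride=1)
        self.pool2 = nn.MaxPool2d(3, stride=2)   # stride 2 reproduces the 22x22 / 6x6 sizes
        self.conv3 = nn.Conv2d(128, 192, 3, stride=1)
        self.conv4 = nn.Conv2d(192, 192, 3, stride=1)
        self.conv5 = nn.Conv2d(192, 128, 3, stride=1)

    def forward(self, x):
        f1 = F.relu(self.conv1(x))                   # SFM1 / GFM1
        f2 = F.relu(self.conv2(self.pool1(f1)))      # SFM2 / GFM2
        f3 = self.conv5(F.relu(self.conv4(F.relu(self.conv3(self.pool2(f2))))))  # SFM3 / GFM3
        return f1, f2, f3

def xcorr(gfm, sfm):
    """SMi = GFMi * SFMi: correlate the search feature map with the template
    feature map used as the convolution kernel (single-sample case)."""
    return F.conv2d(sfm, gfm)

branch = Branch()
sfm = branch(torch.randn(1, 3, 255, 255))             # 123x123x48, 57x57x128, 22x22x128
gfm = branch(torch.randn(1, 3, 127, 127))             # 59x59x48, 25x25x128, 6x6x128
score_maps = [xcorr(g, s) for g, s in zip(gfm, sfm)]   # 65x65, 33x33, 17x17
```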
(3) The convolution twin network based on multi-layer feature fusion is trained on the training data set to obtain the trained convolution twin network based on multi-layer feature fusion.
A joint loss function is then constructed. Each point u ∈ D in a score map has a true label y[u] ∈ {+1, -1}. Since the target lies at the center of the score map, the center of the score map is taken as the center of a circle, and an element of the score map is regarded as a positive sample if it lies within radius R of that center (taking the stride k of the network into account), and as a negative sample otherwise:
y[u] = +1 if k · ||u - c|| ≤ R, and y[u] = -1 otherwise
where c is the center point of the score map and || · || denotes the Euclidean distance.
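A short sketch of how this ±1 label map can be generated for one score map; the radius value in the usage comment is only an example, since the text does not fix R here:

```python
import numpy as np

def make_label_map(size, stride, radius):
    """y[u] = +1 where stride * ||u - c|| <= radius, else -1 (c = map centre)."""
    c = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    dist = np.sqrt((xs - c) ** 2 + (ys - c) ** 2)
    return np.where(stride * dist <= radius, 1.0, -1.0)

# e.g. the 17x17 score map SM3 with network stride k3 = 8 and an assumed radius of 16 pixels
# y3 = make_label_map(17, stride=8, radius=16)
```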
The loss function used in training is based on a logarithmic loss function; the overall loss of a single score map is the average of the losses of all its points. The joint loss function L(y, v) constructed by the invention is:
L(y, v) = α1 L1(y, v1) + α2 L2(y, v2) + α3 L3(y, v3)
Li(y, vi) = (1/|Di|) Σu∈Di l(y[u], vi[u])
l(y[u], vi[u]) = log(1 + exp(-y[u] · vi[u]))
y[u] = +1 if ki · ||u - ci|| ≤ Ri, and y[u] = -1 otherwise
where Li is the loss function of score map SMi, l(y[u], vi[u]) is the logarithmic loss of each point in score map SMi, αi is the weight of score map SMi with 0 < α1 < α2 < α3 ≤ 1, Di is the set of points of score map SMi, u is a point in the score map, ci is the center point of score map SMi, Ri is the radius for score map SMi, ki is the stride of score map SMi, vi[u] is the value of score map SMi at u, || · || denotes the Euclidean distance, and i = 1, 2, 3.
Specifically, α1, α2 and α3 are taken as 0.3, 0.6 and 1 respectively, and for score map SM1, score map SM2 and score map SM3 the corresponding values of the stride k are 2, 4 and 8 respectively.
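Under these definitions the per-map loss is the mean logarithmic loss over all points and the joint loss is the α-weighted sum of the three per-map losses; a minimal sketch (PyTorch assumed), with labels yi taking values in {+1, -1} and vi the raw score maps:

```python
import torch

def map_loss(y, v):
    """Mean logarithmic loss over one score map; labels y take values +1 / -1."""
    return torch.log1p(torch.exp(-y * v)).mean()

def joint_loss(labels, scores, alphas=(0.3, 0.6, 1.0)):
    """L(y, v) = alpha1*L1 + alpha2*L2 + alpha3*L3 over the three score maps."""
    return sum(a * map_loss(y, v) for a, y, v in zip(alphas, labels, scores))

# example with the three score-map sizes used in this embodiment
scores = [torch.randn(1, 1, s, s) for s in (65, 33, 17)]
labels = [torch.randint(0, 2, (1, 1, s, s)).float() * 2 - 1 for s in (65, 33, 17)]
loss = joint_loss(labels, scores)
```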
With minimization of the joint loss function as the objective function, the network parameters W of the multi-layer feature fusion convolution twin network are learned by the back-propagation algorithm.
This embodiment trains for 40 epochs, with 5000 iterations per epoch and 8 pairs of training images per iteration. During training, as the network parameters converge, the learning rate of the stochastic gradient descent method is decreased in sequence from 10⁻² to 10⁻⁵; that is, after every 10 epochs the learning rate decreases by a factor of 10.
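This schedule (40 epochs of 5000 iterations with 8 image pairs each, and a tenfold learning-rate drop every 10 epochs from 10⁻² to 10⁻⁵) maps onto a standard optimizer/scheduler pairing; a sketch with a stand-in parameter, since the network and data pipeline are defined elsewhere:

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]   # stand-in for the twin network's parameters
optimizer = torch.optim.SGD(params, lr=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(40):
    for _ in range(5000):
        # sample 8 training image pairs, run the forward pass, compute the joint loss,
        # then: optimizer.zero_grad(); loss.backward(); optimizer.step()
        pass
    scheduler.step()   # learning rate: 1e-2 -> 1e-3 -> 1e-4 -> 1e-5 over the 40 epochs
```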
(4) The score maps of the images in the image sequence to be detected are calculated using the trained convolution twin network based on multi-layer feature fusion, and the target is tracked based on the score maps.
1) According to the target position and size in the 1st frame image of the image sequence to be detected, the target template image of the 1st frame image is cropped and input into the second branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the feature map M1 of the target template image; t is set to 2.
The target position and target size in the initial frame image of the image sequence to be detected are known, and the target template image of the 1st frame image is cropped according to the target position and size in the 1st frame image.
2) According to the target position and size in the (t-1)-th frame image of the image sequence to be detected, the search area image of the t-th frame image is cropped and input into the first branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the search area image feature maps of the t-th frame image.
For example, the search area image of the 2nd frame image is cropped according to the known target position and size in the 1st frame image.
3) Cross-correlation is performed between the target template feature maps of the (t-1)-th frame and the corresponding-layer search area image feature maps of the t-th frame to obtain three score maps of the target in the search area image of the t-th frame; the score maps are then fused by linear weighting to obtain the final score map of the t-th frame.
SM3, of size 17 × 17, is upsampled by bicubic interpolation to a score map of size 65 × 65, denoted SM3↑, and SM2, of size 33 × 33, is upsampled by bicubic interpolation to a score map of size 65 × 65, denoted SM2↑. The final score map SM123 is then calculated as
SM123 = w1 · SM1 + w2 · SM2↑ + w3 · SM3↑
where SM2↑ and SM3↑ are the score maps obtained after upsampling SM2 and SM3. In this embodiment w1 = 2¹, w2 = 2² and w3 = 2³ are used.
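A sketch of this fusion step, reading the weights as w1 = 2¹, w2 = 2², w3 = 2³ and omitting any normalisation by the weight sum (which would not change the location of the maximum):

```python
import torch
import torch.nn.functional as F

def fuse_score_maps(sm1, sm2, sm3, weights=(2.0, 4.0, 8.0)):
    """Bicubic-upsample SM2 and SM3 to 65x65 and fuse the three maps by linear weighting."""
    up = lambda sm: F.interpolate(sm, size=(65, 65), mode="bicubic", align_corners=False)
    w1, w2, w3 = weights
    return w1 * sm1 + w2 * up(sm2) + w3 * up(sm3)

sm1 = torch.randn(1, 1, 65, 65)
sm2 = torch.randn(1, 1, 33, 33)
sm3 = torch.randn(1, 1, 17, 17)
sm123 = fuse_score_maps(sm1, sm2, sm3)   # final 65x65 score map of the t-th frame
```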
4) The target position of the target in the t-th frame image is calculated from the final score map of the t-th frame.
After the three score maps have been superposed according to their weights to give the final score map SM123, SM123 is upsampled by bicubic interpolation to 255 × 255, and the position of the maximum-score point in this map is recorded as the position pt.
In order to make the tracking process more continuous, the position pt is linearly interpolated with the previous position estimate to determine the target position p̂t of the target in the t-th frame image:
p̂t = (1 - γ) · p̂t-1 + γ · pt
where γ is a smoothing factor. In this example γ is taken to be 0.35.
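A sketch of this step: the fused map is upsampled to 255 × 255, the peak is located, and the new estimate is blended with the previous one. Mapping the peak to a displacement from the search-region centre and the (1 - γ)·old + γ·new form of the interpolation are assumptions, since the original equations are reproduced here only from their surrounding description:

```python
import torch
import torch.nn.functional as F

def locate_peak(sm123):
    """Upsample the fused 65x65 score map to 255x255 and return the peak's offset
    (dx, dy) from the centre of the search region, in search-region pixels."""
    up = F.interpolate(sm123, size=(255, 255), mode="bicubic", align_corners=False)
    idx = int(torch.argmax(up[0, 0]))
    row, col = divmod(idx, 255)
    return col - 127, row - 127

def smooth_position(prev_pos, peak_pos, gamma=0.35):
    """Linear interpolation of the position estimate with smoothing factor gamma."""
    return tuple((1.0 - gamma) * p + gamma * q for p, q in zip(prev_pos, peak_pos))
```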
5) According to the target position and size in the t-th frame image, the target template image of the t-th frame image is cropped and input into the second branch convolution network of the trained multilayer feature fusion convolution twin network, and the obtained feature map of the target template image is recorded as Mt. The target template feature map used for the t-th frame is then updated as
M̃t = (1 - η) · M̃t-1 + η · Mt
where M̃t-1 is the smoothed template feature map of the previous frame (M̃1 = M1) and η is a smoothing factor. In this example η is taken to be 0.01.
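The template update in step 5) has the same exponential-smoothing form; a sketch, again assuming the (1 - η)·old + η·new convention with η = 0.01:

```python
def update_template(smoothed_gfm, new_gfm, eta=0.01):
    """Blend the running template feature maps (one per layer) with the newly
    extracted ones; smoothed_gfm starts as the first-frame feature maps M1."""
    return [(1.0 - eta) * old + eta * new for old, new in zip(smoothed_gfm, new_gfm)]

# per frame: smoothed_gfm = update_template(smoothed_gfm, branch(template_patch_t))
```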
6) t is set to t + 1, and steps 2)-5) are repeated until t = N, at which point the target tracking of the image sequence to be detected ends, where N is the total number of frames of the image sequence to be detected.
FIG. 4(a), FIG. 4(b) and FIG. 4(c) are the 36th, 102nd and 136th frame images, respectively, of a first video sequence tracked with the method of the present invention according to an embodiment of the present invention. They show that the target tracking method provided by the invention can effectively track a target undergoing rapid motion, posture change, occlusion and similar-background interference.
FIG. 5(a), FIG. 5(b) and FIG. 5(c) are the 14th, 24th and 470th frame images, respectively, of a second video sequence tracked with the method of the present invention according to an embodiment of the present invention. They show that the target tracking method provided by the invention can effectively track a target undergoing posture change, occlusion and similar-background interference.
FIG. 6(a), FIG. 6(b) and FIG. 6(c) are the 39th, 61st and 85th frame images, respectively, of a third video sequence tracked with the method of the present invention according to an embodiment of the present invention. They show that the target tracking method provided by the invention can effectively track a target undergoing posture change, occlusion and motion blur.
FIG. 7(a), FIG. 7(b) and FIG. 7(c) are the 23rd, 239th and 257th frame images, respectively, of a fourth video sequence tracked with the method of the present invention according to an embodiment of the present invention. They show that the target tracking method provided by the invention can effectively track a target undergoing illumination change and occlusion.
FIG. 8(a), FIG. 8(b) and FIG. 8(c) are the 14th, 52nd and 98th frame images, respectively, of a fifth video sequence tracked with the method of the present invention according to an embodiment of the present invention. They show that the target tracking method provided by the invention can effectively track a target undergoing posture change and similar-background interference.
FIG. 9(a), FIG. 9(b) and FIG. 9(c) are the 23rd, 37th and 63rd frame images, respectively, of a sixth video sequence tracked with the method of the present invention according to an embodiment of the present invention. They show that the target tracking method provided by the invention can effectively track a target undergoing illumination change.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A target tracking method based on a convolution twin network with multilayer feature fusion, characterized by comprising the following steps:
(1) according to the target position and size of the image, cutting out target template images and search area images of all images in an image sequence training set, wherein an image pair formed by the target template images and the search area images forms a training data set;
(2) constructing a convolution twin network based on multilayer feature fusion, wherein the convolution twin network based on multilayer feature fusion comprises two identical branch convolution networks: a first branch convolution network used for obtaining the feature maps of the search area image and a second branch convolution network used for obtaining the feature maps of the target template image; the two branch networks are joined at the feature maps of designated layers, and cross-correlation is performed between the feature maps of the target template image and the corresponding-layer feature maps of the search area image to obtain the corresponding score maps;
(3) training the multilayer feature fusion-based convolution twin network based on the training data set to obtain a well-trained multilayer feature fusion-based convolution twin network;
(4) calculating a score map of an image in an image sequence to be detected by using a trained convolution twin network based on multi-layer feature fusion, and tracking a target based on the score map;
the step (2) comprises the following steps:
inputting the search area image into the first branch convolution network, obtaining a first-layer feature map SFM1 through Conv1, then a second-layer feature map SFM2 through the Pool1 and Conv2 layers, and finally a third-layer feature map SFM3 through Pool2, Conv3, Conv4 and Conv5;
inputting the target template image into the second branch convolution network, obtaining a first-layer feature map GFM1 through Conv1, then a second-layer feature map GFM2 through Pool1 and Conv2, and finally a third-layer feature map GFM3 through Pool2, Conv3, Conv4 and Conv5;
performing cross-correlation between the corresponding layers of the target template feature maps and the search area image feature maps to obtain three corresponding score maps SM1, SM2 and SM3 according to the formula
SMi = GFMi * SFMi
wherein i = 1, 2, 3 and * denotes the cross-correlation operation.
2. The target tracking method of claim 1, wherein step (1) comprises:
the target template image is cropped as follows: with the target rectangular frame centered on the target area and the center of the target area taken as the target position, p pixels are added on each of the four sides of the target rectangular frame; if the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel; finally the cropped target image block is scaled to 127 × 127;
the search area image is cropped as follows: with the target area as the center, 2p pixels are added on each of the four sides of the target rectangular frame; if the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel; finally the cropped search area image block is scaled to 255 × 255;
wherein p = (w + h)/4, w is the width of the target rectangular frame in pixels, and h is the height of the target rectangular frame in pixels.
3. The target tracking method of claim 1, wherein the joint loss function L (y, v) constructed in step (3) is calculated as follows:
L(y, v) = α1 L1(y, v1) + α2 L2(y, v2) + α3 L3(y, v3)
Li(y, vi) = (1/|Di|) Σu∈Di l(y[u], vi[u])
l(y[u], vi[u]) = log(1 + exp(-y[u] · vi[u]))
y[u] = +1 if ki · ||u - ci|| ≤ Ri, and y[u] = -1 otherwise
wherein Li is the loss function of score map SMi, y[u] is the true label of point u in the score map, l(y[u], vi[u]) is the logarithmic loss of each point in score map SMi, αi is the weight of score map SMi with 0 < α1 < α2 < α3 ≤ 1, Di is the set of points of score map SMi, u is a point in the score map, ci is the center point of score map SMi, Ri is the radius for score map SMi, ki is the stride of score map SMi, vi[u] is the value of score map SMi at u, || · || denotes the Euclidean distance, and i = 1, 2, 3.
4. The target tracking method of claim 1, wherein step (4) comprises:
1) cutting out the target template image of the 1st frame image according to the target position and size in the 1st frame image of the image sequence to be detected, inputting it into the second branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the feature map M1 of the target template image, and setting t = 2;
2) cutting out the search area image of the t-th frame image according to the target position and size in the (t-1)-th frame image of the image sequence to be detected, and inputting it into the first branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the search area image feature maps of the t-th frame image;
3) performing cross-correlation between the target template feature maps of the (t-1)-th frame and the corresponding layers of the search area image feature maps of the t-th frame to obtain three score maps of the target in the search area image of the t-th frame, and then fusing the score maps by linear weighting to obtain the final score map of the t-th frame;
4) calculating the target position in the t-th frame image from the final score map of the t-th frame;
5) cutting out the target template image of the t-th frame image according to the target position and size in the t-th frame image, inputting it into the second branch convolution network of the trained multilayer feature fusion convolution twin network, recording the obtained feature map of the target template image as Mt, and updating the target template feature map used for the t-th frame as M̃t = (1 - η) · M̃t-1 + η · Mt (with M̃1 = M1), where η is a smoothing factor;
6) setting t = t + 1 and repeating steps 2)-5) until t = N, at which point the target tracking of the image sequence to be detected ends, where N is the total number of frames of the image sequence to be detected.
5. A target tracking system based on a convolution twin network with multi-layer feature fusion, characterized by comprising:
the cutting module is used for cutting out target template images and search area images of all images in the image sequence training set according to the target positions and sizes in the images, wherein the image pairs formed by the target template images and the search area images form a training data set;
the convolution twin network module based on multi-layer feature fusion, which comprises two identical branch convolution networks: a first branch convolution network used for obtaining the feature maps of the search area image and a second branch convolution network used for obtaining the feature maps of the target template image, wherein the two branch networks are joined at the feature maps of designated layers, and cross-correlation is performed between the feature maps of the target template image and the corresponding-layer feature maps of the search area image to obtain the corresponding score maps;
the training module is used for training the multilayer feature fusion-based convolution twin network based on the training data set to obtain a trained multilayer feature fusion-based convolution twin network;
the target tracking module is used for calculating a score map of an image in an image sequence to be detected by using a trained convolution twin network based on multi-layer feature fusion, and tracking a target based on the score map;
the multilayer feature fusion based convolution twin network comprises:
inputting the search area image into the first branch convolution network, obtaining a first-layer feature map SFM1 through Conv1, then a second-layer feature map SFM2 through the Pool1 and Conv2 layers, and finally a third-layer feature map SFM3 through Pool2, Conv3, Conv4 and Conv5;
inputting the target template image into the second branch convolution network, obtaining a first-layer feature map GFM1 through Conv1, then a second-layer feature map GFM2 through Pool1 and Conv2, and finally a third-layer feature map GFM3 through Pool2, Conv3, Conv4 and Conv5;
performing cross-correlation between the corresponding layers of the target template feature maps and the search area image feature maps to obtain three corresponding score maps SM1, SM2 and SM3 according to the formula
SMi = GFMi * SFMi
wherein i = 1, 2, 3 and * denotes the cross-correlation operation.
6. The object tracking system of claim 5,
the target template image is cropped as follows: with the target rectangular frame centered on the target area and the center of the target area taken as the target position, p pixels are added on each of the four sides of the target rectangular frame; if the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel; finally the cropped target image block is scaled to 127 × 127;
the search area image is cropped as follows: with the target area as the center, 2p pixels are added on each of the four sides of the target rectangular frame; if the enlarged frame exceeds the image boundary, the exceeding part is filled with the image mean pixel; finally the cropped search area image block is scaled to 255 × 255;
wherein p = (w + h)/4, w is the width of the target rectangular frame in pixels, and h is the height of the target rectangular frame in pixels.
7. The target tracking system of claim 5, wherein the joint loss function L (y, v) constructed in the training module is calculated as follows:
L(y, v) = α1 L1(y, v1) + α2 L2(y, v2) + α3 L3(y, v3)
Li(y, vi) = (1/|Di|) Σu∈Di l(y[u], vi[u])
l(y[u], vi[u]) = log(1 + exp(-y[u] · vi[u]))
y[u] = +1 if ki · ||u - ci|| ≤ Ri, and y[u] = -1 otherwise
wherein Li is the loss function of score map SMi, y[u] is the true label of point u in the score map, l(y[u], vi[u]) is the logarithmic loss of each point in score map SMi, αi is the weight of score map SMi with 0 < α1 < α2 < α3 ≤ 1, Di is the set of points of score map SMi, u is a point in the score map, ci is the center point of score map SMi, Ri is the radius for score map SMi, ki is the stride of score map SMi, vi[u] is the value of score map SMi at u, || · || denotes the Euclidean distance, and i = 1, 2, 3.
8. The target tracking system of claim 5, wherein the target tracking module performs target tracking by:
1) cutting out the target template image of the 1st frame image according to the target position and size in the 1st frame image of the image sequence to be detected, inputting it into the second branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the feature map M1 of the target template image, and setting t = 2;
2) cutting out the search area image of the t-th frame image according to the target position and size in the (t-1)-th frame image of the image sequence to be detected, and inputting it into the first branch convolution network of the trained multilayer feature fusion convolution twin network to obtain the search area image feature maps of the t-th frame image;
3) performing cross-correlation between the target template feature maps of the (t-1)-th frame and the corresponding layers of the search area image feature maps of the t-th frame to obtain three score maps of the target in the search area image of the t-th frame, and then fusing the score maps by linear weighting to obtain the final score map of the t-th frame;
4) calculating the target position in the t-th frame image from the final score map of the t-th frame;
5) cutting out the target template image of the t-th frame image according to the target position and size in the t-th frame image, inputting it into the second branch convolution network of the trained multilayer feature fusion convolution twin network, recording the obtained feature map of the target template image as Mt, and updating the target template feature map used for the t-th frame as M̃t = (1 - η) · M̃t-1 + η · Mt (with M̃1 = M1), where η is a smoothing factor;
6) setting t = t + 1 and repeating steps 2)-5) until t = N, at which point the target tracking of the image sequence to be detected ends, where N is the total number of frames of the image sequence to be detected.
CN201810878152.XA 2018-08-03 2018-08-03 Target tracking method and system of full convolution twin network based on multi-layer feature fusion Expired - Fee Related CN109191491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810878152.XA CN109191491B (en) 2018-08-03 2018-08-03 Target tracking method and system of full convolution twin network based on multi-layer feature fusion


Publications (2)

Publication Number Publication Date
CN109191491A CN109191491A (en) 2019-01-11
CN109191491B true CN109191491B (en) 2020-09-08

Family

ID=64920067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810878152.XA Expired - Fee Related CN109191491B (en) 2018-08-03 2018-08-03 Target tracking method and system of full convolution twin network based on multi-layer feature fusion

Country Status (1)

Country Link
CN (1) CN109191491B (en)

Families Citing this family (62)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070562A (en) * 2019-04-02 2019-07-30 西北工业大学 A kind of context-sensitive depth targets tracking
CN110021052B (en) * 2019-04-11 2023-05-30 北京百度网讯科技有限公司 Method and apparatus for generating fundus image generation model
CN110246155B (en) * 2019-05-17 2021-05-18 华中科技大学 Anti-occlusion target tracking method and system based on model alternation
CN110210551B (en) * 2019-05-28 2021-07-30 北京工业大学 Visual target tracking method based on adaptive subject sensitivity
CN110222641B (en) * 2019-06-06 2022-04-19 北京百度网讯科技有限公司 Method and apparatus for recognizing image
CN110378938A (en) * 2019-06-24 2019-10-25 杭州电子科技大学 A kind of monotrack method based on residual error Recurrent networks
CN110443827B (en) * 2019-07-22 2022-12-20 浙江大学 Unmanned aerial vehicle video single-target long-term tracking method based on improved twin network
CN110570458B (en) * 2019-08-12 2022-02-01 武汉大学 Target tracking method based on internal cutting and multi-layer characteristic information fusion
CN110473231B (en) * 2019-08-20 2024-02-06 南京航空航天大学 Target tracking method of twin full convolution network with prejudging type learning updating strategy
CN110516745B (en) * 2019-08-28 2022-05-24 北京达佳互联信息技术有限公司 Training method and device of image recognition model and electronic equipment
CN110480128A (en) * 2019-08-28 2019-11-22 华南理工大学 A kind of real-time welding seam tracking method of six degree of freedom welding robot line laser
CN110675423A (en) * 2019-08-29 2020-01-10 电子科技大学 Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN110580713A (en) * 2019-08-30 2019-12-17 武汉大学 Satellite video target tracking method based on full convolution twin network and track prediction
CN112446900B (en) * 2019-09-03 2024-05-17 中国科学院长春光学精密机械与物理研究所 Twin neural network target tracking method and system
CN110807793B (en) * 2019-09-29 2022-04-22 南京大学 Target tracking method based on twin network
CN110728697B (en) * 2019-09-30 2023-06-13 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) Infrared dim target detection tracking method based on convolutional neural network
CN110782480B (en) * 2019-10-15 2023-08-04 哈尔滨工程大学 Infrared pedestrian tracking method based on online template prediction
CN110796679B (en) * 2019-10-30 2023-04-07 电子科技大学 Target tracking method for aerial image
CN111105031B (en) * 2019-11-11 2023-10-17 北京地平线机器人技术研发有限公司 Network structure searching method and device, storage medium and electronic equipment
CN111161311A (en) * 2019-12-09 2020-05-15 中车工业研究院有限公司 Visual multi-target tracking method and device based on deep learning
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
CN112837344B (en) * 2019-12-18 2024-03-29 沈阳理工大学 Target tracking method for generating twin network based on condition countermeasure
CN110992404B (en) * 2019-12-23 2023-09-19 驭势科技(浙江)有限公司 Target tracking method, device and system and storage medium
CN111161317A (en) * 2019-12-30 2020-05-15 北京工业大学 Single-target tracking method based on multiple networks
CN111091582A (en) * 2019-12-31 2020-05-01 北京理工大学重庆创新中心 Single-vision target tracking algorithm and system based on deep neural network
CN111260688A (en) * 2020-01-13 2020-06-09 深圳大学 Twin double-path target tracking method
CN111260682B (en) * 2020-02-10 2023-11-17 深圳市铂岩科技有限公司 Target object tracking method and device, storage medium and electronic equipment
CN111462175B (en) * 2020-03-11 2023-02-10 华南理工大学 Space-time convolution twin matching network target tracking method, device, medium and equipment
CN111340850A (en) * 2020-03-20 2020-06-26 军事科学院***工程研究院***总体研究所 Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss
CN111415373A (en) * 2020-03-20 2020-07-14 北京以萨技术股份有限公司 Target tracking and segmenting method, system and medium based on twin convolutional network
CN111415318B (en) * 2020-03-20 2023-06-13 山东大学 Unsupervised related filtering target tracking method and system based on jigsaw task
CN111489361B (en) * 2020-03-30 2023-10-27 中南大学 Real-time visual target tracking method based on deep feature aggregation of twin network
CN113538507B (en) * 2020-04-15 2023-11-17 南京大学 Single-target tracking method based on full convolution network online training
CN111583345B (en) * 2020-05-09 2022-09-27 吉林大学 Method, device and equipment for acquiring camera parameters and storage medium
CN111639551B (en) * 2020-05-12 2022-04-01 华中科技大学 Online multi-target tracking method and system based on twin network and long-short term clues
CN111582214B (en) * 2020-05-15 2023-05-12 中国科学院自动化研究所 Method, system and device for analyzing behavior of cage animal based on twin network
CN111753667B (en) * 2020-05-27 2024-05-14 江苏大学 Intelligent automobile single-target tracking method based on twin network
CN113805240B (en) * 2020-05-28 2023-06-27 同方威视技术股份有限公司 Vehicle inspection method and system
CN111797716B (en) * 2020-06-16 2022-05-03 电子科技大学 Single target tracking method based on Siamese network
CN111754546A (en) * 2020-06-18 2020-10-09 重庆邮电大学 Target tracking method, system and storage medium based on multi-feature map fusion
CN111950493B (en) * 2020-08-20 2024-03-08 华北电力大学 Image recognition method, device, terminal equipment and readable storage medium
CN112184752A (en) * 2020-09-08 2021-01-05 北京工业大学 Video target tracking method based on pyramid convolution
CN112734726B (en) * 2020-09-29 2024-02-02 首都医科大学附属北京天坛医院 Angiography typing method, angiography typing device and angiography typing equipment
CN112183675B (en) * 2020-11-10 2023-09-26 武汉工程大学 Tracking method for low-resolution target based on twin network
CN112418203B (en) * 2020-11-11 2022-08-30 南京邮电大学 Robustness RGB-T tracking method based on bilinear convergence four-stream network
CN112330718B (en) * 2020-11-12 2022-08-23 重庆邮电大学 CNN-based three-level information fusion visual target tracking method
CN112381788B (en) * 2020-11-13 2022-11-22 北京工商大学 Part surface defect increment detection method based on double-branch matching network
CN112330719B (en) * 2020-12-02 2024-02-27 东北大学 Deep learning target tracking method based on feature map segmentation and self-adaptive fusion
CN112836606A (en) * 2021-01-25 2021-05-25 合肥工业大学 Aerial photography target tracking method fusing target significance and online learning interference factor
CN112785626A (en) * 2021-01-27 2021-05-11 安徽大学 Twin network small target tracking method based on multi-scale feature fusion
CN112884037B (en) * 2021-02-09 2022-10-21 中国科学院光电技术研究所 Target tracking method based on template updating and anchor-frame-free mode
CN113192124B (en) * 2021-03-15 2024-07-02 大连海事大学 Image target positioning method based on twin network
CN113052874B (en) * 2021-03-18 2022-01-25 上海商汤智能科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN113379788B (en) * 2021-06-29 2024-03-29 西安理工大学 Target tracking stability method based on triplet network
CN113808166B (en) * 2021-09-15 2023-04-18 西安电子科技大学 Single-target tracking method based on clustering difference and depth twin convolutional neural network
CN113870312B (en) * 2021-09-30 2023-09-22 四川大学 Single target tracking method based on twin network
CN114155274B (en) * 2021-11-09 2024-05-24 中国海洋大学 Target tracking method and device based on global scalable twin network
WO2023112581A1 (en) * 2021-12-14 2023-06-22 富士フイルム株式会社 Inference device
CN114429491B (en) * 2022-04-07 2022-07-08 之江实验室 Pulse neural network target tracking method and system based on event camera
CN114820709B (en) * 2022-05-05 2024-03-08 郑州大学 Single-target tracking method, device, equipment and medium based on improved UNet network
CN115393406B (en) * 2022-08-17 2024-05-10 中船智控科技(武汉)有限公司 Image registration method based on twin convolution network
CN115330876B (en) * 2022-09-15 2023-04-07 中国人民解放军国防科技大学 Target template graph matching and positioning method based on twin network and central position estimation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10204299B2 (en) * 2015-11-04 2019-02-12 Nec Corporation Unsupervised matching in fine-grained datasets for single-view object reconstruction
CN107818575A (en) * 2017-10-27 2018-03-20 深圳市唯特视科技有限公司 A kind of visual object tracking based on layering convolution
CN107992826A (en) * 2017-12-01 2018-05-04 广州优亿信息科技有限公司 A kind of people stream detecting method based on the twin network of depth
CN108320297A (en) * 2018-03-09 2018-07-24 湖北工业大学 A kind of video object method for real time tracking and system
CN108257158A (en) * 2018-03-27 2018-07-06 福州大学 A kind of target prediction and tracking based on Recognition with Recurrent Neural Network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hierarchical Convolutional Features for Visual Tracking; Chao Ma et al.; 2015 IEEE International Conference on Computer Vision; 2015-12-31; pp. 3074-3082 *

Also Published As

Publication number Publication date
CN109191491A (en) 2019-01-11

Similar Documents

Publication Publication Date Title
CN109191491B (en) Target tracking method and system of full convolution twin network based on multi-layer feature fusion
CN111354017B (en) Target tracking method based on twin neural network and parallel attention module
CN107274433B (en) Target tracking method and device based on deep learning and storage medium
CN106127684B (en) Image super-resolution Enhancement Method based on forward-backward recutrnce convolutional neural networks
WO2023273136A1 (en) Target object representation point estimation-based visual tracking method
CN111311666B (en) Monocular vision odometer method integrating edge features and deep learning
CN108038420B (en) Human behavior recognition method based on depth video
WO2020108362A1 (en) Body posture detection method, apparatus and device, and storage medium
CN111161317A (en) Single-target tracking method based on multiple networks
CN112184752A (en) Video target tracking method based on pyramid convolution
CN109360156A (en) Single image rain removing method based on the image block for generating confrontation network
CN107730536B (en) High-speed correlation filtering object tracking method based on depth features
CN112288627B (en) Recognition-oriented low-resolution face image super-resolution method
CN112232134B (en) Human body posture estimation method based on hourglass network and attention mechanism
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
CN110705344A (en) Crowd counting model based on deep learning and implementation method thereof
CN110942476A (en) Improved three-dimensional point cloud registration method and system based on two-dimensional image guidance and readable storage medium
CN111476133B (en) Unmanned driving-oriented foreground and background codec network target extraction method
CN111815665A (en) Single image crowd counting method based on depth information and scale perception information
CN115147456B (en) Target tracking method based on time sequence self-adaptive convolution and attention mechanism
CN111882581A (en) Multi-target tracking method for depth feature association
CN108629301A (en) A kind of human motion recognition method based on moving boundaries dense sampling and movement gradient histogram
CN113592900A (en) Target tracking method and system based on attention mechanism and global reasoning
CN116128763A (en) Aircraft skin damage image enhancement method based on deep neural network fusion
CN109492524B (en) Intra-structure relevance network for visual tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20200908
Termination date: 20210803