WO2021142571A1 - A Siamese dual-path target tracking method - Google Patents

A Siamese dual-path target tracking method Download PDF

Info

Publication number
WO2021142571A1
WO2021142571A1 (PCT/CN2020/071743, CN2020071743W)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
convolutional layer
image
template image
target tracking
Prior art date
Application number
PCT/CN2020/071743
Other languages
English (en)
French (fr)
Inventor
曹文明 (CAO Wenming)
Original Assignee
深圳大学 (Shenzhen University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学 (Shenzhen University)
Priority to PCT/CN2020/071743 priority Critical patent/WO2021142571A1/zh
Publication of WO2021142571A1 publication Critical patent/WO2021142571A1/zh

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology

Definitions

  • The invention relates to the technical field of video tracking, and in particular to a Siamese dual-path (twin-branch) target tracking method.
  • Target tracking is an important research direction in computer vision, with a wide range of applications such as video surveillance, human-computer interaction, and autonomous driving. Although research on target tracking algorithms has improved significantly in recent years, interference factors such as drastic changes in target appearance, target occlusion, and illumination changes remain, and real-time constraints must also be met. For example, robust tracking algorithms based on optical flow and feature-point matching achieve good results among traditional point-matching approaches, but still cannot satisfy the requirements of industrial and commercial applications.
  • Target tracking is a very challenging task, especially for moving targets: the scenes they move through are complex and change frequently, and the targets themselves change constantly as well. Identifying and tracking changing targets in complex scenes is therefore demanding. Against the background of growing computing power and data volume, improving the real-time performance and robustness of target tracking is an urgent need.
  • The purpose of the present invention is to overcome the above-mentioned shortcomings of the prior art and to provide a Siamese dual-path target tracking method, which introduces a deep-network algorithm based on a Siamese dual-path input framework and further adds a geometric feature network to increase the robustness of target tracking.
  • To this end, the present invention provides a Siamese dual-path target tracking method.
  • The method includes the following steps: inputting the template image and the candidate-box search region of the frame to be tracked into the trained Siamese dual-path neural network for feature extraction; performing a convolution operation with the resulting first feature map as the kernel and the second feature map as the image to be convolved, to obtain a score map; and determining the target position in the frame to be tracked based on the score map.
  • The Siamese dual-path neural network includes a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, and a fifth convolutional layer connected in sequence, and the activation function ReLU follows the first, second, third, and fourth convolutional layers.
  • the center of the search image in the next frame is positioned at the center of the target tracking frame in the previous frame.
  • a cross-correlation method is used to compare the similarity between the search area and the template image, and then the score map is obtained.
  • The template image is fixed to the standard tracking-frame region of the initial frame of the video and is not updated during the target tracking process.
  • The training image pairs are composed of images taken from different frames of the same video in the training set.
  • By cropping and scaling, the target center is fixed at the center of each training image pair and the object size is normalized.
  • The size region A of the template image is selected by the formula s(w+2p)×s(h+2p)=A, where w and h are the width and height of the standard tracking frame, p is the padding length, and s is the size normalization parameter.
  • Compared with the prior art, the present invention has the following advantages: the geometric-feature results of adjacent frames are warped onto the current detection frame, increasing the detection stability of the current frame; by establishing a geometric template, a detection-frame model, and a temporal scoring model, the problems of fast target motion and target blur are solved effectively; the position of the object in the template frame is obtained from the direction of geometric target motion in adjacent frames, forming a geometric target attention model; and a leaky, normalized attention model suited to the Siamese tracking framework is proposed, improving the success rate of target tracking.
  • Fig. 1 is a flowchart of a Siamese dual-path target tracking method according to an embodiment of the present invention.
  • Fig. 2 is a schematic diagram of the model structure of Siamese dual-path target tracking according to an embodiment of the present invention.
  • Fig. 3 shows results of the Siamese dual-path target tracking method according to an embodiment of the present invention.
  • The Siamese dual-path target tracking method includes: step S110, inputting the template image and the candidate-box search region of the frame to be tracked into the trained Siamese dual-path neural network for feature extraction, obtaining a first feature map corresponding to the template image and a second feature map corresponding to the candidate-box search region of the frame to be tracked; step S120, performing a convolution operation with the first feature map as the convolution kernel and the second feature map as the image to be convolved, obtaining a score map that indicates the degree of similarity between each position in the search region and the template image; and step S130, determining the target position in the frame to be tracked based on the score map.
  • Fig. 2 is a network structure diagram of the Siamese dual-path target tracking method according to an embodiment of the present invention, where z denotes the template image and x denotes the search region, i.e., the candidate-box search region in the current video frame. x and z are fed into two feature extraction networks (e.g., CNNs), which map their inputs into a new space, forming representations of the inputs in that space; a loss function then evaluates the similarity of the template image and the search region.
  • The target tracking problem can be understood as a similarity-learning problem over the object in the initial frame: the tracking method learns a matching function f(z,x) that compares the similarity of the initial-frame template image z and the current-frame candidate image x. The two inputs pass through the same embedding network φ simultaneously, and another function g combines the two outputs, giving the matching function f(z,x) = g(φ(z), φ(x)).
  • The Siamese branches adopt the AlexNet network with the fully connected layers removed, retaining the convolutional and pooling layers. For example, after the template image z and the search image x pass through the feature network, feature maps of sizes 6×6×128 and 22×22×128 are obtained, respectively.
  • The cross-correlation can be written as f(z,x) = φ(z) * φ(x) + b, where b is a per-position bias; the output of the feature network φ is a feature map rather than a one-dimensional vector.
  • the center of the search image of the current frame is positioned at the center of the target tracking frame of the previous frame.
  • the convolutional network used for feature extraction is fully convolutional with respect to the search image x, so that it can cope with changes in the target scale.
  • Convolution operations at three or five scales are applied to both the search image and the template image, and the scale whose map has the highest response is taken as the position response map of the current target.
  • A cross-correlation method can be used to compare the similarity between the search region and the target template to obtain a score map; bicubic interpolation is then used for up-sampling to obtain a more accurate target position. Mathematically, this method is very similar to correlation-filter tracking; the difference is that correlation filters use the more convenient spectral interpolation to obtain a more accurate target box.
  • The feature extraction network is trained on pairs of positive and negative samples using the logistic loss ℓ(y,v) = log(1 + exp(−yv)), where v is the real-valued score of each point in the candidate response map and y ∈ {+1,−1} is the label given by the standard tracking box.
  • The total loss is composed of the logistic losses of all points of the score map, L(y,v) = (1/|D|) Σ_{u∈D} ℓ(y[u], v[u]), which requires a true label y[u] ∈ {+1,−1} for every position u ∈ D.
  • The algorithm uses image pairs composed of a template image and a search image, cross-correlating them to obtain the score map v: D → R.
  • The training image pairs are composed of images from different frames of the same video in the training set, the frames being T frames apart. By cropping and scaling, the target center is fixed at the center of each training image pair and the object size is normalized.
  • The positive training labels are assigned by y[u] = +1 if k‖u−c‖ ≤ R and −1 otherwise, where u is a position in the score map, c is the center of the training image, and k is the stride of the feature network.
  • the size of the template image is 127 ⁇ 127, and the size of the search image is 255 ⁇ 255.
  • The images are preprocessed: rather than being simply cropped and scaled, they are padded according to the size and position of the tracking frame. More specifically, let the width and height of the standard tracking frame be w and h, the padding length be p, and the size normalization parameter be s; the region size is then chosen by s(w+2p)×s(h+2p)=A.
  • The feature extraction network adopts a network structure based on AlexNet; the specific parameters of the network are shown in Table 1.
  • The first two convolutional layers are each followed by a max-pooling operation; except for the fifth convolutional layer, every convolutional layer is followed by the ReLU activation function, and batch processing is applied to each layer of the network. Notably, no pixel padding is applied to the images during convolution.
  • The Siamese dual-path target tracking method of the present invention has a simple structure and high real-time performance. It uses only CNN features and, unlike other algorithms, does not rely on hand-crafted features such as color or gradient histograms.
  • The template image is always the target image of the initial frame, and there is no template-update process. Nevertheless, by learning off-line the similarity of the feature network across different targets and matching template and search images at multiple scales, the method of the present invention achieves good results. For example, results on the OTB tracking dataset are shown in Figure 3, where the ordinate is the success rate: the success rate of the present invention (SiamFC) is 0.612, higher than that of the other algorithms in the figure.
  • The network structure of the present invention is simple and highly portable, so it can readily be improved further.
  • The present invention applies an optical-flow network model: warping the optical-flow results of adjacent frames onto the current detection frame increases the stability of current-frame detection; establishing a template, a detection-frame model, and a temporal scoring model solves the problems of fast target motion and target blur; and the position of the object in the template frame is obtained from the optical-flow motion direction of adjacent frames, forming an optical-flow Siamese model.
  • the target tracking method of the present invention can be applied to recognition technology, cloud data analysis and the like.
  • the present invention may be a system, a method and/or a computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present invention.
  • the computer-readable storage medium may be a tangible device that holds and stores instructions used by the instruction execution device.
  • the computer-readable storage medium may include, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing, for example.
  • A more specific, non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a Siamese dual-path target tracking method. The method includes: inputting a template image and the candidate-box search region of a frame to be tracked into a trained Siamese dual-path neural network for feature extraction, obtaining a first feature map corresponding to the template image and a second feature map corresponding to the candidate-box search region of the frame to be tracked; performing a convolution operation with the first feature map as the convolution kernel and the second feature map as the image to be convolved, obtaining a score map indicating the degree of similarity between each position in the search region and the template image; and determining the target position in the frame to be tracked based on the score map. The method of the present invention can improve the real-time performance and robustness of target tracking.

Description

A Siamese dual-path target tracking method

Technical Field
The present invention relates to the technical field of video tracking, and in particular to a Siamese dual-path target tracking method.
Background Art
Target tracking is an important research direction in computer vision with a wide range of applications, such as video surveillance, human-computer interaction, and autonomous driving. Interference factors such as drastic changes in target appearance, target occlusion, and illumination changes must be handled, and real-time constraints must also be considered. Thus, although research on target tracking algorithms has achieved significant progress in recent years, even approaches such as robust tracking algorithms based on optical flow and feature-point matching, which perform well among traditional point-matching algorithms, still cannot meet the requirements of industrial and commercial applications.
Target tracking is a highly challenging task, especially for moving targets: the scenes in which they move are complex and change frequently, and the target itself also changes constantly. Identifying and tracking changing targets in complex scenes is therefore a challenging task. Against the background of growing computing power and data volume, enhancing the real-time performance and robustness of target tracking is an urgent need.
Summary of the Invention
The purpose of the present invention is to overcome the above defects of the prior art and to provide a Siamese dual-path target tracking method that introduces a deep-network algorithm based on a Siamese dual-path input framework and further adds a geometric feature network to increase the robustness of target tracking.
The present invention provides a Siamese dual-path target tracking method. The method includes the following steps:

inputting a template image and the candidate-box search region of a frame to be tracked into a trained Siamese dual-path neural network for feature extraction, obtaining a first feature map corresponding to the template image and a second feature map corresponding to the candidate-box search region of the frame to be tracked;

performing a convolution operation with the first feature map as the convolution kernel and the second feature map as the image to be convolved, obtaining a score map that indicates the degree of similarity between each position in the search region and the template image;

determining the target position in the frame to be tracked based on the score map.
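To make these steps concrete, the following is a minimal tracking-loop sketch in PyTorch (the framework is an assumption, as the patent does not prescribe one; extract_features and crop are placeholders standing for the trained Siamese branch and the padded cropping described later, and the stride value is an illustrative assumption):

```python
import torch.nn.functional as F

def track(frames, init_center, extract_features, crop, stride=8):
    """frames: list of image tensors; init_center: (cx, cy) of the target
    in frame 0. Returns the estimated target center for every frame."""
    cx, cy = init_center
    # The template is taken from the initial frame and never updated.
    phi_z = extract_features(crop(frames[0], cx, cy, size=127))
    path = [(cx, cy)]
    for frame in frames[1:]:
        # Search region centered on the previous target position.
        x = crop(frame, cx, cy, size=255)
        score = F.conv2d(extract_features(x), phi_z)  # similarity score map
        idx = int(score.flatten().argmax())
        row, col = divmod(idx, score.shape[-1])
        # Map the score-map peak back to image coordinates.
        cx += (col - score.shape[-1] // 2) * stride
        cy += (row - score.shape[-2] // 2) * stride
        path.append((cx, cy))
    return path
```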
In one embodiment, the Siamese dual-path neural network includes a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, and a fifth convolutional layer connected in sequence, and the activation function ReLU follows the first, second, third, and fourth convolutional layers.

In one embodiment, during target tracking, the center of the search image of the next frame is positioned at the center of the target tracking frame of the previous frame.

In one embodiment, during target tracking, a cross-correlation method is used to compare the similarity between the search region and the template image to obtain the score map.

In one embodiment, the template image is fixed to the standard tracking-frame region of the initial frame of the video and is not updated during target tracking.

In one embodiment, when training the Siamese dual-path neural network, the training image pairs are composed of images from different frames of the same video in the training set; by cropping and scaling, the target center is fixed at the center of each training image pair and the object size is normalized.

In one embodiment, the size region A of the template image is selected by the formula

s(w+2p)×s(h+2p)=A

where w and h are the width and height of the standard tracking frame, p is the padding length, and s is the size normalization parameter.
Compared with the prior art, the present invention has the following advantages: the results of the geometric features of adjacent frames are warped onto the current detection frame, increasing the detection stability of the current frame; by establishing a geometric template, a detection-frame model, and a temporal scoring model, the problems of fast target motion and target blur are solved well; the position of the object in the template frame is obtained from the direction of geometric target motion in adjacent frames, forming a geometric target attention model; and a leaky, normalized attention model suited to the Siamese tracking framework is proposed, improving the success rate of target tracking.
Brief Description of the Drawings

The following drawings merely illustrate and explain the present invention schematically and are not intended to limit its scope:

Fig. 1 is a flowchart of a Siamese dual-path target tracking method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the model structure of Siamese dual-path target tracking according to an embodiment of the present invention;

Fig. 3 shows results of the Siamese dual-path target tracking method according to an embodiment of the present invention.
Detailed Description

To make the purpose, technical solution, design method, and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to explain the present invention and not to limit it.

In all examples shown and discussed herein, any specific value should be interpreted as merely illustrative rather than limiting; other examples of the exemplary embodiments may therefore have different values.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification. Referring to Fig. 1, the Siamese dual-path target tracking method provided by an embodiment of the present invention includes: step S110, inputting a template image and the candidate-box search region of a frame to be tracked into a trained Siamese dual-path neural network for feature extraction, obtaining a first feature map corresponding to the template image and a second feature map corresponding to the candidate-box search region of the frame to be tracked; step S120, performing a convolution operation with the first feature map as the convolution kernel and the second feature map as the image to be convolved, obtaining a score map indicating the degree of similarity between each position in the search region and the template image; step S130, determining the target position in the frame to be tracked based on the score map.
Fig. 2 is a network structure diagram of the Siamese dual-path target tracking method according to an embodiment of the present invention, where z denotes the template image and x denotes the search region, i.e., the candidate-box search region in the current video frame. x and z are fed into two feature extraction networks (e.g., CNNs), each of which maps its input into a new space, forming a representation of the input in that space. By computing a loss function, the similarity between the template image and the search region is evaluated.
Specifically, based on the architecture of Fig. 2, the target tracking problem can be understood as a similarity-learning problem over the object in the initial frame. The tracking method learns a matching function f(z,x) that compares the similarity of the initial-frame template image z and the current-frame candidate image x; that is, the two inputs simultaneously pass through the same embedding network φ, and another function g then combines the two outputs, giving the matching function f(z,x) = g(φ(z), φ(x)).
In one embodiment, the template image denoted by z is fixed to the standard tracking-frame region of the initial frame of the video, because the initial frame is the least contaminated: even if the object is occluded or disappears, it can be detected and tracked again. The search region denoted by x is obtained by cropping and scaling the current detection frame, for example to a fixed size of 255×255. φ denotes the feature-mapping operation, which passes the original image through the CNN to obtain a feature map; the feature extraction network is a fully convolutional network.
To improve the real-time performance of target tracking, in this embodiment of the present invention the Siamese branches adopt the AlexNet network with the fully connected layers removed, retaining the convolutional and pooling layers. For example, after the template image z and the search image x pass through the feature network, feature maps of sizes 6×6×128 and 22×22×128 are obtained, respectively. In Fig. 1, * denotes the cross-correlation operation: the feature map of the template image serves as the convolution kernel, and the feature map of the search image is the image to be convolved. The 6×6×128 tensor is the feature of z after φ (the feature mapping), and the 22×22×128 tensor is the feature of x after φ. Cross-correlating the 22×22×128 feature with the 6×6×128 kernel yields a score map of size 17×17×1, indicating the degree of similarity between each position in the search region and the template; the highest-scoring position in the final score map is the position of the target in the current frame. The above process is expressed by the matching function

f(z,x) = φ(z) * φ(x) + b
where b is a per-position bias. The output of the feature network φ is a feature map rather than a one-dimensional vector. During target tracking, the center of the search image of the current frame is positioned at the center of the target tracking frame of the previous frame.
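As a concrete check of this matching function, the small sketch below (PyTorch assumed, with random tensors standing in for real features) cross-correlates feature maps of the sizes quoted above and adds the bias b:

```python
import torch
import torch.nn.functional as F

def match(phi_z: torch.Tensor, phi_x: torch.Tensor, b: float = 0.0):
    """f(z,x) = phi(z) * phi(x) + b, with phi(z) used as the conv kernel."""
    return F.conv2d(phi_x, phi_z) + b

phi_z = torch.randn(1, 128, 6, 6)    # template feature map, 6x6x128
phi_x = torch.randn(1, 128, 22, 22)  # search feature map, 22x22x128
print(match(phi_z, phi_x).shape)     # torch.Size([1, 1, 17, 17]) score map
```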
In one embodiment, the convolutional network used for feature extraction is fully convolutional with respect to the search image x and can therefore cope with changes in target scale: for example, convolution operations at three or five scales are applied to both the search image and the template image, and the scale whose map has the highest response is taken as the position response map of the current target. Specifically, a cross-correlation method can be used to compare the similarity between the search region and the target template to obtain the score map; bicubic interpolation is then used for up-sampling to obtain a more accurate target position. Mathematically, this method is very similar to correlation-filter tracking; the difference is that correlation filters use the more convenient spectral interpolation to obtain a more accurate target box.
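The following hedged sketch illustrates the multi-scale search and the bicubic up-sampling of the winning score map (PyTorch assumed; the scale factors, the up-sampling factor of 16, and the stand-in embedding network in the demo are illustrative choices, not values from the patent):

```python
import torch
import torch.nn.functional as F

def multi_scale_peak(phi, z_img, x_img, scales=(0.96, 1.0, 1.04)):
    kernel = phi(z_img)                            # template features
    best = None
    for s in scales:
        side = int(round(x_img.shape[-1] * s))
        xs = F.interpolate(x_img, size=(side, side), mode='bilinear',
                           align_corners=False)
        resp = F.conv2d(phi(xs), kernel)           # score map at this scale
        if best is None or resp.max() > best.max():
            best = resp                            # keep the strongest response
    # Bicubic up-sampling for a finer estimate of the peak position.
    up = F.interpolate(best, scale_factor=16, mode='bicubic',
                       align_corners=False)
    idx = int(up.flatten().argmax())
    return divmod(idx, up.shape[-1])               # (row, col) of the peak

torch.manual_seed(0)
phi = torch.nn.Conv2d(3, 8, kernel_size=11, stride=8)  # stand-in embedding
print(multi_scale_peak(phi, torch.randn(1, 3, 127, 127),
                       torch.randn(1, 3, 255, 255)))
```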
In one embodiment, the feature extraction network is trained on pairs of positive and negative samples using the logistic loss

ℓ(y,v) = log(1 + exp(−yv))

where v is the real-valued score of each point in the candidate response map and y ∈ {+1,−1} is the label given by the standard tracking box. The total loss function is composed of the logistic losses of all points of the score map, expressed as

L(y,v) = (1/|D|) Σ_{u∈D} ℓ(y[u], v[u])
This requires the true label y[u] ∈ {+1,−1} of every position u ∈ D. During training, the algorithm uses image pairs composed of a template image and a search image and cross-correlates them to obtain the score map v: D → R. The training image pairs are composed of images from different frames of the same video in the training set, the frames being T frames apart. By cropping and scaling, the target center is fixed at the center of each training image pair and the object size is normalized.
In one embodiment, the positive training labels are obtained as follows:

y[u] = +1 if k‖u−c‖ ≤ R, and y[u] = −1 otherwise,

where u is a position in the score map, c is the center of the training image, and k is the stride of the feature network. A point in the score map whose distance to the center is within R is marked as a positive sample. In addition, since the score map contains far fewer positive samples than negative samples, the positive and negative samples are multiplied by class weights to balance their numbers.
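A sketch of this label map and the class-balanced logistic loss follows (PyTorch assumed; the 17×17 map size, stride k = 8, and radius R = 16 are illustrative values consistent with the sizes quoted in the text, not parameters stated by the patent):

```python
import torch

def make_labels(size: int = 17, k: int = 8, R: float = 16.0):
    """y[u] = +1 where k * ||u - c|| <= R, else -1, plus weights giving the
    scarce positives and abundant negatives equal total contribution."""
    c = (size - 1) / 2.0
    ys, xs = torch.meshgrid(torch.arange(size, dtype=torch.float32),
                            torch.arange(size, dtype=torch.float32),
                            indexing='ij')
    dist = k * torch.sqrt((ys - c) ** 2 + (xs - c) ** 2)
    pos = dist <= R
    y = torch.where(pos, torch.tensor(1.0), torch.tensor(-1.0))
    w = torch.where(pos, 0.5 / pos.sum(), 0.5 / (~pos).sum())
    return y, w

def balanced_logistic_loss(v, y, w):
    """Weighted sum over D of l(y, v) = log(1 + exp(-y * v))."""
    return (w * torch.log1p(torch.exp(-y * v))).sum()

y, w = make_labels()
v = torch.randn(17, 17)                  # a raw score map
print(balanced_logistic_loss(v, y, w))
```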
During training, the template image has size 127×127 and the search image has size 255×255. The images are preprocessed: rather than being simply cropped and scaled, they are padded according to the size and position of the tracking frame. More specifically, let the width and height of the standard tracking frame be w and h, the padding length be p, and the size normalization parameter be s. The region size is chosen by

s(w+2p)×s(h+2p)=A       (5)

For the template image, A = 127² and p = (w+h)/4. Regions extending beyond the image boundary, where no pixels can be cropped, are filled with the mean value of all pixels of the image.
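A small sketch of this context-padded crop follows; it applies s(w+2p)×s(h+2p)=A with A = 127² and p = (w+h)/4, and fills out-of-image pixels with the per-channel mean. Pure NumPy with nearest-neighbour resizing is an implementation convenience here, not something mandated by the patent:

```python
import numpy as np

def crop_template(img: np.ndarray, cx: int, cy: int, w: int, h: int,
                  out_size: int = 127) -> np.ndarray:
    """img: HxWx3 frame; (cx, cy): box center; (w, h): standard box size."""
    p = (w + h) / 4.0                               # context padding length
    side = int(round(np.sqrt((w + 2 * p) * (h + 2 * p))))  # crop side pre-scale
    half = side // 2
    H, W = img.shape[:2]
    mean = img.mean(axis=(0, 1))                    # per-channel mean filler
    canvas = np.tile(mean, (side, side, 1))
    x0, y0 = cx - half, cy - half
    xs0, ys0 = max(0, x0), max(0, y0)
    xs1, ys1 = min(W, x0 + side), min(H, y0 + side)
    canvas[ys0 - y0:ys1 - y0, xs0 - x0:xs1 - x0] = img[ys0:ys1, xs0:xs1]
    # Scale the square crop to out_size x out_size (the factor s in (5)).
    idx = (np.arange(out_size) * side / out_size).astype(int)
    return canvas[idx][:, idx]

frame = np.random.rand(360, 480, 3)
print(crop_template(frame, cx=240, cy=180, w=60, h=40).shape)  # (127, 127, 3)
```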
In one embodiment, the feature extraction network adopts a network structure based on AlexNet; the specific parameters of the network are listed in Table 1. The first two convolutional layers are each followed by a max-pooling operation; except for the fifth convolutional layer, every convolutional layer is followed by the ReLU activation function, and batch processing is applied to each layer of the network. Notably, no pixel padding is applied to the images during convolution.

Table 1: Parameters of each layer of the Siamese network (the table is reproduced as an image in the original publication)
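Since Table 1 survives only as an image, the following PyTorch sketch reconstructs a plausible backbone from the prose: five unpadded convolutions, max pooling after the first two, and ReLU after every convolution except the fifth. The kernel sizes, strides, and channel widths (96, 256, 384, 384, 128) are assumptions consistent with the 6×6×128 and 22×22×128 feature sizes quoted above, not values read from the table:

```python
import torch
import torch.nn as nn

class SiameseBranch(nn.Module):
    """AlexNet-style feature network: no padding anywhere, pooling after the
    first two convolutions, no ReLU after the fifth convolution."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3), nn.ReLU(inplace=True),
            nn.Conv2d(384, 128, kernel_size=3),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)

branch = SiameseBranch()
print(branch(torch.randn(1, 3, 127, 127)).shape)  # torch.Size([1, 128, 6, 6])
print(branch(torch.randn(1, 3, 255, 255)).shape)  # torch.Size([1, 128, 22, 22])
```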
The Siamese dual-path target tracking method of the present invention has a simple structure and high real-time performance; it uses only CNN features and, unlike other algorithms, does not use hand-crafted features such as color or gradient histograms. Moreover, the template image is always the target image of the initial frame, and there is no template-update process. Nevertheless, by learning off-line the similarity of the feature network across different targets and matching template and search images at multiple scales, the method of the present invention achieves good results. For example, results on the OTB tracking dataset are shown in Fig. 3, where the ordinate is the success rate: on the OTB-cvpr13 dataset, the success rate of the present invention (SiamFC) is 0.612, higher than that of the other algorithms in the figure. Furthermore, the network structure of the present invention is simple and highly portable, so it can readily be improved further.
In summary, the present invention applies an optical-flow network model: by warping the optical-flow results of adjacent frames onto the current detection frame, the detection stability of the current frame is increased; by establishing a template, a detection-frame model, and a temporal scoring model, the problems of fast target motion and target blur can be solved well; and the position of the object in the template frame is obtained from the optical-flow motion direction of adjacent frames, forming an optical-flow Siamese model. The target tracking method of the present invention can be applied to recognition technology, cloud data analysis, and the like.
It should be noted that although the steps are described above in a specific order, this does not mean that they must be executed in that order; in fact, some of these steps may be executed concurrently or even in a different order, as long as the required functions are achieved.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that holds and stores instructions for use by an instruction-execution device. The computer-readable storage medium may include, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the foregoing.
The embodiments of the present invention have been described above; the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (9)

  1. A Siamese dual-path target tracking method, comprising the following steps:
    inputting a template image and the candidate-box search region of a frame to be tracked into a trained Siamese dual-path neural network for feature extraction, obtaining a first feature map corresponding to the template image and a second feature map corresponding to the candidate-box search region of the frame to be tracked;
    performing a convolution operation with the first feature map as the convolution kernel and the second feature map as the image to be convolved, obtaining a score map indicating the degree of similarity between each position in the search region and the template image;
    determining the target position in the frame to be tracked based on the score map.
  2. The method according to claim 1, wherein the Siamese dual-path neural network comprises a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a third convolutional layer, a fourth convolutional layer, and a fifth convolutional layer connected in sequence, and the activation function ReLU follows the first, second, third, and fourth convolutional layers.
  3. The method according to claim 1, wherein, during target tracking, the center of the search image of the next frame is positioned at the center of the target tracking frame of the previous frame.
  4. The method according to claim 1, wherein, during target tracking, a cross-correlation method is used to compare the similarity between the search region and the template image to obtain the score map.
  5. The method according to claim 1, wherein the template image is fixed to the standard tracking-frame region of the initial frame of the video and is not updated during target tracking.
  6. The method according to claim 1, wherein, when training the Siamese dual-path neural network, the training image pairs are composed of images from different frames of the same video in the training set; by cropping and scaling, the target center is fixed at the center of each training image pair and the object size is normalized.
  7. The method according to claim 6, wherein the size region A of the template image is selected by the formula
    s(w+2p)×s(h+2p)=A
    where w and h are the width and height of the standard tracking frame, p is the padding length, and s is the size normalization parameter.
  8. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
  9. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/071743 2020-01-13 2020-01-13 A Siamese dual-path target tracking method WO2021142571A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/071743 WO2021142571A1 (zh) 2020-01-13 2020-01-13 A Siamese dual-path target tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/071743 WO2021142571A1 (zh) 2020-01-13 2020-01-13 A Siamese dual-path target tracking method

Publications (1)

Publication Number Publication Date
WO2021142571A1 (zh) 2021-07-22

Family

ID=76863386

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/071743 WO2021142571A1 (zh) 2020-01-13 2020-01-13 A Siamese dual-path target tracking method

Country Status (1)

Country Link
WO (1) WO2021142571A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214238A (zh) * 2017-06-30 2019-01-15 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.) Multi-target tracking method, apparatus, device and storage medium
CN109903310A (zh) * 2019-01-23 2019-06-18 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.) Target tracking method and apparatus, computer apparatus and computer storage medium
KR101959436B1 (ko) * 2018-08-06 2019-07-02 전북대학교 산학협력단 (Chonbuk National University Industry-Academic Cooperation Foundation) Object tracking system using background recognition
US20190287264A1 (en) * 2018-03-14 2019-09-19 Tata Consultancy Services Limited Context based position estimation of target of interest in videos
CN110428447A (zh) * 2019-07-15 2019-11-08 杭州电子科技大学 (Hangzhou Dianzi University) Target tracking method and system based on policy gradient
CN110580713A (zh) * 2019-08-30 2019-12-17 武汉大学 (Wuhan University) Satellite video target tracking method based on a fully convolutional Siamese network and trajectory prediction

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214238A (zh) * 2017-06-30 2019-01-15 百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.) Multi-target tracking method, apparatus, device and storage medium
US20190287264A1 (en) * 2018-03-14 2019-09-19 Tata Consultancy Services Limited Context based position estimation of target of interest in videos
KR101959436B1 (ko) * 2018-08-06 2019-07-02 전북대학교 산학협력단 (Chonbuk National University Industry-Academic Cooperation Foundation) Object tracking system using background recognition
CN109903310A (zh) * 2019-01-23 2019-06-18 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.) Target tracking method and apparatus, computer apparatus and computer storage medium
CN110428447A (zh) * 2019-07-15 2019-11-08 杭州电子科技大学 (Hangzhou Dianzi University) Target tracking method and system based on policy gradient
CN110580713A (zh) * 2019-08-30 2019-12-17 武汉大学 (Wuhan University) Satellite video target tracking method based on a fully convolutional Siamese network and trajectory prediction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BERTINETTO, LUCA; VALMADRE, JACK; HENRIQUES, JOÃO F.; VEDALDI, ANDREA; TORR, PHILIP H.: "Fully-Convolutional Siamese Networks for Object Tracking", in: Computer Vision – ECCV 2016 Workshops, Lecture Notes in Computer Science, vol. 9914, Springer, 3 November 2016, pp. 850-865, XP047361104, DOI: 10.1007/978-3-319-48881-3_56 *
WANG, YONG: "Research on Object Tracking Algorithm Based on Siamese Networks and Its Application in Ship Scene", Master's thesis, China Master's Theses Full-text Database, Science & Engineering (B), 15 July 2019, XP055829708 *

Similar Documents

Publication Publication Date Title
CN111260688A (zh) A Siamese dual-path target tracking method
CN108647694B (zh) Correlation-filter target tracking method based on context awareness and adaptive response
CN108776975B (zh) Visual tracking method based on joint learning of semi-supervised features and filters
CN107871106B (zh) Face detection method and device
CN108062525B (zh) Deep-learning hand detection method based on hand-region prediction
CN107481264A (zh) Adaptive-scale video target tracking method
CN109993091B (zh) Surveillance-video target detection method based on background elimination
CN111161317A (zh) Single-target tracking method based on multiple networks
Shi et al. An image mosaic method based on convolutional neural network semantic features extraction
Liu et al. Crowd counting method based on the self-attention residual network
Kim et al. Fast pedestrian detection in surveillance video based on soft target training of shallow random forest
Lu et al. Learning transform-aware attentive network for object tracking
CN112541491B (zh) End-to-end text detection and recognition method based on image character-region awareness
CN112364865B (zh) Method for detecting small moving targets in complex scenes
Zhang et al. A background-aware correlation filter with adaptive saliency-aware regularization for visual tracking
CN111914756A (zh) Video data processing method and device
Zhang et al. Part-based visual tracking with spatially regularized correlation filters
CN111027586A (zh) Target tracking method based on novel response-map fusion
Asadi-Aghbolaghi et al. Supervised spatio-temporal kernel descriptor for human action recognition from RGB-depth videos
Xie et al. RGB-D object tracking with occlusion detection
Ke et al. Lightweight convolutional neural network-based pedestrian detection and re-identification in multiple scenarios
Xiao et al. Single-scale siamese network based RGB-D object tracking with adaptive bounding boxes
CN110827327B (zh) Fusion-based long-term target tracking method
Yang et al. Accurate visual tracking via reliable patch
Hui et al. DSAA-YOLO: UAV remote sensing small target recognition algorithm for YOLOV7 based on dense residual super-resolution and anchor frame adaptive regression strategy

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20913330

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 11.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20913330

Country of ref document: EP

Kind code of ref document: A1