CN109064493B - Target tracking method and device based on meta-learning

Target tracking method and device based on meta-learning

Info

Publication number
CN109064493B
CN109064493B
Authority
CN
China
Prior art keywords
frame
target
tracking model
training
improved tracking
Prior art date
Legal status
Active
Application number
CN201810862625.7A
Other languages
Chinese (zh)
Other versions
CN109064493A (en)
Inventor
何智群
董远
白洪亮
熊风烨
Current Assignee
SUZHOU FEISOU TECHNOLOGY Co.,Ltd.
Original Assignee
Suzhou Feisou Technology Co ltd
Priority date: 2018-08-01
Filing date: 2018-08-01
Publication date: 2021-03-09
Application filed by Suzhou Feisou Technology Co ltd filed Critical Suzhou Feisou Technology Co ltd
Priority to CN201810862625.7A priority Critical patent/CN109064493B/en
Publication of CN109064493A publication Critical patent/CN109064493A/en
Application granted granted Critical
Publication of CN109064493B publication Critical patent/CN109064493B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention provides a target tracking method and device based on meta-learning. The method comprises the following steps: performing online training on an improved tracking model using meta-learning according to the first frame of a video to be tracked, wherein the improved tracking model is obtained by solving the filter parameters of a correlation filtering algorithm with a convolutional neural network; extracting features from a search sample of the current frame and a search sample of the first frame of the video to be tracked with the trained improved tracking model, and synthesizing the features corresponding to the current frame with the features corresponding to the first frame to obtain final features; and performing regression processing on the final features with a regression layer of the improved tracking model to obtain a response map, and obtaining the target frame in the current frame according to the response map. The method and device quickly adjust the parameters of the convolutional neural network through meta-learning, improving both the speed of parameter updating and the accuracy of target tracking.

Description

Target tracking method and device based on meta-learning
Technical Field
The invention belongs to the technical field of target tracking, and particularly relates to a target tracking method and device based on meta-learning.
Background
Target tracking is an important research direction in computer vision with wide applications, such as human body tracking, vehicle tracking in traffic monitoring systems, face tracking, gesture tracking in intelligent interactive systems, and the like. In brief, target tracking establishes the positions of an object to be tracked across a continuous video sequence to obtain the object's complete motion trajectory.
Mainstream single-target tracking algorithms fall into three categories: template matching based on correlation filtering, feature matching based on convolutional neural networks, and methods combining correlation filtering with deep features. Template matching based on correlation filtering tracks targets quickly, but the limitations of its features make good tracking performance difficult to achieve. Feature matching based on a convolutional neural network generally requires only offline training, with no online adjustment of the feature extraction network; however, the offline training process needs a large number of training samples, and each training run requires repeated parameter adjustment to reach the optimal parameters, which takes a great deal of time. The third category combines correlation filtering with deep features, but because its tracking speed can hardly reach real time, it is rarely used in system deployment.
In summary, existing target tracking algorithms require substantial training time and deliver low tracking accuracy and speed.
Disclosure of Invention
In order to overcome the problems that existing target tracking algorithms are complex to train, deliver low-precision tracking results, and cannot achieve real-time tracking, or at least partially solve these problems, the invention provides a target tracking method and device based on meta-learning.
According to a first aspect of the present invention, there is provided a target tracking method based on meta learning, including:
performing online training on the improved tracking model by using meta-learning according to a first frame of a video to be tracked; wherein the improved tracking model is obtained by solving filter parameters in a correlation filtering algorithm using a convolutional neural network;
respectively extracting features of a search sample of a current frame and a search sample of the first frame in the video to be tracked based on the trained improved tracking model, and synthesizing the features corresponding to the current frame and the features corresponding to the first frame to obtain final features;
and performing regression processing on the final features based on a regression layer in the improved tracking model to obtain a response graph, and obtaining a target frame in the current frame according to the response graph.
According to a second aspect of the present invention, there is provided a target tracking apparatus based on meta learning, comprising:
the training module is used for performing online training on the improved tracking model by using meta-learning according to a first frame of a video to be tracked; wherein the improved tracking model is obtained by solving filter parameters in a correlation filtering algorithm using a convolutional neural network;
an extraction module, configured to perform feature extraction on a search sample of a current frame and a search sample of the first frame in the video to be tracked based on the trained improved tracking model, and synthesize features corresponding to the current frame and the first frame to obtain final features;
and the obtaining module is used for carrying out regression processing on the final characteristics based on a regression layer in the improved tracking model to obtain a response map, and obtaining the target frame in the current frame according to the response map.
According to a third aspect of the present invention, there is provided an electronic apparatus comprising:
at least one processor, at least one memory, and a bus; wherein
the processor and the memory complete mutual communication through the bus;
the memory stores program instructions executable by the processor, which when called by the processor are capable of performing the method as previously described.
According to a fourth aspect of the invention, there is provided a non-transitory computer readable storage medium storing computer instructions which cause the computer to perform the method as described above.
The invention provides a target tracking method and device based on meta-learning. The method obtains an improved tracking model by replacing the filter parameters of a correlation filtering algorithm with the parameters of a convolutional neural network, so that the convolutional neural network parameters carry information such as the shape and size of the object. The improved tracking model is trained online with meta-learning to obtain optimized convolutional neural network parameters, the trained parameters are used to regress the target in the current frame to obtain a response map, and the target frame of the current frame is obtained from the response map. Meta-learning makes the online update more efficient: by quickly adjusting the parameters of the convolutional neural network, the improved tracking model can rapidly learn the shape information of the target, improving both the speed of parameter updating and the accuracy of target tracking.
Drawings
Fig. 1 is a schematic overall flow chart of a target tracking method based on meta-learning according to an embodiment of the present invention;
fig. 2 is a schematic view of an overall structure of a target tracking device based on meta learning according to an embodiment of the present invention;
fig. 3 is a schematic view of an overall structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In an embodiment of the present invention, a target tracking method based on meta-learning is provided, and fig. 1 is a schematic overall flow chart of the target tracking method based on meta-learning provided in the embodiment of the present invention, where the method includes: s101, performing online training on an improved tracking model by using meta-learning according to a first frame of a video to be tracked; wherein the improved tracking model is obtained by solving filter parameters in a correlation filtering algorithm by using a convolutional neural network;
the video to be tracked is the video needing target tracking. Meta learning is simply to let a machine learn. The on-line training refers to training performed after the improved tracking model is deployed, and in this embodiment, after the target and the target frame in the first frame are determined, parameters of the improved tracking model are trained. And the target in the first frame is obtained by performing target detection on the first frame. After the target in the first frame is obtained, the target frame in the first frame is determined according to the area of the target in the first frame. In the target tracking algorithm, the related filtering algorithm gives consideration to speed and performance, and benefits from the fast and efficient matching capability. The correlation filtering algorithm is trained based on ridge regression, filter parameters are obtained as a training target, and the position of a target frame is obtained through the filter parameters, wherein the formula is as follows:
$$W^* = \arg\min_{W}\, \lVert W \ast X - Y \rVert^2 + \lambda \lVert W \rVert^2$$
wherein W is the filter parameter, X is the image input to the correlation filtering algorithm, Y is the correlation map reflecting the correlation between the filter and the input image, and λ is a penalty coefficient. In this embodiment, the improved tracking model is obtained by solving the filter parameters of the correlation filtering algorithm with a convolutional neural network: the solved result replaces the filter parameters of the correlation filtering algorithm, and the model after replacement serves as the improved tracking model. The improved tracking model therefore contains convolutional neural network parameters rather than filter parameters, which is how the convolutional neural network is introduced.
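For concreteness, the following is a minimal sketch of how a conventional correlation filter solves the ridge regression above in closed form in the Fourier domain; it is this hand-solved W that the improved tracking model replaces with learned convolutional neural network parameters. The function names and the single-channel simplification are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def solve_filter(x, y, lam=1e-4):
    """Closed-form Fourier-domain solution of the ridge regression
    W* = argmin ||W (*) x - y||^2 + lam * ||W||^2
    for a single-channel patch x and desired response y."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # Element-wise division: each frequency bin is solved independently.
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def filter_response(W_hat, z):
    """Response map of a search patch z under the solved filter."""
    return np.real(np.fft.ifft2(W_hat * np.fft.fft2(z)))
```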
S102, respectively extracting features of a search sample of a current frame and a search sample of a first frame in a video to be tracked based on a trained improved tracking model, and synthesizing the features corresponding to the current frame and the features corresponding to the first frame to obtain final features;
the search sample in the first frame is the search sample in the first frame, wherein the search sample in the first frame is centered on the target in the first frame, and after the target frame in the first frame is amplified by a preset multiple, the area in the amplified target frame in the first frame is used as the search sample in the first frame. Such as a preset multiple of 5. The search sample of the current frame is the region within the enlarged target frame in the current frame. In the prediction process, the concept of time residual is introduced, and besides the search sample of the current frame is processed and input, the search sample of the first frame is also used as input. And performing feature extraction on the search sample of the current frame and the search sample of the first frame by adopting a trained improved tracking model, namely the same residual error network. The search sample of the current frame is input into a residual error neural network, and the search sample of the first frame is input into a common convolution neural network. The method comprises the steps of performing feature fusion, namely feature splicing, on features of a current frame search sample and a first frame search sample input into a network, wherein the process is called time residual, so that the robustness of the features is stronger. And finally, obtaining a response graph according to the fused features.
And S103, performing regression processing on the final features based on a regression layer in the improved tracking model to obtain a response map, and obtaining a target frame in the current frame according to the response map.
The final features are regressed through the regression layer of the improved tracking model to obtain the final response map, and the target frame in the current frame is obtained from the response map.
In this embodiment, the filter parameters of the correlation filtering algorithm are replaced with convolutional neural network parameters to obtain the improved tracking model, so that the convolutional neural network parameters contain information such as the shape and size of the object. The improved tracking model is trained online with meta-learning to obtain optimized convolutional neural network parameters, the trained parameters are used to regress the target in the current frame to obtain a response map, and the target frame of the current frame is obtained from the response map.
On the basis of the foregoing embodiment, in this embodiment, the step of performing online training on the improved tracking model by using meta-learning according to the first frame of the video to be tracked specifically includes: taking the target in the first frame of the video to be tracked as the center, enlarging the target frame in the first frame by a preset multiple, and taking the area within the enlarged target frame as a training sample; and updating the convolutional neural network parameters and the learning rate in the improved tracking model by gradient descent according to the training sample.
In this embodiment, an online training method is adopted: with the target in the first frame as the center, the target frame in the first frame is enlarged by a preset multiple, for example 5, to generate a training sample. The convolutional neural network parameters and the learning rate in the improved tracking model are then updated by gradient descent according to the training sample. In target tracking, meta-learning serves two main purposes: first, it makes the gradient update on the first frame faster and more efficient; second, it makes the trained model more robust. In meta-learning, the gradient descent step can be expressed as:
$$\theta' = M(\theta, \nabla_{\theta} F) = \theta - \alpha \odot \nabla_{\theta} F(\theta)$$

wherein M is the update function, θ is a convolutional neural network parameter, α is the learning rate, and ∇θF(θ) is the gradient of the objective function F. The goal of meta-learning is to find good convolutional neural network parameters θ and a learning rate α for the gradient update. θ₀ denotes the convolutional neural network parameters obtained from the training samples of the first frame; θ₀ and α are derived as follows:

$$(\theta_0, \alpha) = \arg\min_{\theta_0, \alpha}\; \mathbb{E}_{S, j, \delta}\Big[ L\big(y_{j+\delta},\, F(x_{j+\delta};\, \theta_0 - \alpha \odot \nabla_{\theta_0} L(y_j, F(x_j; \theta_0)))\big) \Big]$$

wherein E is the expectation, S is the set of all training samples, j is the frame number, δ is the interval between the current frame and the standard frame (the standard frame is generally the 0th frame), y is the label, F(x; θ) is the output matrix of a sample after passing through the convolutional neural network, and L is the loss function.
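The PyTorch-style sketch below shows one plausible reading of this meta-training step, in the spirit of gradient-based meta-learning: an inner update on frame j with per-parameter learning rates α, followed by an outer loss on frame j+δ that is backpropagated into both θ₀ and α. The functional interface model_fn(x, params) and the optimizer handling are assumptions for illustration.

```python
import torch

def meta_train_step(theta0, alpha, model_fn, loss_fn,
                    x_j, y_j, x_jd, y_jd, meta_opt):
    """One meta-training step.

    theta0, alpha: lists of tensors (parameters and per-parameter
    learning rates), both registered in meta_opt.
    model_fn(x, params): functional forward pass F(x; theta).
    (x_j, y_j): sample/label at frame j; (x_jd, y_jd): at frame j+delta.
    """
    # Inner update: theta1 = theta0 - alpha * grad L(y_j, F(x_j; theta0))
    inner_loss = loss_fn(model_fn(x_j, theta0), y_j)
    grads = torch.autograd.grad(inner_loss, theta0, create_graph=True)
    theta1 = [p - a * g for p, a, g in zip(theta0, alpha, grads)]
    # Outer loss on frame j+delta, backpropagated into theta0 and alpha.
    outer_loss = loss_fn(model_fn(x_jd, theta1), y_jd)
    meta_opt.zero_grad()
    outer_loss.backward()
    meta_opt.step()
    return outer_loss.item()
```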
On the basis of the above embodiment, in this embodiment, the objective function F(θ) for performing online training on the improved tracking model with meta-learning according to the first frame of the video to be tracked is:

$$F(\theta) = \frac{1}{N}\sum_{i=1}^{N} L_2\big(X_i;\theta\big) + \lambda\, r(\theta)$$

where N is the number of training samples, L₂ is a loss function of the two-norm, X is a training sample, λ is a penalty coefficient, θ is a convolutional neural network parameter, i is the training sample number, and r is a regularization function.
In this embodiment, λ is a penalty coefficient that prevents overfitting, and r is a function representing some regularization mode, also to prevent overfitting; its specific form is generally the two-norm, i.e., the square of ‖θ‖. Introducing the convolutional neural network realizes end-to-end learning of the filter parameters: the final target response map is obtained in one step from parameter training, the forward and backward passes of every part proceed together during training, all convolutional neural network parameters are trained jointly, and the training results of all convolutional neural network parameters are obtained at once.
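A minimal sketch of this objective, assuming the data term is a mean two-norm loss over the N training samples and r(θ) is the squared two-norm of the parameters, as described above:

```python
import torch

def objective_F(model, samples, labels, lam=1e-4):
    """F(theta) = (1/N) * sum_i L2(model(X_i), Y_i) + lam * r(theta),
    with r(theta) taken as the squared two-norm of all parameters."""
    n = len(samples)
    data_term = sum(torch.mean((model(x) - y) ** 2)
                    for x, y in zip(samples, labels)) / n
    reg_term = sum(p.pow(2).sum() for p in model.parameters())
    return data_term + lam * reg_term
```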
On the basis of the foregoing embodiment, in this embodiment, the step of performing online training on the improved tracking model by using meta-learning according to the first frame of the video to be tracked further includes: selecting one frame from the video to be tracked every preset number of frames; and updating the convolutional neural network parameters in the improved tracking model using meta-learning according to the selected frame.
During tracking, this embodiment keeps updating the convolutional neural network parameters with new training samples, which improves training accuracy. Specifically, one frame is selected from the video to be tracked every preset number of frames, the region within the target frame of the selected frame is taken as a new training sample, and the convolutional neural network parameters are updated according to this new sample. The update formula is:
$$\theta_{j+1} = M\big(\theta_j,\, \nabla_{\theta_j} L(y_j, F(x_j; \theta_j))\big)$$

where M is the gradient-descent update function defined above, L is the loss function, y is the label, j is the frame number, x is the sample, and F is the mapping represented by the convolutional neural network. This update likewise uses gradient descent.
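A sketch of this periodic online update, reusing the meta-learned per-parameter learning rates α; the 10-frame interval and the functional interface are illustrative assumptions:

```python
import torch

def online_update(theta, alpha, model_fn, loss_fn, x_j, y_j):
    """theta_{j+1} = M(theta_j, grad L(y_j, F(x_j; theta_j))):
    one gradient step with the meta-learned learning rates alpha."""
    loss = loss_fn(model_fn(x_j, theta), y_j)
    grads = torch.autograd.grad(loss, theta)
    return [(p - a * g).detach().requires_grad_()
            for p, a, g in zip(theta, alpha, grads)]

def maybe_update(theta, alpha, model_fn, loss_fn, j, x_j, y_j, every=10):
    """Apply the online update only every `every` frames."""
    if j > 0 and j % every == 0:
        theta = online_update(theta, alpha, model_fn, loss_fn, x_j, y_j)
    return theta
```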
On the basis of the foregoing embodiments, in this embodiment, the step of synthesizing the features corresponding to the current frame and the features corresponding to the first frame to obtain the final features specifically includes: multiplying the features corresponding to the current frame and the features corresponding to the first frame by their respective weights and adding the products to obtain the final features.
On the basis of the foregoing embodiments, in this embodiment, the step of acquiring the target frame in the current frame according to the response map specifically includes: multiplying the response map by a Gaussian window that attenuates away from the center to obtain a score matrix; taking the position corresponding to the maximum value of the score matrix as the center-point position of the target in the current frame; and acquiring the target frame in the current frame according to the center-point position.
Specifically, the obtained final response map is multiplied by a Gaussian window that attenuates away from the center to obtain the final score matrix; the position corresponding to the maximum value of the score matrix is the center-point position of the target in the current frame, and the target frame in the current frame is acquired according to this center-point position.
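A sketch of this scoring step; the window width sigma_ratio is an illustrative assumption:

```python
import numpy as np

def locate_center(response_map, sigma_ratio=0.25):
    """Multiply the response map by a Gaussian window that attenuates
    away from the center, then take the argmax of the score matrix
    as the target's center-point position."""
    h, w = response_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    window = np.exp(-0.5 * (((ys - h / 2.0) / (sigma_ratio * h)) ** 2
                            + ((xs - w / 2.0) / (sigma_ratio * w)) ** 2))
    scores = response_map * window
    cy, cx = np.unravel_index(np.argmax(scores), scores.shape)
    return (cy, cx), scores
```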
On the basis of the foregoing embodiment, in this embodiment, the step of acquiring the target frame in the current frame according to the central point position specifically includes: multiplying the size of a target frame in the previous frame of the current frame by multiple preset search scales respectively to obtain multiple search ranges; respectively acquiring the maximum value in each search range in the score matrix by taking the central point position as the center; and determining the coordinates of the target frame in the current frame according to the searching range and the central point position corresponding to the maximum value in the maximum values in each searching range.
The search scales are preset constants. A search range is a box whose size is determined by the size of the target frame in the frame preceding the current frame and by the search scale. For example, three search scales may be 1, 1.03, and 0.97 times the target frame of the preceding frame: multiplying the length and width of that target frame by 1 gives the first search range, by 1.03 the second, and by 0.97 the third. The maximum value of the score matrix within each of the three search ranges is obtained, and the search range corresponding to the largest of these maxima is taken as the size of the target frame in the current frame. The coordinates of the target frame in the current frame are then determined from this size, centered on the center-point position of the target in the current frame.
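A sketch of the multi-scale search using the three example scales from the text; the clipping of search ranges to the score matrix is an illustrative detail:

```python
import numpy as np

def best_scale_box(scores, cx, cy, prev_w, prev_h,
                   scales=(1.0, 1.03, 0.97)):
    """For each preset search scale, take the maximum of the score
    matrix inside the correspondingly scaled range around the center
    point; the winning range gives the new target-box size."""
    img_h, img_w = scores.shape
    best_val, best_wh = -np.inf, (prev_w, prev_h)
    for s in scales:
        w, h = prev_w * s, prev_h * s
        x0, x1 = max(int(cx - w / 2), 0), min(int(cx + w / 2), img_w)
        y0, y1 = max(int(cy - h / 2), 0), min(int(cy + h / 2), img_h)
        m = scores[y0:y1, x0:x1].max()
        if m > best_val:
            best_val, best_wh = m, (w, h)
    w, h = best_wh
    # Box centered on (cx, cy): (x, y, width, height)
    return cx - w / 2, cy - h / 2, w, h
```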
In another embodiment of the present invention, a target tracking device based on meta-learning is provided, which is used to implement the methods in the foregoing embodiments. Therefore, the description and definition in the target tracking method based on meta learning in the foregoing embodiments can be used for understanding of the execution modules in the embodiments of the present invention. Fig. 2 is a schematic diagram of an overall structure of a target tracking apparatus based on meta-learning according to an embodiment of the present invention, where the apparatus includes a training module 201, an extraction module 202, and an acquisition module 203; wherein:
the training module 201 is configured to perform online training on the improved tracking model by using meta-learning according to a first frame of a video to be tracked; wherein the improved tracking model is obtained by solving filter parameters in a correlation filtering algorithm by using a convolutional neural network; the extraction module 202 is configured to perform feature extraction on a search sample of a current frame and a search sample of a first frame in a video to be tracked based on a trained improved tracking model, and synthesize features corresponding to the current frame and features corresponding to the first frame to obtain final features; the obtaining module 203 is configured to perform regression processing on the final features based on a regression layer in the improved tracking model to obtain a response map, and obtain a target frame in the current frame according to the response map.
On the basis of the above embodiment, the training module in this embodiment is specifically configured to: taking a target in a first frame of a video to be tracked as a center, amplifying a target frame in the first frame by a preset multiple, and taking an area in the amplified target frame as a training sample; and updating the parameters of the convolutional neural network and the learning rate in the improved tracking model by using a gradient descent mode according to the training samples.
On the basis of the above embodiment, in this embodiment the objective function F(θ) with which the training module performs online training on the improved tracking model using meta-learning according to the first frame of the video to be tracked is:

$$F(\theta) = \frac{1}{N}\sum_{i=1}^{N} L_2\big(X_i;\theta\big) + \lambda\, r(\theta)$$

where N is the number of training samples, L₂ is a loss function of the two-norm, X is a training sample, λ is a penalty coefficient, θ is a convolutional neural network parameter, i is the training sample number, and r is a regularization function.
On the basis of the above embodiment, the training module in this embodiment is further configured to: selecting a frame from a video to be tracked at intervals of a preset frame number; the convolutional neural network parameters in the improved tracking model are updated using meta-learning according to the selected frame.
On the basis of the foregoing embodiments, the extraction module in this embodiment is specifically configured to: and multiplying the corresponding characteristics of the current frame and the first frame by corresponding weights respectively and then adding the obtained products to obtain the final characteristics.
On the basis of the foregoing embodiments, the obtaining module in this embodiment is specifically configured to: multiplying the response graph by a Gaussian window with center attenuation to obtain a score matrix; taking the position corresponding to the maximum value in the scoring matrix as the position of the central point of the target in the current frame; and acquiring a target frame in the current frame according to the position of the central point.
On the basis of the foregoing embodiment, the obtaining module in this embodiment is further configured to: multiplying the size of a target frame in the previous frame of the current frame by multiple preset search scales respectively to obtain multiple search ranges; respectively acquiring the maximum value in each search range in the score matrix by taking the central point position as the center; and determining the coordinates of the target frame in the current frame according to the searching range and the central point position corresponding to the maximum value in the maximum values in each searching range.
In this embodiment, the filter parameters of the correlation filtering algorithm are replaced with convolutional neural network parameters to obtain the improved tracking model, so that the convolutional neural network parameters contain information such as the shape and size of the object. The improved tracking model is trained online with meta-learning to obtain optimized convolutional neural network parameters, the trained parameters are used to regress the target in the current frame to obtain a response map, and the target frame of the current frame is obtained from the response map.
The embodiment provides an electronic device, and fig. 3 is a schematic view of an overall structure of the electronic device according to the embodiment of the present invention, where the electronic device includes: at least one processor 301, at least one memory 302, and a bus 303; wherein
the processor 301 and the memory 302 communicate with each other through the bus 303;
the memory 302 stores program instructions executable by the processor 301, and the processor calls the program instructions to perform the methods provided by the above method embodiments, for example, the method includes: performing online training on the improved tracking model by using meta-learning according to a first frame of a video to be tracked; wherein the improved tracking model is obtained by solving filter parameters in a correlation filtering algorithm by using a convolutional neural network; respectively extracting features of a search sample of a current frame and a search sample of a first frame in a video to be tracked based on a trained improved tracking model, and synthesizing the features corresponding to the current frame and the first frame to obtain final features; and performing regression processing on the final features based on a regression layer in the improved tracking model to obtain a response graph, and obtaining a target frame in the current frame according to the response graph.
The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example, including: performing online training on the improved tracking model by using meta-learning according to a first frame of a video to be tracked; wherein the improved tracking model is obtained by solving filter parameters in a correlation filtering algorithm by using a convolutional neural network; respectively extracting features of a search sample of a current frame and a search sample of a first frame in a video to be tracked based on a trained improved tracking model, and synthesizing the features corresponding to the current frame and the first frame to obtain final features; and performing regression processing on the final features based on a regression layer in the improved tracking model to obtain a response graph, and obtaining a target frame in the current frame according to the response graph.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The above-described embodiments of the electronic device are merely illustrative, and units illustrated as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that the above embodiments are only preferred embodiments and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. A target tracking method based on meta-learning is characterized by comprising the following steps:
performing online training on the improved tracking model by using meta-learning according to a first frame of a video to be tracked; wherein the improved tracking model is obtained by solving the filter parameters in a correlation filtering algorithm by using a convolutional neural network, replacing the filter parameters in the correlation filtering algorithm with the solved result, and taking the result after replacement as the improved tracking model; and the online training aims to obtain optimized parameters of the convolutional neural network;
respectively extracting features of a search sample of a current frame and a search sample of the first frame in the video to be tracked based on the trained improved tracking model, and synthesizing the features corresponding to the current frame and the features corresponding to the first frame to obtain final features;
performing regression processing on the final features based on a regression layer in the improved tracking model to obtain a response map, and obtaining a target frame in the current frame according to the response map;
the step of performing online training on the improved tracking model by using meta-learning according to the first frame of the video to be tracked specifically comprises:
taking a target in a first frame of a video to be tracked as a center, amplifying a target frame in the first frame by a preset multiple, and taking an area in the amplified target frame as a training sample;
updating the parameters and the learning rate of the convolutional neural network in the improved tracking model by using a gradient descent mode according to the training samples;
the formula of the objective function F(θ) for performing online training on the improved tracking model by using meta-learning according to the first frame of the video to be tracked being:

$$F(\theta) = \frac{1}{N}\sum_{i=1}^{N} L_2\big(X_i;\theta\big) + \lambda\, r(\theta)$$

where N is the number of training samples, L₂ is a loss function of the two-norm, X is the training sample, λ is a penalty coefficient, θ is a convolutional neural network parameter, i is the training sample number, and r is a regularization function.
2. The method of claim 1, wherein the step of on-line training the improved tracking model using meta-learning based on the first frame of the video to be tracked further comprises:
selecting a frame from the video to be tracked at intervals of a preset frame number;
and updating the parameters of the convolutional neural network in the improved tracking model by using meta-learning according to the selected frame.
3. The method according to claim 1 or 2, wherein the step of synthesizing the features corresponding to the current frame and the features corresponding to the first frame to obtain the final features specifically comprises:
and multiplying the features corresponding to the current frame and the features corresponding to the first frame by corresponding weights respectively and then adding the multiplied features to obtain final features.
4. The method according to claim 1 or 2, wherein the step of obtaining the target frame in the current frame according to the response map specifically comprises:
multiplying the response graph by a Gaussian window with central attenuation to obtain a score matrix;
taking the position corresponding to the maximum value in the scoring matrix as the position of the central point of the target in the current frame;
and acquiring a target frame in the current frame according to the position of the central point.
5. The method according to claim 4, wherein the step of obtaining the target frame in the current frame according to the position of the center point specifically comprises:
multiplying the size of the target frame in the previous frame of the current frame by multiple preset search scales respectively to obtain multiple search ranges;
respectively acquiring the maximum value in each search range in the score matrix by taking the central point position as the center;
and determining the coordinates of the target frame in the current frame according to the searching range corresponding to the maximum value of the maximum values in the searching ranges and the position of the central point.
6. A target tracking device based on meta-learning, comprising:
the training module is used for performing online training on the improved tracking model by using meta-learning according to a first frame of a video to be tracked; wherein the improved tracking model is obtained by solving the filter parameters in a correlation filtering algorithm by using a convolutional neural network, replacing the filter parameters in the correlation filtering algorithm with the solved result, and taking the result after replacement as the improved tracking model; and the online training aims to obtain optimized parameters of the convolutional neural network;
an extraction module, configured to perform feature extraction on a search sample of a current frame and a search sample of the first frame in the video to be tracked based on the trained improved tracking model, and synthesize features corresponding to the current frame and the first frame to obtain final features;
an obtaining module, configured to perform regression processing on the final feature based on a regression layer in the improved tracking model to obtain a response map, and obtain a target frame in the current frame according to the response map;
wherein the training module is specifically configured to:
taking a target in a first frame of a video to be tracked as a center, amplifying a target frame in the first frame by a preset multiple, and taking an area in the amplified target frame as a training sample;
updating the parameters and the learning rate of the convolutional neural network in the improved tracking model by using a gradient descent mode according to the training sample;
the formula of the objective function F(θ) with which the training module performs online training on the improved tracking model by using meta-learning according to the first frame of the video to be tracked being:

$$F(\theta) = \frac{1}{N}\sum_{i=1}^{N} L_2\big(X_i;\theta\big) + \lambda\, r(\theta)$$

where N is the number of training samples, L₂ is a loss function of the two-norm, X is the training sample, λ is a penalty coefficient, θ is a convolutional neural network parameter, i is the training sample number, and r is a regularization function.
7. An electronic device, comprising:
at least one processor, at least one memory, and a bus; wherein
the processor and the memory complete mutual communication through the bus;
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 5.
8. A non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method of any one of claims 1 to 5.
CN201810862625.7A 2018-08-01 2018-08-01 Target tracking method and device based on meta-learning Active CN109064493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810862625.7A CN109064493B (en) 2018-08-01 2018-08-01 Target tracking method and device based on meta-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810862625.7A CN109064493B (en) 2018-08-01 2018-08-01 Target tracking method and device based on meta-learning

Publications (2)

Publication Number Publication Date
CN109064493A CN109064493A (en) 2018-12-21
CN109064493B (en) 2021-03-09

Family

ID=64832494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810862625.7A Active CN109064493B (en) 2018-08-01 2018-08-01 Target tracking method and device based on meta-learning

Country Status (1)

Country Link
CN (1) CN109064493B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885728B (en) * 2019-01-16 2022-06-07 西北工业大学 Video abstraction method based on meta-learning
CN110070226B (en) * 2019-04-24 2020-06-16 河海大学 Photovoltaic power prediction method and system based on convolutional neural network and meta-learning
CN111199189A (en) * 2019-12-18 2020-05-26 中国科学院上海微系统与信息技术研究所 Target object tracking method and system, electronic equipment and storage medium
CN113129360B (en) * 2019-12-31 2024-03-08 抖音视界有限公司 Method and device for positioning object in video, readable medium and electronic equipment
CN114025190B (en) * 2021-11-03 2023-06-20 北京达佳互联信息技术有限公司 Multi-code rate scheduling method and multi-code rate scheduling device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106887011A (en) * 2017-01-20 2017-06-23 北京理工大学 A kind of multi-template method for tracking target based on CNN and CF
CN107590820A (en) * 2017-08-25 2018-01-16 北京飞搜科技有限公司 A kind of object video method for tracing and its intelligent apparatus based on correlation filtering
CN107993250A (en) * 2017-09-12 2018-05-04 北京飞搜科技有限公司 A kind of fast multi-target pedestrian tracking and analysis method and its intelligent apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295678B (en) * 2016-07-27 2020-03-06 北京旷视科技有限公司 Neural network training and constructing method and device and target detection method and device
CN106530340B (en) * 2016-10-24 2019-04-26 深圳市商汤科技有限公司 A kind of specified object tracking
CN106709936A (en) * 2016-12-14 2017-05-24 北京工业大学 Single target tracking method based on convolution neural network
CN108038435B (en) * 2017-12-04 2022-01-04 中山大学 Feature extraction and target tracking method based on convolutional neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106887011A (en) * 2017-01-20 2017-06-23 北京理工大学 A kind of multi-template method for tracking target based on CNN and CF
CN107590820A (en) * 2017-08-25 2018-01-16 北京飞搜科技有限公司 A kind of object video method for tracing and its intelligent apparatus based on correlation filtering
CN107993250A (en) * 2017-09-12 2018-05-04 北京飞搜科技有限公司 A kind of fast multi-target pedestrian tracking and analysis method and its intelligent apparatus

Also Published As

Publication number Publication date
CN109064493A (en) 2018-12-21


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20210204
Address after: 215000 unit 2-b702, creative industry park, 328 Xinghu street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
Applicant after: SUZHOU FEISOU TECHNOLOGY Co.,Ltd.
Address before: Room 1216, 12 / F, Beijing Beiyou science and technology and cultural exchange center, 10 Xitucheng Road, Haidian District, Beijing, 100876
Applicant before: BEIJING FEISOU TECHNOLOGY Co.,Ltd.
GR01 Patent grant