CN114743150A - Target tracking method and device, electronic equipment and storage medium - Google Patents

Info

Publication number: CN114743150A
Application number: CN202210493990.1A
Authority: CN (China)
Prior art keywords: scale, angle, video frame, target, current video
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李福林, 陈翀, 徐宁, 戴宇荣
Current Assignee: Beijing Dajia Internet Information Technology Co Ltd
Original Assignee: Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210493990.1A

Classifications

    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G - Physics; G06 - Computing; calculating or counting; G06T - Image data processing or generation, in general; G06T7/00 - Image analysis; G06T7/20 - Analysis of motion)
    • G06T7/60 - Analysis of geometric attributes (G06T7/00 - Image analysis)
    • G06T2207/10016 - Video; Image sequence (G06T2207/00 - Indexing scheme for image analysis or image enhancement; G06T2207/10 - Image acquisition modality)


Abstract

The present disclosure relates to a target tracking method, an apparatus, an electronic device, and a storage medium. The target tracking method includes: acquiring a scale set and an angle set of a target to be tracked in a video sequence, where the scale set comprises one or more scale parameters measuring the degree of scale change of the target to be tracked, and the angle set comprises one or more angle parameters measuring the degree of angle change of the target to be tracked; extracting features from the current video frame according to one or more scale angle parameter combinations determined by the scale set and the angle set, to obtain one or more feature maps under different scale angle parameter combinations; and obtaining the scale and the angle of the target to be tracked in the current video frame according to the one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set. The target tracking method, apparatus, electronic device, and storage medium of the present disclosure can improve target tracking accuracy.

Description

Target tracking method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a target tracking method and apparatus, an electronic device, and a storage medium.
Background
Video target tracking is an important research direction in the field of computer vision. Its applications include video surveillance, detection and recognition, human-computer interaction, and autonomous driving; it is also widely used in daily production and life as a video editing tool.
Currently, most video target tracking methods focus on tracking target displacement (i.e., the change in position of a specific target within the frame). However, because the target itself may change in scale, appearance, and the like, tracking displacement alone can cause the target to be lost, so the accuracy of video target tracking remains low.
Disclosure of Invention
The present disclosure provides a target tracking method, an apparatus, an electronic device, and a storage medium, to at least solve the above-mentioned problems in the related art.
According to a first aspect of the embodiments of the present disclosure, there is provided a target tracking method, including: acquiring a scale set and an angle set of a target to be tracked in a video sequence, wherein the scale set comprises one or more scale parameters for measuring the scale change degree of the target to be tracked, and the angle set comprises one or more angle parameters for measuring the angle change degree of the target to be tracked; extracting features of a current video frame according to one or more scale angle parameter combinations determined by the scale set and the angle set to obtain one or more feature maps of the current video frame under different scale angle parameter combinations; and obtaining the scale and the angle of the target to be tracked in the current video frame according to the one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set and the angle set.
Optionally, the extracting features from the current video frame according to the one or more scale angle parameter combinations determined by the scale set and the angle set, to obtain one or more feature maps of the current video frame under different scale angle parameter combinations, includes: adjusting the scale angle of the target to be tracked in the video frame preceding the current video frame according to each of the one or more scale angle parameter combinations, to obtain one or more adjusted scale angles of the target to be tracked in the current video frame; and performing feature extraction on the current video frame including the target to be tracked at each adjusted scale angle, to obtain the one or more feature maps.
Optionally, the obtaining the scale and the angle of the target to be tracked in the current video frame according to the one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set includes: acquiring a first response according to the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set; acquiring one or more second responses according to the one or more feature maps and the first response, wherein each second response reflects how close the adjusted scale angle of the target to be tracked in the current video frame is to the real scale angle; and determining the adjusted scale angle corresponding to a second response meeting a preset condition among the one or more second responses as the scale and the angle of the target to be tracked in the current video frame.
Optionally, the acquiring a first response according to the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set includes: acquiring a scale Gaussian response and an angle Gaussian response according to the number of scale parameters in the scale set and the number of angle parameters in the angle set, respectively; and acquiring the first response based on the scale Gaussian response, the angle Gaussian response, and the feature maps of the two video frames preceding the current video frame.
Optionally, the feature maps of the two video frames preceding the current video frame include a first feature map and a second feature map of the first video frame before the current video frame and a third feature map of the second video frame before the current video frame; and the acquiring the first response based on the scale Gaussian response, the angle Gaussian response, and the feature maps of the two preceding video frames includes: performing fast Fourier transform on the scale Gaussian response, the angle Gaussian response, the first feature map, the second feature map, and the third feature map to obtain their frequency-domain counterparts; correlating the frequency-domain scale Gaussian response with the frequency-domain angle Gaussian response to obtain a frequency-domain scale angle comprehensive Gaussian response; and acquiring the first response based on the frequency-domain first feature map, second feature map, third feature map, and scale angle comprehensive Gaussian response.
Optionally, the acquiring the first response based on the frequency-domain first feature map, second feature map, third feature map, and scale angle comprehensive Gaussian response includes: associating the frequency-domain second feature map and the frequency-domain third feature map each with the frequency-domain scale angle comprehensive Gaussian response, to obtain a frequency-domain target second feature map and a frequency-domain target third feature map, each carrying scale angle information; and obtaining the first response according to the frequency-domain target second feature map, the frequency-domain target third feature map, and the frequency-domain first feature map.
Optionally, the acquiring one or more second responses according to the one or more feature maps and the first response includes: performing fast Fourier transform on the one or more feature maps to obtain one or more frequency-domain feature maps; and obtaining the one or more second responses according to each of the one or more frequency-domain feature maps and the first response.
Optionally, after determining, as the scale and the angle of the target to be tracked in the current video frame, the adjusted scale angle corresponding to a second response meeting a preset condition in the one or more second responses, the method further includes: and updating the first response according to the feature map of the current video frame and the feature map of the video frame before the current video frame.
According to a second aspect of the embodiments of the present disclosure, there is provided a target tracking apparatus including: a set acquisition unit configured to: acquiring a scale set and an angle set of a target to be tracked in a video sequence, wherein the scale set comprises one or more scale parameters for measuring the scale change degree of the target to be tracked, and the angle set comprises one or more angle parameters for measuring the angle change degree of the target to be tracked; a feature extraction unit configured to: extracting features of a current video frame according to one or more scale angle parameter combinations determined by the scale set and the angle set to obtain one or more feature maps of the current video frame under different scale angle parameter combinations; a scale and angle acquisition unit configured to: and obtaining the scale and the angle of the target to be tracked in the current video frame according to the one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set and the angle set.
Optionally, the feature extraction unit may be configured to adjust the scale angle of the target to be tracked in the video frame preceding the current video frame according to each of the one or more scale angle parameter combinations, to obtain one or more adjusted scale angles of the target to be tracked in the current video frame, and to perform feature extraction on the current video frame including the target to be tracked at each adjusted scale angle, to obtain the one or more feature maps.
Optionally, the scale and angle acquisition unit may be configured to acquire a first response according to the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set; acquire one or more second responses according to the one or more feature maps and the first response, wherein each second response reflects how close the adjusted scale angle of the target to be tracked in the current video frame is to the real scale angle; and determine the adjusted scale angle corresponding to the second response meeting a preset condition among the one or more second responses as the scale and the angle of the target to be tracked in the current video frame.
Optionally, the scale and angle acquisition unit may be configured to acquire a scale Gaussian response and an angle Gaussian response according to the number of scale parameters in the scale set and the number of angle parameters in the angle set, respectively, and to acquire the first response based on the scale Gaussian response, the angle Gaussian response, and the feature maps of the two video frames preceding the current video frame.
Optionally, the feature maps of the two video frames preceding the current video frame include a first feature map and a second feature map of the first video frame before the current video frame and a third feature map of the second video frame before the current video frame; and the scale and angle acquisition unit may be configured to perform fast Fourier transform on the scale Gaussian response, the angle Gaussian response, the first feature map, the second feature map, and the third feature map to obtain their frequency-domain counterparts; correlate the frequency-domain scale Gaussian response with the frequency-domain angle Gaussian response to obtain a frequency-domain scale angle comprehensive Gaussian response; and acquire the first response based on the frequency-domain first feature map, second feature map, third feature map, and scale angle comprehensive Gaussian response.
Optionally, the scale and angle acquisition unit may be configured to associate the frequency-domain second feature map and the frequency-domain third feature map each with the frequency-domain scale angle comprehensive Gaussian response, to obtain a frequency-domain target second feature map and a frequency-domain target third feature map, each carrying scale angle information, and to obtain the first response according to the frequency-domain target second feature map, the frequency-domain target third feature map, and the frequency-domain first feature map.
Optionally, the scale and angle acquisition unit may be configured to perform fast Fourier transform on the one or more feature maps to obtain one or more frequency-domain feature maps, and to obtain the one or more second responses according to each of the one or more frequency-domain feature maps and the first response.
Optionally, the target tracking apparatus further includes a first response updating unit, which may be configured to update the first response according to the feature map of the current video frame and the feature map of the video frame before the current video frame, after the adjusted scale angle corresponding to the second response meeting the preset condition is determined as the scale and the angle of the target to be tracked in the current video frame.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a target tracking method according to the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a target tracking method according to the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product including instructions executable by a processor of a computer device to perform a target tracking method according to the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to the target tracking method, apparatus, electronic device, and storage medium of the present disclosure, features are extracted from the current video frame according to a scale set and an angle set of the target to be tracked in a video sequence, yielding one or more feature maps of the current video frame under different scale angle parameter combinations; the scale and the angle of the target in the current video frame can then be tracked according to the obtained feature maps, the feature maps of the video frames before the current video frame, the scale set, and the angle set, thereby improving the accuracy of video target tracking.
In addition, the target tracking method, apparatus, electronic device, and storage medium of the present disclosure can be conveniently combined with any video target displacement tracking method, realizing four-dimensional tracking of the target (i.e., two-dimensional displacement, one-dimensional size, and one-dimensional angular posture).
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating a target tracking method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating a video target tracking process in a specific application scenario according to an exemplary embodiment of the present disclosure.
Fig. 3 is a diagram illustrating a tracking result of a target tracking method according to an exemplary embodiment of the present disclosure.
FIG. 4 is a block diagram illustrating a target tracking device according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device 500 according to an example embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the present disclosure, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "includes at least one of A and B" covers the following three parallel cases: (1) includes A; (2) includes B; (3) includes A and B. Likewise, "at least one of step one and step two is performed" covers the following three parallel cases: (1) step one is performed; (2) step two is performed; (3) both step one and step two are performed.
In order to improve the accuracy and comprehensiveness of video target tracking, the present disclosure provides a target tracking method, an apparatus, an electronic device, and a storage medium. Specifically, features are extracted from the current video frame according to a scale set and an angle set of the target to be tracked in a video sequence, so as to obtain one or more feature maps of the current video frame under different scale angle parameter combinations; the scale and the angle of the target in the current video frame can then be tracked according to the obtained feature maps, the feature maps of the video frames before the current video frame, the scale set, and the angle set, thereby improving the accuracy of video target tracking. In addition, any video target displacement tracking method can be conveniently combined with the target tracking method of the present disclosure, realizing four-dimensional tracking of the target (i.e., two-dimensional displacement, one-dimensional size, and one-dimensional angular posture). Hereinafter, a target tracking method, apparatus, electronic device, and storage medium according to exemplary embodiments of the present disclosure will be described in detail with reference to Figs. 1 to 5.
It should be noted that the target tracking method of the embodiments of the present disclosure may be used for tracking a video target. In one embodiment, the method may be executed by a computer, a notebook computer, a smartphone, a tablet computer, a wearable device, a vehicle-mounted device, or the like. In another embodiment, the method may be executed by a chip with computing capability, or, in response to a video target tracking request sent by a user terminal, by a server, a server cluster, a distributed subsystem, a cloud processing platform, a server including a blockchain node, or any combination of these devices; the present disclosure does not limit the specific execution device.
Fig. 1 is a flowchart illustrating a target tracking method according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, in step 101, a scale set and an angle set of a target to be tracked in a video sequence are obtained, where the scale set includes one or more scale parameters measuring the degree of scale change of the target to be tracked, and the angle set includes one or more angle parameters measuring the degree of angle change of the target to be tracked. Here, the video sequence includes a plurality of video frames, and the target to be tracked is a moving object in the video sequence; therefore, the position, scale, and angle of the target to be tracked may differ across video frames.
According to an example embodiment of the present disclosure, the scale set may be represented as S = {s1, s2, …, sn}, where n is the size of the scale set and s1, s2, …, sn are different scale parameters measuring the degree of scale change of the target to be tracked in the video sequence. In some embodiments, a scale step p may be set, so that the trackable scale range is [p*s1, p*sn]. For example, for the scale set S = {0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4}, 0.6 indicates that the scale of the target to be tracked is reduced to 60% and 1 indicates that the scale is unchanged; with a scale step p of 0.5, the trackable scale range is [0.3, 0.7]. Thus, the trackable scale range can be adjusted by altering the scale step without having to reset the scale set. The angle set may be represented as R = {r1, r2, …, rm}, where m is the size of the angle set and r1, r2, …, rm are different angle parameters measuring the degree of angle change of the target to be tracked in the video sequence. In some embodiments, an angle step q may be set, so that the trackable angle range is [q*r1, q*rm]. For example, for the angle set R = {-2, -1, 0, 1, 2}, positive and negative values indicate the two relative directions of angle change (for example, clockwise rotation and counterclockwise rotation, respectively); if the angle step q is 5°, the trackable angle range is [-10°, 10°], where -10° means the target is rotated 10° counterclockwise, 0 means the angle is unchanged, and 10° means the target is rotated 10° clockwise. The trackable angle range can likewise be adjusted by changing the angle step.
In addition, the specific values of n and m can be determined according to actual conditions (for example, the content type of the video sequence), and the present disclosure does not limit the sizes of the scale set and the angle set.
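As an illustration of the set-plus-step parameterization described above, the trackable ranges can be computed directly (values are taken from the example in the text; variable names are illustrative):

```python
import numpy as np

# Example sets from the text; the step sizes p and q are the tuning knobs.
scales = np.array([0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4])  # scale set S, n = 9
angles = np.array([-2, -1, 0, 1, 2])                               # angle set R, m = 5
p = 0.5   # scale step
q = 5.0   # angle step, in degrees

# Trackable ranges as defined in the text: [p*s1, p*sn] and [q*r1, q*rm].
scale_range = (p * scales[0], p * scales[-1])   # (0.3, 0.7)
angle_range = (q * angles[0], q * angles[-1])   # (-10.0, 10.0)
```

Changing p or q rescales the whole trackable range without touching the sets themselves, which is the adjustment mechanism the text describes.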
In step 102, features are extracted from the current video frame according to one or more scale angle parameter combinations determined by the obtained scale set and angle set, to obtain one or more feature maps of the current video frame under different scale angle parameter combinations. In one embodiment, the feature is a Histogram of Oriented Gradients (HOG) feature of the current video frame, which may be extracted by any method in the related art; the present disclosure is not limited in this respect. For example, the current video frame may be divided into grids of c × c pixels (e.g., 8 × 8), a gradient histogram feature may be computed for each grid, and the gradient histograms of all grids may be concatenated to obtain the gradient histogram feature of the current video frame.
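The grid-based gradient-histogram computation described above can be sketched as follows. This is a simplified stand-in for a full HOG implementation (which would also add block normalization); the function name and parameters are illustrative:

```python
import numpy as np

def hog_like_feature(img, cell=8, bins=9):
    """Split the frame into cell x cell grids, histogram the gradient
    orientations in each grid (weighted by gradient magnitude), and
    concatenate the per-grid histograms, as described in the text."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)
```

For a 16 × 16 frame with 8 × 8 cells and 9 orientation bins, this yields a 4 * 9 = 36-dimensional feature vector.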
According to an exemplary embodiment of the present disclosure, the scale angle of the target to be tracked in the video frame preceding the current video frame can be adjusted according to each of the one or more scale angle parameter combinations, so as to obtain one or more adjusted scale angles of the target to be tracked in the current video frame, and feature extraction can be performed on the current video frame including the target at each adjusted scale angle, so as to obtain one or more feature maps. Specifically, a scale parameter and an angle parameter are selected from the scale range determined by the scale set and the angle range determined by the angle set, respectively, to serve as a scale angle parameter combination; the scale and the angle of the target to be tracked in the preceding video frame are transformed according to this combination; and feature extraction is then performed on the current video frame including the target at the transformed scale and angle, to obtain the feature map under that scale angle parameter combination. A plurality of feature maps of the current video frame can thus be obtained under different scale angle parameter combinations. For example, if the scale set and the angle set each contain 3 parameters, there are 9 scale angle parameter combinations, and 9 HOG features of the current video frame can be computed. A combination may be denoted by the index pair (i, j), and the HOG feature of the current video frame under that combination by H_{i,j}, where i indexes the scale parameters s_i of the scale set and j indexes the angle parameters r_j of the angle set, i.e., 1 ≤ i ≤ n and 1 ≤ j ≤ m.
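The enumeration of scale angle parameter combinations and the warping of the previous frame's target can be sketched as below. SciPy's image resampling is used here as a stand-in (the patent does not specify the interpolation method), and all names are illustrative:

```python
import numpy as np
from itertools import product
from scipy import ndimage

def candidate_patches(prev_patch, scales, angles):
    """For each scale angle parameter combination (i, j), resample the
    target patch from the previous frame: rotate by angles[j] degrees,
    then rescale by scales[i], as in step 102."""
    out = {}
    for i, j in product(range(len(scales)), range(len(angles))):
        warped = ndimage.rotate(prev_patch, angles[j], reshape=False, order=1)
        warped = ndimage.zoom(warped, scales[i], order=1)
        out[(i, j)] = warped
    return out
```

Each warped patch would then be passed to the feature extractor to produce the feature map H_{i,j} for that combination.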
In step 103, the scale and the angle of the target to be tracked in the current video frame are obtained according to the obtained one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set and the angle set.
According to exemplary embodiments of the present disclosure, in order to accurately acquire the scale and the angle of the target to be tracked in the current video frame, the scale and angle information of the target in a preset number of preceding video frames may be used. Specifically, a first response may be acquired according to the feature maps (e.g., the HOG gradient histograms) of a preset number of video frames before the current video frame, the scale set, and the angle set; one or more second responses may then be acquired according to the obtained feature maps of the current video frame and the first response, where each second response reflects how close the corresponding adjusted scale angle of the target in the current video frame is to the real scale angle. The adjusted scale angle corresponding to a second response meeting a preset condition (e.g., the second response with the largest value) among the one or more second responses may be determined as the scale and angle of the target to be tracked in the current video frame.
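Selecting the adjusted scale angle whose second response satisfies the preset condition, with "largest peak value" as the example condition mentioned above, might look like this (names are illustrative):

```python
import numpy as np

def select_scale_angle(responses, scales, angles):
    """responses[(i, j)] is the second-response map for scale angle
    combination (i, j); the preset condition used here is the largest
    peak value over all combinations."""
    best_i, best_j = max(responses, key=lambda k: responses[k].max())
    return scales[best_i], angles[best_j]
```

Other preset conditions (e.g., peak sharpness) could be substituted without changing the surrounding pipeline.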
In particular, the scale gaussian response and the angle gaussian response may be obtained according to the number of scale parameters in the scale set (e.g., the size n of the aforementioned scale set) and the number of angle parameters in the angle set (e.g., the size m of the aforementioned angle set), respectively, where the scale gaussian response, for example, but not limited to, may be represented as G1={g1,g2,…,gnAnd (c) the step of (c) in which,
Figure BDA0003624164640000081
δ1is a Gaussian parameter larger than 0, e is a natural constant, i is more than or equal to 1 and less than or equal to n; the angular Gaussian response, for example, but not limited to, may be represented as G2={h1,h2,…,hmAnd (c) the step of (c) in which,
hj = exp(-(j - ⌈m/2⌉)^2 / (2δ2^2))
δ2 is a Gaussian parameter larger than 0, and 1 ≤ j ≤ m. δ1 and δ2 can be set according to the actual target tracking scene; the present disclosure is not limited in this respect. In addition, in order to reduce the execution complexity of the target tracking method as much as possible while still accurately acquiring the scale and angle of the target to be tracked in the current video frame, the feature maps of the first two video frames before the current video frame may be acquired. In some embodiments, these feature maps include a first feature map and a second feature map of the first video frame before the current video frame, and a third feature map of the second video frame before the current video frame, where the first feature map and the second feature map differ in that the scale and angle of the target to be tracked may differ in the process of acquiring them. Specifically, for clarity of description, assume that the first video frame before the current video frame is video frame A and the second video frame before the current video frame is video frame B. In the process of acquiring the first feature map, the scale and angle of the target in video frame A are taken to be the same as those in video frame B; in the process of acquiring the second feature map, the scale and angle of the target in video frame A are its actual scale and angle in video frame A; and in the process of acquiring the third feature map, the scale and angle of the target in video frame B are its actual scale and angle in video frame B.
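As a concrete illustration of the construction above, the following minimal numpy sketch builds one-dimensional Gaussian responses over the n scale bins and the m angle bins. The centring at the middle bin and the names (gaussian_response, delta1, delta2) are assumptions for illustration; the text only fixes the general Gaussian form.

```python
import numpy as np

def gaussian_response(size, sigma):
    # 1-D Gaussian response peaked at the centre bin; the exact centring
    # used in the text is an assumption.
    idx = np.arange(1, size + 1)            # bin indices 1..size
    centre = size // 2                      # assumed peak position
    return np.exp(-((idx - 1 - centre) ** 2) / (2.0 * sigma ** 2))

n, m = 9, 5                 # sizes of the scale set S and the angle set R
delta1, delta2 = 1.0, 1.0   # Gaussian parameters (scene-dependent)
G1 = gaussian_response(n, delta1)   # scale Gaussian response
G2 = gaussian_response(m, delta2)   # angle Gaussian response
```

The responses peak at the "no change" bin, so scale-angle combinations close to the previous frame's scale and angle are favoured.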
The first response may be obtained based on the scale Gaussian response, the angle Gaussian response, and the feature maps of the first two video frames before the current video frame. Specifically, to reduce the amount of computation during execution of the target tracking method of the present disclosure, the first response may be obtained in the frequency domain. That is, fast Fourier transform may be applied to the scale Gaussian response, the angle Gaussian response, the first feature map, the second feature map, and the third feature map to obtain their frequency-domain counterparts; the frequency-domain scale Gaussian response and the frequency-domain angle Gaussian response may then be associated (for example, multiplied) to obtain a frequency-domain scale-angle integrated Gaussian response; and the first response may be obtained based on the frequency-domain first, second, and third feature maps and the frequency-domain scale-angle integrated Gaussian response.
In some embodiments, the frequency-domain second feature map and the frequency-domain third feature map may each be associated with the frequency-domain scale-angle integrated Gaussian response (for example, by multiplying each of them with the scale-angle integrated Gaussian response) to obtain a frequency-domain target second feature map and a frequency-domain target third feature map, which respectively carry scale-angle information (i.e., scale information related to the aforementioned scale set and angle information related to the aforementioned angle set). The first response may be obtained from the frequency-domain target second feature map, the frequency-domain target third feature map, and the frequency-domain first feature map. In one embodiment, different weights may first be superimposed on the target second feature map and the target third feature map to determine their respective contributions to the first response; the weighted target second feature map and target third feature map are then added and divided by the frequency-domain first feature map to obtain the first response. That is, the first response, for example but not limited to, may be expressed as:
R1 = [η·F(G1)F(G2)F(Hc) + (1-η)·F(G1)F(G2)F(Ha)] / F(Hb)    (1)
where F(G1) and F(G2) respectively represent the frequency-domain scale Gaussian response and the frequency-domain angle Gaussian response; F(Ha) represents the frequency-domain third feature map of the second video frame before the current video frame; F(Hc) represents the frequency-domain second feature map of the first video frame before the current video frame; F(Hb) represents the frequency-domain first feature map of the first video frame before the current video frame; and η and (1-η) represent different weights, 0 < η < 1. It should be noted that if the current video frame is the second video frame of the video sequence, there is only one referenceable video frame before it (the first video frame), and the scale and angle of the target in the first video frame of the video sequence are known, so F(Hc) and F(Hb) are the same. In this case, the first response can be expressed as:
R1 = F(G1)F(G2)F(Hb) / F(Hb)    (2)
where F(Hb) represents the frequency-domain feature map of the first video frame of the video sequence.
In another embodiment, the first response may be obtained directly according to the first feature map of the frequency domain, the target second feature map and the target third feature map without superimposing different weights on the target second feature map and the target third feature map of the frequency domain, which is not limited by the present disclosure.
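The weighted combination of formula (1) can be sketched in a few lines of numpy. Here each feature map is simplified to a single 2-D array over the scale-angle grid, the element-wise frequency-domain products follow the description above, and the eta default and the small eps regulariser guarding the division are assumptions not stated in the text.

```python
import numpy as np

def first_response(Ha, Hb, Hc, G1, G2, eta=0.5):
    # Scale-angle integrated Gaussian response of the frequency domain:
    # outer product of the transformed 1-D responses (an assumed layout).
    G = np.fft.fft(G1)[:, None] * np.fft.fft(G2)[None, :]
    FHa = np.fft.fft2(Ha)  # third feature map (second frame back, actual pose)
    FHb = np.fft.fft2(Hb)  # first feature map (previous frame, prior frame's pose)
    FHc = np.fft.fft2(Hc)  # second feature map (previous frame, actual pose)
    eps = 1e-8             # assumed regulariser against division by zero
    return (eta * G * FHc + (1.0 - eta) * G * FHa) / (FHb + eps)
```

The weights eta and (1 - eta) set the contributions of the two historical feature maps, as in formula (1).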
According to an example embodiment of the present disclosure, fast Fourier transform may be applied to the one or more feature maps associated with the current video frame to obtain one or more frequency-domain feature maps, and one or more second responses may be obtained from each of the one or more frequency-domain feature maps and the first response (for example, by multiplying each feature map with the first response).
Here, the second response, for example, but not limited to, may be expressed as:
R2 = F(Hi,j) * R1    (3)
where "i, j" represents the aforementioned different scale-angle parameter combinations; F(Hi,j) represents the frequency-domain feature maps of the current video frame under the different scale-angle parameter combinations, 1 ≤ i ≤ n, 1 ≤ j ≤ m, where n is the size of the scale set and m is the size of the angle set; and R1 denotes the first response.
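The scoring step of formula (3) can be sketched as follows. The per-combination feature maps are simplified to a single n × m array holding one feature value per combination, and transforming back to the spatial domain before taking the maximum is an assumption for illustration.

```python
import numpy as np

def best_combination(H_ij, R1):
    # Second responses R2 = F(H_ij) * R1 for all combinations at once,
    # then the indices (i', j') of the largest response.
    R2 = np.fft.fft2(H_ij) * R1
    scores = np.fft.ifft2(R2).real
    i_best, j_best = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i_best), int(j_best)
```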
In particular embodiments, the scale-angle parameter combination "i′, j′" corresponding to the largest-valued second response among the one or more second responses may be recorded. According to the combination "i′, j′", the scale adjustment amplitude p*si′ and the angle adjustment amplitude q*rj′ of the target to be tracked are respectively determined from the scale size range [p*s1, p*sn] and the angle size range [q*r1, q*rm], and the scale and angle of the target in the current video frame are obtained from the determined scale adjustment amplitude and angle adjustment amplitude together with the scale and angle of the target in the previous video frame. For example, if the current video frame is the second video frame in the video sequence and the scale of the target in the first video frame is (w1, h1) with angle a1, then the scale of the target in the second video frame is (w1*p*si′, h1*p*si′) and the angle is a2 = a1 + q*rj′.
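The final update from the recorded combination "i′, j′" amounts to one multiplication and one addition; the helper below is illustrative, with all names assumed.

```python
def update_scale_angle(w, h, a, S, R, p, q, i_best, j_best):
    # Scale adjustment amplitude p*s[i'] and angle adjustment amplitude
    # q*r[j'] applied to the previous frame's scale (w, h) and angle a.
    factor = p * S[i_best]
    return (w * factor, h * factor), a + q * R[j_best]

# Example with the sets from the text: p = 1.0 keeps the set's raw scales,
# q = 5 degrees per angle unit.
S = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
R = [-2, -1, 0, 1, 2]
(w2, h2), a2 = update_scale_angle(100, 50, 30, S, R, p=1.0, q=5,
                                  i_best=4, j_best=3)
```

Here i_best = 4 selects s = 1.0 (scale unchanged) and j_best = 3 selects r = 1, i.e. a +5 degree rotation.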
According to an exemplary embodiment of the present disclosure, after the adjusted scale and angle corresponding to the second response meeting the preset condition among the one or more second responses are determined as the scale and angle of the target in the current video frame, the first response may be further updated according to the feature maps of the current video frame and the feature map of the previous video frame, so as to determine the scale and angle of the target to be tracked in the next video frame. For example, with respect to the foregoing formula (1), when the first response R1 is updated, F(Hc) and F(Hb) may be updated to the second feature map and the first feature map of the current video frame, and F(Ha) may be updated to the feature map of the previous video frame of the current video frame.
The target tracking method of the present disclosure can realize joint tracking of the scale and angle of a video target, and can conveniently be combined with any displacement tracking method in the related art to realize four-dimensional tracking of the target (namely, two-dimensional displacement, one-dimensional scale, and one-dimensional angular posture), thereby improving the comprehensiveness and accuracy of target tracking.
Based on a combination of one or more of the foregoing embodiments, fig. 2 shows a schematic diagram of a video target tracking process in a specific application scenario.
Referring to fig. 2, the sizes and step sizes of the scale set S and the angle set R, and the defined scale size range and angle size range, may refer to the foregoing description and are not repeated here. The scale Gaussian response G1 and the angle Gaussian response G2 are respectively acquired according to the scale set S and the angle set R (their specific expressions can be found in the related description above), and fast Fourier transform is applied to each of them to obtain the frequency-domain scale Gaussian response F(G1) and the frequency-domain angle Gaussian response F(G2). The scale and angle of the target to be tracked in the first video frame of the video sequence are known, the scale being (w1, h1) and the angle being a1. The HOG feature H1 of the first video frame is extracted, and fast Fourier transform is applied to H1 to obtain the frequency-domain HOG feature F(H1). According to F(G1), F(G2), and F(H1), a first response may be calculated:
R1 = F(G1)F(G2)F(H1) / F(H1)
For the second video frame of the video sequence, by traversing the scale size range defined by the scale set S and the angle size range defined by the angle set R, the gradient histogram feature Hi,j of the second video frame under each scale-angle parameter combination "i, j" can be calculated, and fast Fourier transform applied to each Hi,j to obtain the frequency-domain gradient histogram feature F(Hi,j) under each combination. For any scale-angle parameter combination "i, j", a second response R2 = F(Hi,j) * R1 can be calculated. The combination "i′, j′" at which the second response takes its maximum value is recorded, and according to "i′, j′", the scale adjustment amplitude p*si′ and the angle adjustment amplitude q*rj′ of the target to be tracked are respectively determined from the aforementioned scale size range [p*s1, p*sn] and angle size range [q*r1, q*rm]. The scale of the target to be tracked in the second video frame is then (w1*p*si′, h1*p*si′) and the angle is a2 = a1 + q*rj′. Thereafter, the first response may be updated to:
R1 = [η·F(G1)F(G2)F(Hi′,j′) + (1-η)·F(G1)F(G2)F(H1)] / F(H2)
where F(Hi′,j′) is the frequency-domain feature map obtained by extracting features of the second video frame at the actual scale and angle of the target to be tracked in the second video frame, and F(H2) is the frequency-domain feature map obtained by extracting features of the second video frame at the scale and angle of the target in the first video frame. For subsequent video frames, the corresponding second responses are obtained with the same logic as the operation on the second video frame, the scale and angle of the target in each subsequent video frame are obtained, and the first response R1 is updated, thereby completing the tracking task for the target in all video frames of the video sequence.
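Putting the pieces together, the sketch below runs the first two frames of the loop described above on synthetic stand-in feature maps (random n × m arrays replace the real HOG features; the regulariser eps and the flattened feature layout are assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 9, 5                                        # |S| and |R|
G1 = np.exp(-(np.arange(n) - n // 2) ** 2 / 2.0)   # scale Gaussian response
G2 = np.exp(-(np.arange(m) - m // 2) ** 2 / 2.0)   # angle Gaussian response
G = np.fft.fft(G1)[:, None] * np.fft.fft(G2)[None, :]
eps = 1e-8                                         # assumed regulariser

def first_response(FHa, FHb, FHc, eta=0.5):
    # Formula (1) with frequency-domain inputs.
    return (eta * G * FHc + (1.0 - eta) * G * FHa) / (FHb + eps)

# Frame 1: scale and angle known; a random map stands in for the HOG feature.
FH1 = np.fft.fft2(rng.standard_normal((n, m)))
R1 = first_response(FH1, FH1, FH1)                 # only one referenceable frame

# Frame 2: score every scale-angle combination and keep the best one.
H_ij = rng.standard_normal((n, m))                 # stand-in per-combination features
scores = np.fft.ifft2(np.fft.fft2(H_ij) * R1).real
i_best, j_best = np.unravel_index(np.argmax(scores), scores.shape)
# R1 would now be recomputed from the second frame's maps, as described above.
```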
Fig. 3 is a diagram illustrating a tracking result of a target tracking method according to an exemplary embodiment of the present disclosure.
Referring to fig. 3, fig. 3(a) shows the first video frame of a video sequence, in which the position and size of the target 301 to be tracked are represented by a black solid-line rectangular box. Fig. 3(b) shows the second video frame of the video sequence. Due to factors such as motion, the target 301 has moved from a horizontal position on the road section to an inclined position, its scale has become smaller, and its angle with respect to the horizontal plane has changed. At this point the target 301 is easily lost, or its scale and angle cannot be tracked. For example, the tracking result obtained by the related art, represented by a black dotted-line rectangular box, can only obtain the position of the target in the second video frame but not its size and angle, whereas the target tracking method of the present disclosure (represented by a black solid line) can track the scale and angle of the target, improving the accuracy of video target tracking.
FIG. 4 is a block diagram illustrating a target tracking device according to an exemplary embodiment of the present disclosure.
Referring to fig. 4, a target tracking apparatus 400 according to an exemplary embodiment of the present disclosure may include a set acquisition unit 401, a feature extraction unit 402, and a scale and angle acquisition unit 403.
The set obtaining unit 401 may obtain a scale set and an angle set of a target to be tracked in a video sequence, where the scale set includes one or more scale parameters for measuring a scale change degree of the target to be tracked, and the angle set includes one or more angle parameters for measuring an angle change degree of the target to be tracked. Here, the video sequence includes a plurality of video frames, and the target to be tracked is a moving object in the video sequence, and therefore, the position, the scale, and the angle of the target to be tracked in different video frames may be different.
According to an exemplary embodiment of the present disclosure, the scale set may be represented as S = {s1, s2, …, sn}, where n is the size of the scale set and s1, s2, …, sn respectively represent different scale parameters for measuring the degree of scale change of the target to be tracked in the video sequence. In some embodiments, a scale step p can be set, and the trackable scale size range is then [p*s1, p*sn]. For example, for a scale set S = {0.6, 0.7, 0.8, 0.9, 1, 1.1, 1.2, 1.3, 1.4}, where 0.6 denotes that the scale of the target is reduced to 60% of the original scale and 1 denotes that the scale is unchanged, if the scale step p is 0.5, the trackable scale size range is [0.3, 0.7]. Thus, the trackable scale size range can be adjusted by altering the scale step, without having to reset the scale set. The angle set may be represented as R = {r1, r2, …, rm}, where m is the size of the angle set and r1, r2, …, rm respectively represent different angle parameters for measuring the degree of angle change of the target in the video sequence. In some embodiments, an angle step q can be set, and the trackable angle size range is then [q*r1, q*rm]. For example, when the angle set R = {-2, -1, 0, 1, 2}, where positive and negative indicate two relative directions of angle change (for example, clockwise and counterclockwise rotation, respectively), and the angle step q is 5°, the trackable angle size range is [-10°, 10°]; here -10° means the angle of the target is rotated by 10° counterclockwise, 0 means the angle is unchanged, and 10° means the angle is rotated by 10° clockwise. The trackable angle size range can likewise be adjusted by changing the angle step.
In addition, specific values of n and m can be determined according to actual conditions (for example, the content type of the video sequence); the sizes of the scale set and the angle set are not limited by the present disclosure.
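The relationship between the sets, the step sizes, and the trackable ranges described above can be checked with a few lines (values taken from the examples in the text):

```python
# Scale set S and angle set R from the examples above.
S = [0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]   # n = 9
R = [-2, -1, 0, 1, 2]                                # m = 5
p, q = 0.5, 5          # scale step and angle step (degrees)

scale_range = (p * S[0], p * S[-1])   # trackable scale size range [p*s1, p*sn]
angle_range = (q * R[0], q * R[-1])   # trackable angle size range [q*r1, q*rm]
```

Changing only p or q rescales the trackable range without touching the sets themselves.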
The feature extraction unit 402 may extract features from the current video frame according to one or more scale-angle parameter combinations determined from the obtained scale set and angle set, so as to obtain one or more feature maps under different scale-angle parameter combinations. In one embodiment, the feature is a gradient histogram (HOG) feature of the current video frame, and any method in the related art may be used to extract the HOG feature; the present disclosure is not limited in this respect. For example, the current video frame may be divided into grids of c × c (e.g., 8 × 8), a gradient histogram feature calculated for each grid, and the gradient histogram features of the grids concatenated to obtain the gradient histogram feature of the current video frame.
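A toy version of the grid-based gradient histogram described above can be sketched as follows. Real HOG implementations add block normalisation and bin interpolation; the c × c-pixel cells, the 9 orientation bins, and the unsigned gradients here are assumptions for illustration.

```python
import numpy as np

def grid_hog(frame, c=8, n_bins=9):
    # Per-cell orientation histograms, magnitude-weighted, concatenated
    # into one feature vector for the frame.
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    h, w = frame.shape
    feats = []
    for y in range(0, h - c + 1, c):
        for x in range(0, w - c + 1, c):
            hist, _ = np.histogram(ang[y:y + c, x:x + c], bins=n_bins,
                                   range=(0.0, np.pi),
                                   weights=mag[y:y + c, x:x + c])
            feats.append(hist)
    return np.concatenate(feats)
```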
According to an exemplary embodiment of the present disclosure, the feature extraction unit 402 may adjust the scale and angle of the target to be tracked from the previous video frame according to each of the one or more scale-angle parameter combinations to obtain one or more adjusted scale-angle states of the target in the current video frame, and perform feature extraction on the current video frame containing the scale-angle-adjusted target to obtain one or more feature maps. Specifically, the feature extraction unit 402 may select a scale parameter and an angle parameter from the scale size range pointed to by the scale set and the angle size range pointed to by the angle set as a scale-angle parameter combination, transform the scale and angle of the target in the previous video frame according to that combination, and extract features of the current video frame containing the target with the transformed scale and angle, thereby obtaining a feature map of the current video frame under that scale-angle parameter combination. For example, if the scale set and the angle set respectively contain 3 scale parameters and 3 angle parameters, there are 9 scale-angle parameter combinations, and 9 HOG features related to the current video frame can be calculated.
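Enumerating the scale-angle parameter combinations is a simple Cartesian product; with 3 scale parameters and 3 angle parameters this yields the 9 combinations mentioned above (the 3-element sets here are illustrative):

```python
from itertools import product

S3 = [0.9, 1.0, 1.1]   # illustrative 3-element scale set
R3 = [-1, 0, 1]        # illustrative 3-element angle set

# One (scale index, angle index) pair per feature map to extract.
combinations = list(product(range(len(S3)), range(len(R3))))
```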
The scale and angle obtaining unit 403 may obtain the scale and angle of the target to be tracked in the current video frame according to the obtained one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set.
According to an exemplary embodiment of the present disclosure, the scale and angle obtaining unit 403 may obtain a first response according to the feature maps (e.g., gradient histogram (HOG) features), the scale set, and the angle set of a preset number of video frames before the current video frame, and obtain one or more second responses according to the obtained one or more feature maps of the current video frame and the first response, where a second response reflects how close an adjusted scale and angle of the target to be tracked in the current video frame is to the real scale and angle. The scale and angle obtaining unit 403 may determine, as the scale and angle of the target in the current video frame, the adjusted scale and angle corresponding to a second response meeting a preset condition among the one or more second responses.
Specifically, the scale and angle acquisition unit 403 may respectively acquire a scale Gaussian response and an angle Gaussian response according to the number of scale parameters in the scale set (e.g., the size n of the aforementioned scale set) and the number of angle parameters in the angle set (e.g., the size m of the aforementioned angle set), where the scale Gaussian response, for example but not limited to, may be represented as G1 = {g1, g2, …, gn}, where
gi = exp(-(i - ⌈n/2⌉)^2 / (2δ1^2))
δ1 is a Gaussian parameter larger than 0, e is a natural constant, and 1 ≤ i ≤ n; the angle Gaussian response, for example but not limited to, may be represented as G2 = {h1, h2, …, hm}, where
hj = exp(-(j - ⌈m/2⌉)^2 / (2δ2^2))
δ2 is a Gaussian parameter larger than 0, and 1 ≤ j ≤ m, where δ1 and δ2 may be set according to the actual target tracking scene; the present disclosure is not limited in this respect. In addition, the scale and angle acquisition unit 403 may acquire the feature maps of the first two video frames before the current video frame. In some embodiments, these feature maps include a first feature map and a second feature map of the first video frame before the current video frame, and a third feature map of the second video frame before the current video frame, where the first feature map and the second feature map differ in that the scale and angle of the target to be tracked may differ in the process of acquiring them. Specifically, for clarity of description, assume that the first video frame before the current video frame is video frame A and the second video frame before the current video frame is video frame B. In the process of acquiring the first feature map, the scale and angle of the target in video frame A are taken to be the same as those in video frame B; in the process of acquiring the second feature map, the scale and angle of the target in video frame A are its actual scale and angle in video frame A; and in the process of acquiring the third feature map, the scale and angle of the target in video frame B are its actual scale and angle in video frame B.
The scale and angle acquisition unit 403 may acquire the first response based on the scale Gaussian response, the angle Gaussian response, and the feature maps of the first two video frames before the current video frame. Specifically, the scale and angle obtaining unit 403 may apply fast Fourier transform to the scale Gaussian response, the angle Gaussian response, the first feature map, the second feature map, and the third feature map to obtain their frequency-domain counterparts, then associate the frequency-domain scale Gaussian response with the frequency-domain angle Gaussian response (for example, by multiplying them) to obtain the frequency-domain scale-angle integrated Gaussian response, and obtain the first response based on the frequency-domain first, second, and third feature maps and the frequency-domain scale-angle integrated Gaussian response.
In some embodiments, the scale and angle obtaining unit 403 may associate the frequency-domain second feature map and the frequency-domain third feature map with the frequency-domain scale-angle integrated Gaussian response, respectively (for example, by multiplying each of them with the scale-angle integrated Gaussian response), to obtain a frequency-domain target second feature map and a frequency-domain target third feature map, which respectively carry scale-angle information (that is, scale information related to the aforementioned scale set and angle information related to the aforementioned angle set). The first response may be obtained from the frequency-domain target second feature map, the frequency-domain target third feature map, and the frequency-domain first feature map. In one embodiment, the scale and angle obtaining unit 403 may first superimpose different weights on the target second feature map and the target third feature map, add the weighted target second feature map and target third feature map, and then divide the sum by the frequency-domain first feature map to obtain the first response.
According to an exemplary embodiment of the present disclosure, the scale and angle obtaining unit 403 may perform fast fourier transform on the one or more feature maps associated with the current video frame to obtain one or more feature maps in a frequency domain, and obtain one or more second responses according to the one or more feature maps in the frequency domain and the first response (for example, multiplying each feature map by the first response).
According to an exemplary embodiment of the present disclosure, the target tracking apparatus 400 may further include a first response updating unit 404 (not shown in fig. 4). After the adjusted scale and angle corresponding to the second response meeting the preset condition among the one or more second responses are determined as the scale and angle of the target in the current video frame, the first response updating unit 404 may update the first response according to the feature maps of the current video frame and the feature map of the previous video frame, so as to determine the scale and angle of the target to be tracked in the next video frame.
Fig. 5 is a block diagram of an electronic device 500 according to an example embodiment of the present disclosure.
Referring to fig. 5, an electronic device 500 includes at least one memory 501 and at least one processor 502, the at least one memory 501 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 502, perform a target tracking method according to an exemplary embodiment of the present disclosure.
By way of example, the electronic device 500 may be a PC, a tablet device, a personal digital assistant, a smartphone, or another device capable of executing the above set of instructions. The electronic device 500 need not be a single electronic device; it can be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or jointly. The electronic device 500 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the electronic device 500, the processor 502 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 502 may execute instructions or code stored in the memory 501, wherein the memory 501 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 501 may be integrated with the processor 502, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 501 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 501 and the processor 502 may be operatively coupled or may communicate with each other, e.g., through I/O ports, network connections, etc., such that the processor 502 is able to read files stored in the memory.
In addition, the electronic device 500 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 500 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions which, when executed by at least one processor, cause the at least one processor to perform the target tracking method according to the present disclosure. Examples of computer-readable storage media here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disk storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program.
The computer program in the above computer-readable storage medium can run in an environment deployed on computer equipment such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system, so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, a computer program product may also be provided, in which instructions are executable by a processor of a computer device to perform a target tracking method according to an exemplary embodiment of the present disclosure.
According to the target tracking method and apparatus, the electronic device, and the storage medium of the present disclosure, feature extraction is performed on a current video frame according to a scale set and an angle set of a target to be tracked in a video sequence, yielding one or more feature maps of the current video frame under different scale angle parameter combinations. The scale and the angle of the target in the current video frame can then be tracked according to the obtained one or more feature maps, the feature maps of video frames preceding the current video frame, the scale set, and the angle set, thereby improving the accuracy of video target tracking.
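As a minimal, non-limiting sketch of this scale-angle search (the concrete scale factors, angle offsets, and the (width, height, angle) box representation below are hypothetical illustrations, not taken from the disclosure):

```python
import itertools

def scale_angle_combinations(scales, angles):
    """Cartesian product of the scale set and the angle set: one
    candidate (scale, angle) pair per combination."""
    return list(itertools.product(scales, angles))

def adjust_box(box, scale, angle):
    """Apply one (scale, angle) combination to the previous frame's
    target box (width, height, angle); a simplified stand-in for the
    scale-angle adjustment step."""
    w, h, theta = box
    return (w * scale, h * scale, theta + angle)

scales = [0.95, 1.0, 1.05]   # hypothetical candidate scale-change factors
angles = [-5.0, 0.0, 5.0]    # hypothetical candidate angle offsets (degrees)
combos = scale_angle_combinations(scales, angles)
boxes = [adjust_box((100.0, 60.0, 10.0), s, a) for s, a in combos]
```

Each of the nine adjusted boxes would then be used to extract one feature map of the current video frame, one per scale angle parameter combination.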
In addition, the target tracking method and apparatus, the electronic device, and the storage medium of the present disclosure can be conveniently combined with any video target displacement tracking method, realizing four-dimensional tracking of the target (namely, two-dimensional displacement, one-dimensional size, and one-dimensional angular posture).
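Because the scale and angle estimate does not depend on where the target moved, it can be fused with the (x, y) output of any displacement tracker. A hedged sketch of such a four-dimensional update, where `TrackState`, `fuse`, and the multiplicative-scale/additive-angle convention are illustrative assumptions rather than the disclosure's own formulation:

```python
from dataclasses import dataclass

@dataclass
class TrackState:
    """Four-dimensional target state: 2-D displacement, 1-D size, 1-D angle."""
    x: float
    y: float
    scale: float
    angle: float

def fuse(displacement, scale_angle, prev):
    """Combine a displacement estimate from any 2-D tracker with a
    (scale, angle) estimate into one 4-D state update."""
    dx, dy = displacement
    s, a = scale_angle
    return TrackState(prev.x + dx, prev.y + dy, prev.scale * s, prev.angle + a)

state = fuse((3.0, -1.5), (1.05, 2.0), TrackState(50.0, 40.0, 1.0, 10.0))
```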
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
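To make the Gaussian-response construction of claims 4 and 5 concrete, the following toy sketch builds one one-dimensional Gaussian label per parameter set and combines them into a joint scale-angle label. The outer product below is a spatial-domain simplification; the disclosure performs the corresponding correlation in the frequency domain after a fast Fourier transform, and the sigma values are hypothetical.

```python
import math

def gaussian_response(n, sigma):
    """1-D Gaussian label peaked at the centre index, one value per
    parameter in the set (cf. claim 4)."""
    c = (n - 1) / 2.0
    return [math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]

def combined_response(scale_resp, angle_resp):
    """Outer product combining the scale and angle Gaussian responses
    into one 2-D scale-angle label; an illustrative stand-in for the
    frequency-domain correlation of claim 5."""
    return [[s * a for a in angle_resp] for s in scale_resp]

scale_resp = gaussian_response(5, sigma=1.0)   # 5 scale parameters (assumed)
angle_resp = gaussian_response(7, sigma=1.5)   # 7 angle parameters (assumed)
joint = combined_response(scale_resp, angle_resp)
```

The joint label peaks at the unchanged (centre) scale-angle combination and decays smoothly for larger adjustments, which is what lets the tracker score each candidate combination against it.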

Claims (10)

1. A target tracking method, comprising:
acquiring a scale set and an angle set of a target to be tracked in a video sequence, wherein the scale set comprises one or more scale parameters for measuring the degree of scale change of the target to be tracked, and the angle set comprises one or more angle parameters for measuring the degree of angle change of the target to be tracked;
extracting features of a current video frame according to one or more scale angle parameter combinations determined by the scale set and the angle set to obtain one or more feature maps of the current video frame under different scale angle parameter combinations;
and obtaining the scale and the angle of the target to be tracked in the current video frame according to the one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set and the angle set.
2. The target tracking method according to claim 1, wherein the extracting features of the current video frame according to the one or more scale angle parameter combinations determined by the scale set and the angle set to obtain the one or more feature maps of the current video frame under different scale angle parameter combinations comprises:
adjusting the scale angle of the target to be tracked in the video frame preceding the current video frame according to each of the one or more scale angle parameter combinations, to obtain one or more adjusted scale angles of the target to be tracked in the current video frame;
and performing feature extraction on the current video frame comprising the target to be tracked with the adjusted scale angle to obtain the one or more feature maps.
3. The target tracking method according to claim 2, wherein the obtaining the scale and the angle of the target to be tracked in the current video frame according to the one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set and the angle set comprises:
acquiring a first response according to the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set;
acquiring one or more second responses according to the one or more feature maps and the first response, wherein each second response reflects the degree of closeness between an adjusted scale angle of the target to be tracked in the current video frame and the true scale angle;
and determining the adjusted scale angle corresponding to the second response meeting the preset condition in the one or more second responses as the scale and the angle of the target to be tracked in the current video frame.
4. The target tracking method according to claim 3, wherein the acquiring a first response according to the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set comprises:
respectively acquiring a scale Gaussian response and an angle Gaussian response according to the number of the scale parameters in the scale set and the number of the angle parameters in the angle set;
and acquiring the first response based on the scale Gaussian response, the angle Gaussian response, and the feature maps of the two video frames preceding the current video frame.
5. The target tracking method according to claim 4, wherein the feature maps of the two video frames preceding the current video frame include a first feature map and a second feature map of a first video frame preceding the current video frame, and a third feature map of a second video frame preceding the current video frame;
the obtaining the first response based on the scale Gaussian response, the angle Gaussian response, and the feature maps of the two video frames preceding the current video frame comprises:
performing fast Fourier transform on the scale Gaussian response, the angle Gaussian response, the first feature map, the second feature map, and the third feature map to obtain the scale Gaussian response, the angle Gaussian response, the first feature map, the second feature map, and the third feature map of the frequency domain;
correlating the scale Gaussian response of the frequency domain with the angle Gaussian response of the frequency domain to obtain a scale angle comprehensive Gaussian response of the frequency domain;
and acquiring the first response based on the first feature map, the second feature map, the third feature map, and the scale angle comprehensive Gaussian response of the frequency domain.
6. The target tracking method according to claim 5, wherein the obtaining the first response based on the first feature map, the second feature map, the third feature map, and the scale angle comprehensive Gaussian response of the frequency domain comprises:
correlating the second feature map of the frequency domain and the third feature map of the frequency domain with the scale angle comprehensive Gaussian response of the frequency domain, respectively, to obtain a target second feature map of the frequency domain and a target third feature map of the frequency domain, each of which carries scale angle information;
and obtaining the first response according to the target second feature map of the frequency domain, the target third feature map of the frequency domain, and the first feature map of the frequency domain.
7. An object tracking device, comprising:
a set acquisition unit configured to: acquiring a scale set and an angle set of a target to be tracked in a video sequence, wherein the scale set comprises one or more scale parameters for measuring the degree of scale change of the target to be tracked, and the angle set comprises one or more angle parameters for measuring the degree of angle change of the target to be tracked;
a feature extraction unit configured to: extracting features of the current video frame according to one or more scale angle parameter combinations determined by the scale set and the angle set to obtain one or more feature maps of the current video frame under different scale angle parameter combinations;
a scale and angle acquisition unit configured to: obtaining the scale and the angle of the target to be tracked in the current video frame according to the one or more feature maps, the feature maps of a preset number of video frames before the current video frame, the scale set, and the angle set.
8. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the target tracking method of any one of claims 1 to 6.
9. A computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform the target tracking method of any one of claims 1 to 6.
10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by at least one processor, implement the target tracking method of any one of claims 1 to 6.
CN202210493990.1A 2022-04-29 2022-04-29 Target tracking method and device, electronic equipment and storage medium Pending CN114743150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210493990.1A CN114743150A (en) 2022-04-29 2022-04-29 Target tracking method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114743150A true CN114743150A (en) 2022-07-12

Family

ID=82286003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210493990.1A Pending CN114743150A (en) 2022-04-29 2022-04-29 Target tracking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114743150A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116977902A (en) * 2023-08-14 2023-10-31 长春工业大学 Target tracking method and system for on-board photoelectric stabilized platform of coastal defense
CN116977902B (en) * 2023-08-14 2024-01-23 长春工业大学 Target tracking method and system for on-board photoelectric stabilized platform of coastal defense

Similar Documents

Publication Publication Date Title
US11055535B2 (en) Method and device for video classification
Bae et al. High-precision vision-based mobile augmented reality system for context-aware architectural, engineering, construction and facility management (AEC/FM) applications
US20190272646A1 (en) Feature trackability ranking, systems and methods
US10796196B2 (en) Large scale image recognition using global signatures and local feature information
CN110765882B (en) Video tag determination method, device, server and storage medium
CN104111960A (en) Page matching method and device
CN108124489B (en) Information processing method, apparatus, cloud processing device and computer program product
US11798180B2 (en) Generating depth images utilizing a machine-learning model built from mixed digital image sources and multiple loss function sets
CN112330709A (en) Foreground image extraction method and device, readable storage medium and terminal equipment
Ni et al. An improved adaptive ORB-SLAM method for monocular vision robot under dynamic environments
CN114298997B (en) Fake picture detection method, fake picture detection device and storage medium
EP4163873A1 (en) Method and apparatus with global localization
CN114743150A (en) Target tracking method and device, electronic equipment and storage medium
CN111161348B (en) Object pose estimation method, device and equipment based on monocular camera
Yan et al. Geometrically based linear iterative clustering for quantitative feature correspondence
Wang et al. Improving rgb-d point cloud registration by learning multi-scale local linear transformation
CN114565768A (en) Image segmentation method and device
CN105608423A (en) Video matching method and device
CN113158904A (en) Twin network target tracking method and device based on double-mask template updating
CN111178266B (en) Method and device for generating key points of human face
CN110956131A (en) Single-target tracking method, device and system
WO2020234977A1 (en) Information processing device, creation method, and creation program
US20230401670A1 (en) Multi-scale autoencoder generation method, electronic device and readable storage medium
WO2019173954A1 (en) Method and apparatus for detecting resolution of image
US11599743B2 (en) Method and apparatus for obtaining product training images, and non-transitory computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination