CN113033397A - Target tracking method, device, equipment, medium and program product - Google Patents


Info

Publication number
CN113033397A
CN113033397A (application CN202110317027.3A)
Authority
CN
China
Prior art keywords
target
area image
search area
training
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110317027.3A
Other languages
Chinese (zh)
Inventor
周双双
黄明飞
梁维斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Open Intelligent Machine Shanghai Co ltd
Original Assignee
Open Intelligent Machine Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Open Intelligent Machine Shanghai Co ltd filed Critical Open Intelligent Machine Shanghai Co ltd
Priority to CN202110317027.3A priority Critical patent/CN113033397A/en
Publication of CN113033397A publication Critical patent/CN113033397A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/48: Matching video sequences
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this application provide a target tracking method, apparatus, device, medium, and program product. The method comprises the following steps: acquiring a target area image and a search area image; extracting features from the target area image and the search area image, respectively, through a preset target tracking model generated based on a twin (Siamese) network tracking algorithm, to determine the target area image features and the search area image features; generating a response map according to the target area image features and the search area image features; and determining the position of the target to be tracked in the search area image according to the response map, thereby improving the robustness of the target tracking method.

Description

Target tracking method, device, equipment, medium and program product
Technical Field
The present embodiments relate to the field of image processing technologies, and in particular, to a target tracking method, apparatus, device, medium, and program product.
Background
With the rapid development of artificial intelligence technology, visual target tracking has become an important research direction in computer vision and is widely applied in many fields.
At present, deep-learning-based target tracking methods achieve satisfactory results under ideal conditions, and the technology has made breakthrough progress. Numerous tracking algorithms have been proposed for various application scenarios, such as video surveillance, human-computer interaction, and autonomous driving.
However, in practical application scenarios, the robustness of current target tracking methods is difficult to ensure owing to factors in the real environment such as deformation, occlusion, illumination change, background clutter, and fast motion.
Disclosure of Invention
The target tracking method, apparatus, device, medium, and program product provided by the embodiments of this application can improve the robustness of target tracking.
In a first aspect, an embodiment of the present application provides a target tracking method, including:
acquiring a target area image and a search area image, wherein the target area image comprises a target to be tracked;
respectively extracting features of the target area image and the search area image through a preset target tracking model to determine the features of the target area image and the features of the search area image, wherein the target tracking model is an algorithm model generated based on a twin network tracking algorithm;
generating a response map according to the target area image features and the search area image features, wherein each response point in the response map is used for characterizing the similarity between the target area image features and a corresponding part of the search area image features;
and determining the position of the target to be tracked in the search area image according to the response map.
In one possible design, the performing, by using a preset target tracking model, feature extraction on the target area image and the search area image respectively includes:
performing feature extraction on the target area image by using a target area learning branch in the preset target tracking model;
and extracting the characteristics of the search area image by utilizing a search area learning branch in the preset target tracking model, wherein the weights of the target area learning branch and the search area learning branch are shared.
In one possible design, the performing feature extraction on the target region image by using a target region learning branch in the preset target tracking model includes:
performing feature extraction on the target area image by utilizing a first feature extractor network in the target area learning branch, wherein the first feature extractor network comprises a double-attention mechanism;
correspondingly, the performing feature extraction on the search area image by using the search area learning branch in the preset target tracking model includes:
performing feature extraction on the search area image by using a second feature extractor network in the search area learning branch, wherein the second feature extractor network comprises the double-attention mechanism;
the double attention mechanism is used for improving the weight value of a key feature corresponding to the target to be tracked, and the key feature is used for representing the object characteristic of the target to be tracked.
In one possible design, the determining the position of the target to be tracked in the search area image according to the response map includes:
determining a target feature position with the maximum response value according to the response map;
and mapping the target feature position to the original size of the search area image, wherein the target position corresponding to the target feature position in the search area image is the position of the target to be tracked in the search area image.
In one possible design, the acquiring the target area image and the searching area image includes:
and acquiring a target selection instruction, wherein the target selection instruction is used for determining the target to be tracked from a current frame image, and the search area image is a next frame image of the current frame image.
In one possible design, after the determining the position of the target to be tracked in the search area image according to the response map, the method further includes:
and displaying a tracking identifier in the search area image, wherein the tracking identifier is used for identifying the target to be tracked in the search area image, and the display position of the tracking identifier is determined according to the position of the target to be tracked in the search area image and the size of the target to be tracked.
In one possible design, the target tracking method further includes:
acquiring a training sample set, wherein the training sample set comprises a target area image training set and a search area image training set;
and training the preset target tracking model by using the training target area images in the target area image training set and the training search area images in the search area image training set.
In one possible design, the training the preset target tracking model using a training target area image in the target area image training set and a training search area image in the search area image training set includes:
respectively extracting features of the training target area image and the training search area image according to the preset target tracking model to determine the features of the training target area image and the features of the training search area image, wherein the training target area image comprises a training target to be tracked;
generating a training response map according to the training target area image features and the training search area image features;
determining the training position of the training target to be tracked in the training search area image according to the training response map;
and performing loss calculation according to the training position and the labeled position in the labeled response diagram, and calculating and updating the gradient in the preset target tracking model according to the calculation result.
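The training steps above can be sketched as follows. The patent does not specify the loss function, so a per-point logistic loss over the response map, common for twin-network trackers, is assumed here; `response_loss` is a hypothetical helper, not the patented implementation:

```python
import numpy as np

def response_loss(response, label):
    """Mean per-point logistic loss between a predicted response map and a
    labeled response map whose entries are +1 (target position) or -1.
    The logistic form is an assumption: the patent only states that a loss
    is computed between the training position and the labeled position."""
    return float(np.mean(np.log1p(np.exp(-label * response))))

# Toy example: a 3x3 training response map whose labeled position is the
# center point.
response = np.array([[0.1, 0.2, 0.1],
                     [0.3, 2.0, 0.2],
                     [0.1, 0.1, 0.0]])
label = -np.ones((3, 3))
label[1, 1] = 1.0       # annotated target position
loss = response_loss(response, label)
```

A gradient step on the model parameters would then follow from this scalar, matching the "calculate and update the gradient" step described above.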
In a second aspect, an embodiment of the present application further provides a target tracking apparatus, including:
the system comprises an acquisition module, a tracking module and a tracking module, wherein the acquisition module is used for acquiring a target area image and a search area image, and the target area image comprises a target to be tracked;
the extraction module is used for respectively extracting the features of the target area image and the search area image through a preset target tracking model so as to determine the features of the target area image and the search area image, wherein the target tracking model is an algorithm model generated based on a twin network tracking algorithm;
a generating module, configured to generate a response map according to the target area image features and the search area image features, where each response point in the response map is used to characterize the similarity between the target area image features and a corresponding part of the search area image features;
and the determining module is used for determining the position of the target to be tracked in the search area image according to the response map.
In one possible design, the extraction module is specifically configured to:
performing feature extraction on the target area image by using a target area learning branch in the preset target tracking model;
and extracting the characteristics of the search area image by utilizing a search area learning branch in the preset target tracking model, wherein the weights of the target area learning branch and the search area learning branch are shared.
In one possible design, the extraction module is specifically configured to:
performing feature extraction on the target area image by utilizing a first feature extractor network in the target area learning branch, wherein the first feature extractor network comprises a double-attention mechanism;
correspondingly, the performing feature extraction on the search area image by using the search area learning branch in the preset target tracking model includes:
performing feature extraction on the search area image by using a second feature extractor network in the search area learning branch, wherein the second feature extractor network comprises the double-attention mechanism;
the double attention mechanism is used for improving the weight value of a key feature corresponding to the target to be tracked, and the key feature is used for representing the object characteristic of the target to be tracked.
In one possible design, the determining module is specifically configured to:
determining a target feature position with the maximum response value according to the response map;
and mapping the target feature position to the original size of the search area image, wherein the target position corresponding to the target feature position in the search area image is the position of the target to be tracked in the search area image.
In one possible design, the obtaining module is specifically configured to:
and acquiring a target selection instruction, wherein the target selection instruction is used for determining the target to be tracked from a current frame image, and the search area image is a next frame image of the current frame image.
In one possible design, the target tracking apparatus further includes:
and the display module is used for displaying a tracking identifier in the search area image, wherein the tracking identifier is used for identifying the target to be tracked in the search area image, and the display position of the tracking identifier is determined according to the position of the target to be tracked in the search area image and the size of the target to be tracked.
In one possible design, the target tracking apparatus further includes: the training module is specifically configured to:
acquiring a training sample set, wherein the training sample set comprises a target area image training set and a search area image training set;
and training the preset target tracking model by using the training target area images in the target area image training set and the training search area images in the search area image training set.
In one possible design, the training module is specifically configured to:
respectively extracting features of the training target area image and the training search area image according to the preset target tracking model to determine the features of the training target area image and the features of the training search area image, wherein the training target area image comprises a training target to be tracked;
generating a training response map according to the training target area image features and the training search area image features;
determining the training position of the training target to be tracked in the training search area image according to the training response map;
and performing loss calculation according to the training position and the labeled position in the labeled response diagram, and calculating and updating the gradient in the preset target tracking model according to the calculation result.
In a third aspect, an embodiment of the present application further provides an electronic device, comprising a processor and a memory, the processor being connected to the memory;
the memory for storing a computer program for the processor;
wherein the processor is configured to implement any one of the possible target tracking methods of the first aspect by executing the computer program.
In a fourth aspect, embodiments of the present application further provide a machine-readable storage medium having stored thereon executable instructions that, when executed by a machine, cause the implementation of any one of the possible target tracking methods of the first aspect.
In a fifth aspect, an embodiment of the present application further provides a computer program product, comprising a computer program which, when executed by a processor, implements any one of the possible target tracking methods of the first aspect.
Therefore, in the above technical solution, a target area image and a search area image are obtained; feature extraction is then performed on each through a preset target tracking model generated based on a twin network tracking algorithm to determine the target area image features and the search area image features; a response map is generated according to these features; and the position of the target to be tracked in the search area image is determined according to the response map, thereby improving the robustness of the target tracking method.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. However, it should be understood by those skilled in the art that the following drawings illustrate only some embodiments of the present application and do not limit its scope.
FIG. 1 is a diagram of an application network architecture for a target tracking method shown in the present application according to an exemplary embodiment;
FIG. 2 is a schematic flow diagram of a target tracking method shown herein according to an exemplary embodiment;
FIG. 3 is a flow chart illustrating one implementation of S102 in the embodiment shown in FIG. 2;
FIG. 4 is a flow chart illustrating one implementation of S104 in the embodiment shown in FIG. 2;
FIG. 5 is a comparison chart of success rates for evaluating tracking methods under one scenario;
FIG. 6 is a comparison chart of success rates for evaluating tracking methods under another scenario;
FIG. 7 is a comparison chart of success rates for evaluating tracking methods in yet another scenario;
FIG. 8 is a schematic flow diagram of a target tracking method shown herein according to another exemplary embodiment;
FIG. 9 is a schematic diagram illustrating a target tracking device according to an exemplary embodiment of the present application;
FIG. 10 is a schematic diagram of a target tracking device shown in the present application according to another exemplary embodiment;
fig. 11 is a schematic structural diagram of an electronic device shown in the present application according to an exemplary embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. It should be understood by those skilled in the art that the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art through suitable modification or variation of the embodiments in the present application shall fall within the scope of protection of the present application.
At present, deep-learning-based target tracking methods achieve satisfactory results under ideal conditions, and the technology has made breakthrough progress. Numerous tracking algorithms have been proposed for various application scenarios, such as video surveillance, human-computer interaction, and autonomous driving. However, in practical application scenarios, the robustness of current target tracking methods is difficult to ensure owing to factors in the real environment such as deformation, occlusion, illumination change, background clutter, and fast motion.
In view of this, embodiments of the present application provide a target tracking method, an apparatus, a device, a medium, and a program product, in which a target area image and a search area image are obtained, and then feature extraction is performed on the target area image and the search area image respectively through a preset target tracking model generated based on a twin network tracking algorithm to determine a target area image feature and a search area image feature, so as to generate a response map according to the target area image feature and the search area image feature, and determine a position of a target to be tracked in the search area image according to the response map, thereby improving robustness of the target tracking method. The above technical solution will be described in detail with reference to specific embodiments.
Fig. 1 is a diagram of an application network architecture for a target tracking method shown in the present application according to an exemplary embodiment. As shown in Fig. 1, when a specific target is tracked using the target tracking method provided in this embodiment, a target area image 100 and a search area image 200 may be obtained first, where the target area image 100 contains the target to be tracked. For example, in a video surveillance scene, when a supervisor finds a suspicious person in the current frame of the surveillance video, the suspicious person can be selected as the target to be tracked through a target selection instruction. The selection can be made by drawing a frame on the current frame image: the framed range is the target area image 100, and the suspicious person within that range is the target to be tracked.
Then, feature extraction is performed on the target area image 100 and the search area image 200 by the convolution layers in the preset target tracking model to obtain the target area image features 110 and the search area image features 210. The target area image features 110 and the search area image features 210 are then processed by the double attention mechanism 400: the feature extractor network with the double attention mechanism 400 increases the weight values of the key features corresponding to the target to be tracked, where the key features characterize the object class of the target to be tracked. For example, when the target to be tracked is a suspicious person, the key features characterize a person, so that the person-related features become more prominent.
A response graph 600 is then generated by means of similarity measures 500 based on the target region image features and the search region image features. Finally, the position of the target to be tracked in the search area image is determined according to the response map 600. For example, the position of the suspicious person in the next frame image of the current frame image is determined according to the response graph 600, so as to achieve the target tracking effect for the suspicious person in video monitoring.
FIG. 2 is a schematic flow diagram illustrating a target tracking method according to an exemplary embodiment of the present application. As shown in fig. 2, the target tracking method provided in this embodiment includes:
s101, acquiring a target area image and a search area image.
Specifically, a target area image and a search area image are obtained, where the target area image contains a target to be tracked. It is worth understanding that the target area image is the image corresponding to the target to be tracked selected in the current frame image, and the search area image is the next frame image after the current frame, so that the target to be tracked can be located and tracked in the search area image according to the target area image.
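The acquisition step can be illustrated with a minimal crop routine; the crop sizes, the mean-value padding, and the `crop_region` helper are assumptions for illustration, since the application only states that the target area image comes from the current frame and the search area image is the next frame:

```python
import numpy as np

def crop_region(frame, center, size):
    """Crop a size-by-size square around `center` (row, col), padding with
    the frame's mean value when the crop crosses the border. The padding
    strategy and square crops are illustrative assumptions."""
    h, w = frame.shape
    half = size // 2
    out = np.full((size, size), frame.mean())
    for i in range(size):
        for j in range(size):
            y, x = center[0] - half + i, center[1] - half + j
            if 0 <= y < h and 0 <= x < w:
                out[i, j] = frame[y, x]
    return out

# The target area image is cropped around the selected target in the
# current frame; the search area image is a larger crop at the same
# location in the next frame.
current_frame = np.arange(100.0).reshape(10, 10)
next_frame = current_frame + 1.0
target_img = crop_region(current_frame, (5, 5), 4)
search_img = crop_region(next_frame, (5, 5), 8)
```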
S102, respectively extracting the features of the target area image and the search area image through a preset target tracking model so as to determine the features of the target area image and the search area image.
In this step, feature extraction may be performed on the target area image and the search area image respectively through a preset target tracking model to determine the target area image features and the search area image features, where the target tracking model is an algorithm model generated based on a twin network tracking algorithm. It is worth noting that the twin network tracking algorithm trains a similarity learning network in an offline stage and then performs online tracking with the offline model; its target area learning branch and search area learning branch are two identical networks whose weights are shared during model learning, and the model computes the similarity of two images. In other words, the twin network uses two branches with the same structure and the same parameters: two different images are input, and the network output can be regarded as a similarity measure obtained by extracting the same kind of features from both images.
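The weight sharing described above can be made concrete with a minimal sketch in which a single correlation kernel stands in for the identical feature extractor networks; the same weights process both the target input and the larger search input:

```python
import numpy as np

def shared_branch(image, weights):
    """One 'branch' of the twin network, reduced to a single 2-D valid
    correlation: the same `weights` are applied to whichever input is
    given, which is what weight sharing between the two identical
    branches means. A stand-in for the real feature extractor networks."""
    kh, kw = weights.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(weights * image[i:i + kh, j:j + kw])
    return out

shared_weights = np.array([[1.0, 0.0],
                           [0.0, -1.0]])                       # one set of weights
exemplar_out = shared_branch(np.ones((4, 4)), shared_weights)  # target branch
search_out = shared_branch(np.ones((6, 6)), shared_weights)    # search branch
```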
For the target area image z and the search area image x, the network learns a mapping function f(z, x) whose output is a scalar score map. When f(z, x) yields a higher similarity value, the target area image and the search area image are more likely to depict the same target; when f(z, x) yields a lower similarity value, they are less likely to depict the same target. The similarity function f(·, ·) is trained using large-scale supervised data for target recognition.
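Assuming the similarity mapping is realized as a sliding inner product (cross-correlation) of the target features over the search features, as is typical for fully convolutional twin-network trackers, the scalar score map can be sketched as:

```python
import numpy as np

def score_map(target_feat, search_feat):
    """Slide the target (exemplar) feature map over the search feature map
    and return a 2-D map of inner-product similarity scores, i.e. a plain
    cross-correlation. Single-channel features are used for brevity."""
    th, tw = target_feat.shape
    sh, sw = search_feat.shape
    out = np.zeros((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(target_feat * search_feat[i:i + th, j:j + tw])
    return out

# Planting the target pattern inside the search features makes the score
# map peak at the pattern's true offset.
target_feat = np.array([[1.0, 2.0],
                        [3.0, 4.0]])
search_feat = np.zeros((5, 5))
search_feat[2:4, 1:3] = target_feat          # pattern placed at offset (2, 1)
response = score_map(target_feat, search_feat)
peak = np.unravel_index(np.argmax(response), response.shape)  # → (2, 1)
```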
And S103, generating a response graph according to the target area image characteristics and the search area image characteristics.
In this step, a response map may be generated according to the target area image feature and the search area image feature, where each response point feature in the response map is used to characterize the similarity between the target area image feature and each part in the search area image feature.
And S104, determining the position of the target to be tracked in the search area image according to the response image.
In this step, after the two-dimensional response map is obtained by similarity measurement, the response map may be mapped back to the original image size; the position of the maximum value in the response map is the target position in the search area, that is, the position of the target to be tracked in the search area image.
In the above steps, a target area image and a search area image are obtained; feature extraction is then performed on each through a preset target tracking model generated based on a twin network tracking algorithm to determine the target area image features and the search area image features; a response map is generated from these features; and the position of the target to be tracked in the search area image is determined from the response map, improving the robustness of the target tracking method.
Fig. 3 is a flowchart illustrating an implementation manner of S102 in the embodiment shown in fig. 2. As shown in fig. 3, in the present embodiment, S102 in the above embodiment includes:
and S1021, performing feature extraction on the target area image by using a target area learning branch in a preset target tracking model.
And S1022, performing feature extraction on the target area image by using the first feature extractor network in the target area learning branch.
And S1023, extracting the characteristics of the search area image by using the second characteristic extractor network in the search area learning branch.
Specifically, feature extraction may be performed on the target area image by using a target area learning branch in a preset target tracking model, and on the search area image by using a search area learning branch in the preset target tracking model, where weights are shared between the two branches. In detail, feature extraction is performed on the target area image by a first feature extractor network in the target area learning branch, and on the search area image by a second feature extractor network in the search area learning branch, where both networks include the double attention mechanism. It is worth noting that the double attention mechanism is used to increase the weight values of the key features corresponding to the target to be tracked, where the key features characterize the object class of the target to be tracked. For example, when the target to be tracked is a suspicious person, the key features characterize a person, so that the person-related features become more prominent.
The target area image features and the search area image features are obtained through a feature extractor network to which a double attention mechanism has been added. The double attention mechanism comprises a channel attention mechanism and a spatial attention mechanism. Specifically, the channel attention mechanism redistributes weights over the original features: the maximum value of each channel is computed, all maxima are normalized to produce a weight for each channel, and the weights are multiplied directly onto the original features to recalibrate them. The spatial attention mechanism, in turn, obtains spatial information of the feature map using max pooling and average pooling operations, which highlight the informative regions and are complementary to the channel attention map. The max pooling and average pooling operations are applied along the channel axis, and their results are concatenated to generate an effective feature map.
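The two attention steps described above can be sketched as follows; the per-channel maximum normalization follows the text directly, while combining the max-pooled and average-pooled spatial maps with a sigmoid of their sum is a simplifying assumption (the text concatenates them, implying a learned combination that is omitted here):

```python
import numpy as np

def channel_attention(features):
    """Reweight channels as the text describes: take each channel's
    maximum, normalize the maxima into per-channel weights, and multiply
    the weights back onto the original features. features: (C, H, W)."""
    maxima = features.max(axis=(1, 2))
    weights = maxima / maxima.sum()
    return features * weights[:, None, None]

def spatial_attention(features):
    """Build a spatial map from max pooling and average pooling along the
    channel axis. Combining the two pooled maps with a sigmoid of their
    sum is a simplifying assumption; the learned convolution over their
    concatenation is omitted."""
    max_map = features.max(axis=0)
    avg_map = features.mean(axis=0)
    gate = 1.0 / (1.0 + np.exp(-(max_map + avg_map)))   # sigmoid gate
    return features * gate[None, :, :]

def dual_attention(features):
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(features))

feats = np.random.default_rng(0).random((4, 6, 6))  # toy (C, H, W) features
refined = dual_attention(feats)
```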
In this embodiment, a double attention mechanism module is added to a target tracking algorithm based on a fully convolutional twin network, and the target tracking process is treated as an image similarity measurement problem. The feature extraction thus becomes robust, which can improve the precision of the tracker, effectively avoid network overfitting, and introduce no extra noise. This simple method can be combined with most network structures to improve the performance and accuracy of the tracker.
The method implements a double-attention fully convolutional twin network tracking algorithm: the target area learning branch and the search area learning branch are two identical networks whose weights are shared during model learning; the target area learning branch takes the target area image as input, and the search area learning branch takes a search area image block as input. The double attention mechanism makes the target features learned by the network structure more prominent. The similarity between the output features of the target area branch and those of the search area branch is computed, and the maximum-value region of the similarity measure is mapped back to the original image to obtain the target position in the search area, making the tracking algorithm more stable and accurate.
Fig. 4 is a flowchart illustrating an implementation manner of S104 in the embodiment shown in fig. 2. As shown in fig. 4, in the present embodiment, S104 in the above embodiment includes:
S1041, determining the target feature position with the maximum response value according to the response map.
S1042, mapping the target feature position to the original size of the search area image, wherein the target position corresponding to the target feature position in the search area image is the position of the target to be tracked in the search area image.
Specifically, the target feature position with the maximum response value is determined according to the response map, and the target feature position is then mapped back to the original size of the search area image; the target position corresponding to the target feature position in the search area image is the position of the target to be tracked in the search area image.
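The mapping from response-map coordinates back to pixel coordinates in the search area image is essentially a scaling by the network's total stride. The sketch below is hypothetical: the stride value and the center-offset convention depend on the concrete backbone and are not specified in the text.

```python
def map_to_image(feat_pos, total_stride=8, receptive_center=31):
    """Map a (row, col) peak position in the response map back to pixel
    coordinates in the original search area image. `total_stride` and
    `receptive_center` are assumed, backbone-dependent values."""
    r, c = feat_pos
    return (receptive_center + r * total_stride,
            receptive_center + c * total_stride)
```

For example, with these assumed values the response-map origin lands on pixel (31, 31), and each step in the response map moves the estimate by one stride in the image.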
In the target tracking algorithm based on the fully convolutional twin network, the CNN has a very strong learning capacity and fits the label information so strongly that it cannot accurately distinguish the importance among features. The embodiment of the application therefore adds a double-attention mechanism module to the original fully convolutional twin network, so that the feature mapping is well distinguished, that is, adaptively recalibrated: features of important significance are enhanced and general features are suppressed, which improves the generalization capability of the network and the model. Because the weights are normalized over the original data, the extracted features are more robust, the precision of the tracker is improved, overfitting of the capacity the network needs to capture and represent is effectively avoided, and no extra noise is introduced. This simple method can be combined with most regularization techniques, further alleviating possible overfitting; when the target undergoes partial occlusion, background clutter, low resolution and other conditions, the robustness of the tracking algorithm is significantly improved, so that it can still effectively track the target object.
FIG. 5 is a comparison graph of success rates for evaluating tracking methods under one scenario; FIG. 6 is a comparison graph of success rates for evaluating tracking methods under another scenario; fig. 7 is a comparison graph of success rates for evaluating tracking methods in yet another scenario. As shown in figs. 5-7, two evaluation criteria can be used to compare the performance of the improved tracking method of the embodiment of the present application with that of the existing tracking method. In a success rate graph, the abscissa indicates the overlap threshold and the ordinate indicates the success rate; the overlap rate is obtained by calculating the overlap between the target box of the tracking result and the target box of the ground truth.
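The overlap rate described above is the intersection-over-union (IoU) of the tracked box and the ground-truth box, and the success rate at a given threshold is the fraction of frames whose overlap exceeds that threshold. A short sketch of both quantities:

```python
def overlap_rate(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # intersection width
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # intersection height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(overlaps, threshold):
    """Fraction of frames whose overlap rate exceeds the threshold."""
    return sum(o > threshold for o in overlaps) / len(overlaps)
```

Sweeping `threshold` from 0 to 1 and plotting `success_rate` produces exactly the curves compared in figs. 5-7.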
Using this success rate evaluation, experiments were conducted on the OTB2015 public benchmark dataset. These sequences contain different challenge factors, including: illumination change, scale change, occlusion, deformation, fast motion, in-plane rotation, out-of-view, out-of-plane rotation, background clutter, and low resolution. On these sequences, the improved tracking method designed in the present application is compared with the existing tracking method.
As shown in fig. 5, the improved tracking method designed in the present application is compared with the existing tracking method. As can be seen from fig. 5, the tracker of the improved tracking method performs most robustly on the 100 test videos: its OPE success rate value is 0.610, while the success rate value of the existing tracking method is 0.582. The improved tracking method thus raises the success rate by 2.8 percentage points, so the improved tracking method provided in the embodiment of the present application achieves a more robust tracking effect.
As shown in fig. 6, under the low-resolution challenge the success rate of the improved tracking method reaches 0.661, while that of the existing tracking method is 0.618, an improvement of 4.3 percentage points.
As shown in fig. 7, under the background-clutter challenge the improved tracking method provided by the embodiment of the present application reaches a success rate of 0.577, whereas the success rate of the existing tracking method is only 0.523, an improvement of 5.4 percentage points.
Therefore, the above experiments also demonstrate that the improved tracking method provided by the present application adaptively recalibrates the feature mapping to enhance significant features, suppress general features, and improve the generalization capability of the network and the model. The double-attention mechanism is mainly embodied in the channel and spatial operations on the features, which helps increase the robustness of the features and thus further improves the performance of the tracker.
On the basis of the above embodiment, in a specific application scenario a target selection instruction may be obtained; the target selection instruction is used to determine the target to be tracked from a current frame image, and the search area image is the next frame image of the current frame image. For example, in a video surveillance scenario, when a supervisor spots a suspicious person in the current frame image of the surveillance video, the suspicious person may be selected as the target to be tracked through a target selection instruction: the suspicious person is framed on the current frame image, the framed range is the target area image, and the suspicious person within the framed range is the target to be tracked.
In addition, after the position of the target to be tracked in the search area image is determined according to the response image, a tracking identifier is displayed in the search area image and used for identifying the target to be tracked in the search area image, wherein the display position of the tracking identifier is determined according to the position of the target to be tracked in the search area image and the size of the target to be tracked.
FIG. 8 is a schematic flow diagram of a target tracking method shown in the present application according to another exemplary embodiment. As shown in fig. 8, the target tracking method provided in this embodiment includes:
S201, obtaining a training sample set.
In this step, a training sample set is obtained, which includes a target area image training set and a search area image training set.
Then, a preset target tracking model is trained by using the training target area images in the target area image training set and the training search area images in the search area image training set.
S202, respectively extracting the features of the training target area image and the training search area image according to a preset target tracking model.
Specifically, feature extraction is respectively carried out on a training target area image and a training search area image according to a preset target tracking model so as to determine the feature of the training target area image and the feature of the training search area image, wherein the training target area image comprises a training target to be tracked.
And S203, generating a training response graph according to the image characteristics of the training target area and the image characteristics of the training search area.
And S204, determining the training position of the target to be tracked in the search area image according to the training response image.
S205, loss calculation is carried out according to the training position and the labeling position in the labeling response diagram, and the gradient in the preset target tracking model is calculated and updated according to the calculation result.
Specifically, the pixel range of the positive and negative samples is determined by taking the labeled position in the labeled response map as the center of a circle with a preset radius; after the positive/negative pixel range is determined, the gradient is calculated and updated through a cross-entropy loss. If the training position in the training response map falls outside the positive pixel range, the training result is poor and the loss is large, and the gradients in the preset target tracking model are then calculated and updated according to this loss result.
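The positive/negative labelling and the cross-entropy loss described above can be sketched as follows. This is a hypothetical NumPy illustration: the radius, the sigmoid squashing of the response, and the unweighted mean over positions are assumptions not fixed by the text.

```python
import numpy as np

def make_labels(shape, center, radius):
    """Positions within `radius` of the labeled center in the response map
    are positive samples (1); all other positions are negative samples (0)."""
    rows, cols = np.indices(shape)
    dist = np.hypot(rows - center[0], cols - center[1])
    return (dist <= radius).astype(np.float64)

def cross_entropy_loss(response, labels):
    """Mean binary cross-entropy between the sigmoid-squashed response map
    and the positive/negative labels; its gradient drives the update."""
    p = 1.0 / (1.0 + np.exp(-response))
    eps = 1e-12  # guard against log(0)
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
```

A response map whose peak falls inside the positive circle yields a small loss; a peak outside it yields a large loss, matching the behavior described above.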
FIG. 9 is a schematic diagram illustrating a target tracking device according to an exemplary embodiment of the present application. As shown in fig. 9, the object tracking apparatus 300 according to the present embodiment includes:
an obtaining module 301, configured to obtain a target area image and a search area image, where the target area image includes a target to be tracked;
an extraction module 302, configured to perform feature extraction on the target region image and the search region image through a preset target tracking model, respectively, to determine a target region image feature and a search region image feature, where the target tracking model is an algorithm model generated based on a twin network tracking algorithm;
a generating module 303, configured to generate a response map according to the target area image feature and the search area image feature, where each response point feature in the response map is used to characterize a similarity between the target area image feature and each part in the search area image feature;
a determining module 304, configured to determine, according to the response map, a position of the target to be tracked in the search area image.
In one possible design, the extracting module 302 is specifically configured to:
performing feature extraction on the target area image by using a target area learning branch in the preset target tracking model;
and extracting the characteristics of the search area image by utilizing a search area learning branch in the preset target tracking model, wherein the weights of the target area learning branch and the search area learning branch are shared.
In one possible design, the extracting module 302 is specifically configured to:
performing feature extraction on the target area image by utilizing a first feature extractor network in the target area learning branch, wherein the first feature extractor network comprises a double-attention mechanism;
correspondingly, the performing feature extraction on the search area image by using the search area learning branch in the preset target tracking model includes:
performing feature extraction on the search area image by using a second feature extractor network in the search area learning branch, wherein the second feature extractor network comprises the double-attention mechanism;
the double attention mechanism is used for improving the weight value of a key feature corresponding to the target to be tracked, and the key feature is used for representing the object characteristic of the target to be tracked.
In one possible design, the determining module 304 is specifically configured to:
determining a target characteristic position with the maximum response value according to the response graph;
and mapping the target feature position to the original size of the search area image, wherein the target position corresponding to the target feature position in the search area image is the position of the target to be tracked in the search area image.
In a possible design, the obtaining module 301 is specifically configured to:
and acquiring a target selection instruction, wherein the target selection instruction is used for determining the target to be tracked from a current frame image, and the search area image is a next frame image of the current frame image.
On the basis of the embodiment shown in fig. 9, fig. 10 is a schematic structural diagram of a target tracking device shown in the present application according to another exemplary embodiment. As shown in fig. 10, the target tracking apparatus 300 according to the present embodiment further includes:
a display module 305, configured to display a tracking identifier in the search area image, where the tracking identifier is used to identify the target to be tracked in the search area image, and a display position of the tracking identifier is determined according to a position of the target to be tracked in the search area image and a size of the target to be tracked.
In one possible design, the target tracking apparatus further includes: the training module 306 is specifically configured to:
acquiring a training sample set, wherein the training sample set comprises a target area image training set and a search area image training set;
and training the preset target tracking model by using the training target area images in the target area image training set and the training search area images in the search area image training set.
In one possible design, the training module 306 is specifically configured to:
respectively extracting features of the training target area image and the training search area image according to the preset target tracking model to determine the features of the training target area image and the features of the training search area image, wherein the training target area image comprises a training target to be tracked;
generating a training response graph according to the image features of the training target area and the image features of the training search area;
determining the training position of the training target to be tracked in the search area image according to the training response image;
and performing loss calculation according to the training position and the labeled position in the labeled response diagram, and calculating and updating the gradient in the preset target tracking model according to the calculation result.
In the embodiment of the present application, the division of the module is only one logic function division, and there may be another division manner in actual implementation. For example, multiple modules or components may be combined or may be integrated into another system. In addition, the coupling between the respective modules may be a direct coupling or an indirect coupling. In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or may exist separately and physically.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a machine-readable storage medium. Therefore, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a machine-readable storage medium and may include several instructions to cause an electronic device to perform all or part of the processes of the technical solution described in the embodiments of the present application. The storage medium may include various media that can store program codes, such as ROM, RAM, a removable disk, a hard disk, a magnetic disk, or an optical disk.
Fig. 11 is a schematic structural diagram of an electronic device shown in the present application according to an exemplary embodiment. As shown in fig. 11, the electronic device 400 provided in this embodiment includes:
a processor 401 and a memory 402, wherein the processor 401 is connected to the memory 402;
the memory 402 for storing the computer program of the processor 401;
wherein the processor 401 is configured to implement the steps of any of the above-described method embodiments by executing the computer program.
Alternatively, the memory 402 may be separate or integrated with the processor 401.
When the memory 402 is a device independent from the processor 401, the electronic device 400 may further include:
a bus 403 for connecting the processor 401 and the memory 402.
In addition, the embodiment of the application also provides a machine-readable storage medium. The machine-readable storage medium may store executable instructions that, when executed by a machine, cause the machine to perform the specific processes in the above method embodiments.
The machine-readable storage medium described above in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
Embodiments of the present application also provide a program product, which includes a computer program, and the computer program is stored in a readable storage medium. The computer program may be read from a readable storage medium by at least one processor of the electronic device, and execution of the computer program by the at least one processor causes the electronic device to perform the steps of the above-described method.
Furthermore, those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above description is only for the specific embodiments of the present application, and the scope of the present application is not limited thereto. Those skilled in the art can make changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions should be within the protective scope of the present application.

Claims (12)

1. A target tracking method, comprising:
acquiring a target area image and a search area image, wherein the target area image comprises a target to be tracked;
respectively extracting features of the target area image and the search area image through a preset target tracking model to determine the features of the target area image and the features of the search area image, wherein the target tracking model is an algorithm model generated based on a twin network tracking algorithm;
generating a response map according to the target area image features and the search area image features, wherein each response point feature in the response map is used for representing the similarity of each part of the target area image features and the search area image features;
and determining the position of the target to be tracked in the search area image according to the response image.
2. The target tracking method according to claim 1, wherein the performing feature extraction on the target area image and the search area image respectively through a preset target tracking model comprises:
performing feature extraction on the target area image by using a target area learning branch in the preset target tracking model;
and extracting the characteristics of the search area image by utilizing a search area learning branch in the preset target tracking model, wherein the weights of the target area learning branch and the search area learning branch are shared.
3. The target tracking method according to claim 2, wherein the performing feature extraction on the target area image by using a target area learning branch in the preset target tracking model comprises:
performing feature extraction on the target area image by utilizing a first feature extractor network in the target area learning branch, wherein the first feature extractor network comprises a double-attention mechanism;
correspondingly, the performing feature extraction on the search area image by using the search area learning branch in the preset target tracking model includes:
performing feature extraction on the search area image by using a second feature extractor network in the search area learning branch, wherein the second feature extractor network comprises the double-attention mechanism;
the double attention mechanism is used for improving the weight value of a key feature corresponding to the target to be tracked, and the key feature is used for representing the object characteristic of the target to be tracked.
4. The target tracking method according to any one of claims 1 to 3, wherein the determining the position of the target to be tracked in the search area image according to the response map comprises:
determining a target characteristic position with the maximum response value according to the response graph;
and mapping the target feature position to the original size of the search area image, wherein the target position corresponding to the target feature position in the search area image is the position of the target to be tracked in the search area image.
5. The target tracking method according to any one of claims 1 to 3, wherein the acquiring the target area image and the search area image includes:
and acquiring a target selection instruction, wherein the target selection instruction is used for determining the target to be tracked from a current frame image, and the search area image is a next frame image of the current frame image.
6. The target tracking method according to claim 5, further comprising, after the determining the position of the target to be tracked in the search area image according to the response map:
and displaying a tracking identifier in the search area image, wherein the tracking identifier is used for identifying the target to be tracked in the search area image, and the display position of the tracking identifier is determined according to the position of the target to be tracked in the search area image and the size of the target to be tracked.
7. The target tracking method according to any one of claims 1 to 3, further comprising:
acquiring a training sample set, wherein the training sample set comprises a target area image training set and a search area image training set;
and training the preset target tracking model by using the training target area images in the target area image training set and the training search area images in the search area image training set.
8. The method according to claim 7, wherein the training the preset target tracking model using the training target area images in the target area image training set and the training search area images in the search area image training set comprises:
respectively extracting features of the training target area image and the training search area image according to the preset target tracking model to determine the features of the training target area image and the features of the training search area image, wherein the training target area image comprises a training target to be tracked;
generating a training response graph according to the image features of the training target area and the image features of the training search area;
determining the training position of the training target to be tracked in the search area image according to the training response image;
and performing loss calculation according to the training position and the labeled position in the labeled response diagram, and calculating and updating the gradient in the preset target tracking model according to the calculation result.
9. An object tracking device, comprising:
the system comprises an acquisition module, a tracking module and a tracking module, wherein the acquisition module is used for acquiring a target area image and a search area image, and the target area image comprises a target to be tracked;
the extraction module is used for respectively extracting the features of the target area image and the search area image through a preset target tracking model so as to determine the features of the target area image and the search area image, wherein the target tracking model is an algorithm model generated based on a twin network tracking algorithm;
a generating module, configured to generate a response map according to the target area image feature and the search area image feature, where each response point feature in the response map is used to characterize a similarity between the target area image feature and each part in the search area image feature;
and the determining module is used for determining the position of the target to be tracked in the search area image according to the response image.
10. An electronic device, comprising: the processor is connected with the memory respectively;
the memory for storing a computer program for the processor;
wherein the processor is configured to implement the target tracking method of any one of claims 1 to 8 by executing the computer program.
11. A machine-readable storage medium having stored thereon executable instructions which when executed by a machine result in the implementation of the object tracking method according to any one of claims 1 to 8.
12. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the object tracking method of any one of claims 1 to 8.
CN202110317027.3A 2021-03-25 2021-03-25 Target tracking method, device, equipment, medium and program product Pending CN113033397A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110317027.3A CN113033397A (en) 2021-03-25 2021-03-25 Target tracking method, device, equipment, medium and program product


Publications (1)

Publication Number Publication Date
CN113033397A true CN113033397A (en) 2021-06-25

Family

ID=76473485


Country Status (1)

Country Link
CN (1) CN113033397A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070218439A1 (en) * 2005-12-15 2007-09-20 Posit Science Corporation Cognitive training using visual searches
CN110851644A (en) * 2019-11-04 2020-02-28 泰康保险集团股份有限公司 Image retrieval method and device, computer-readable storage medium and electronic device
CN111260688A (en) * 2020-01-13 2020-06-09 深圳大学 Twin double-path target tracking method
CN111340850A (en) * 2020-03-20 2020-06-26 军事科学院***工程研究院***总体研究所 Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Triplet Loss in Siamese Network for Object Tracking", ECCV 2018 *
YANG Kang et al.: "Real-time visual tracking based on a dual-attention Siamese network", Journal of Computer Applications, pages 1652-1656 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898201A (en) * 2022-07-11 2022-08-12 浙江大华技术股份有限公司 Target detection method, device, equipment and medium
CN114898201B (en) * 2022-07-11 2022-10-28 浙江大华技术股份有限公司 Target detection method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN111860573B (en) Model training method, image category detection method and device and electronic equipment
CN112560876B (en) Single-stage small sample target detection method for decoupling measurement
CN108256479B (en) Face tracking method and device
CN109086811B (en) Multi-label image classification method and device and electronic equipment
EP3637310A1 (en) Method and apparatus for generating vehicle damage information
CN110910422A (en) Target tracking method and device, electronic equipment and readable storage medium
CN112861575A (en) Pedestrian structuring method, device, equipment and storage medium
CN110555405B (en) Target tracking method and device, storage medium and electronic equipment
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
CN113920538B (en) Object detection method, device, equipment, storage medium and computer program product
CN111210446A (en) Video target segmentation method, device and equipment
CN116503399B (en) Insulator pollution flashover detection method based on YOLO-AFPS
CN113989696A (en) Target tracking method and device, electronic equipment and storage medium
CN114898416A (en) Face recognition method and device, electronic equipment and readable storage medium
EP3376438A1 (en) A system and method for detecting change using ontology based saliency
Niu et al. Boundary-aware RGBD salient object detection with cross-modal feature sampling
CN113033397A (en) Target tracking method, device, equipment, medium and program product
CN116797973A (en) Data mining method and system applied to sanitation intelligent management platform
CN111310595A (en) Method and apparatus for generating information
CN110852261A (en) Target detection method and device, electronic equipment and readable storage medium
CN110728316A (en) Classroom behavior detection method, system, device and storage medium
CN113869163B (en) Target tracking method and device, electronic equipment and storage medium
CN112801960B (en) Image processing method and device, storage medium and electronic equipment
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN114973410A (en) Method and device for extracting motion characteristics of video frame

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination