CN113888595A - Twin network single-target visual tracking method based on difficult sample mining

Info

Publication number
CN113888595A
Authority
CN
China
Prior art keywords
image
target
sample
difficult
training
Prior art date
Legal status
Granted
Application number
CN202111152770.4A
Other languages
Chinese (zh)
Other versions
CN113888595B (en)
Inventor
黄磊
高占祺
魏志强
Current Assignee
Ocean University of China
Original Assignee
Ocean University of China
Priority date
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202111152770.4A
Publication of CN113888595A
Application granted
Publication of CN113888595B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a twin network single-target tracking method based on difficult sample mining, which comprises the steps of constructing a training set, constructing a convolutional twin network based on difficult sample mining, and the like. The method introduces difficult sample mining into target tracking: difficult negative samples are mined during training and used as training data to update the network parameters, and the difficult sample triplet loss is selected as the loss function and continuously optimized. By optimizing this loss, the model keeps mining difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, the model learns discriminative features, and the target tracking effect is improved.

Description

Twin network single-target visual tracking method based on difficult sample mining
Technical Field
The invention belongs to the technical field of computer vision, relates to image processing technology, and particularly relates to a twin (Siamese) network single-target tracking method based on difficult (hard) sample mining.
Background
Single-target visual tracking is one of the more popular research topics in computer vision. It is widely used in intelligent video surveillance, robot visual navigation, medical diagnosis, the positioning and tracking of underwater organisms, and other fields, and has broad development prospects. Given a video sequence, visual target tracking means specifying the target to be tracked in the first frame and marking its initial position, and then predicting the position and size of the target in the subsequent frames so as to track it accurately.
Early classical algorithms all operate in the time domain and involve complex calculations, so the large amount of computation leads to poor real-time tracking performance. Algorithms based on correlation filtering appeared later; compared with the earlier algorithms, correlation filtering moves the calculation into the frequency domain, which greatly reduces the amount of computation and greatly improves the speed. With the development of deep learning, researchers have introduced deep learning techniques into target tracking, proposed a series of methods, and achieved good results.
In recent years, target tracking methods based on twin networks have received unprecedented attention. Existing methods use a convolutional neural network to extract features for target modeling. In target tracking, offline training on the tracked target is one of the keys to the performance of the tracking model, and the selection of training data is particularly important during offline training. Existing twin-network-based methods use only the target area and directly correlate the features extracted from the target area with the features of the test frame image; they have poor robustness, cannot handle complex scenes such as those containing similar objects, and have insufficient discrimination capability. When an existing method is used for target tracking, a sample is usually labeled positive when the coordinate distance between the object and the sample is smaller than a threshold and negative otherwise; the similarity score of positive sample pairs is then maximized, and that of negative sample pairs minimized, through a logistic loss.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a twin network single-target tracking method based on difficult sample mining. Difficult sample mining is introduced into the target tracking method: difficult negative samples are mined during training and used as training data to update the network parameters, and the difficult sample triplet loss is selected as the loss function and continuously optimized. By optimizing this loss, the model keeps mining difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, and the model learns discriminative features.
In order to solve the above technical problems, the invention adopts the following technical scheme:
A twin network single-target tracking method based on difficult sample mining comprises the following steps:
step (1), constructing a training set: according to the target position and size in each image, cutting out a target template image Z and a search area image X from all images in the image sequence training set, and dividing the search area image X into a positive example image P and a negative example image N, wherein the image Z and the image P form a positive sample pair, the image Z and the image N form a negative sample pair, and the (Z, P, N) triplets formed by the target template image Z, the positive example image P and the negative example image N form the training data set;
step (2), constructing a convolutional twin network based on difficult sample mining, wherein the network comprises three branches that share the weights of the feature extraction network; the three branches are respectively used for obtaining a feature map of the target template image, a feature map of the search area positive sample image and a feature map of the negative sample image, and during feature extraction, difficult samples are defined and difficult sample mining is introduced to learn discriminative features;
step (3), performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest value in the response map is taken as the most similar position of the target object, and the response map is expanded to the size of the original image, so that the position of the target on the image to be searched is determined;
step (4), training the twin network based on difficult sample mining on the training set of step (1) to obtain a twin network trained to convergence;
step (5), performing online target tracking by using the trained twin network.
Further, the operation of step (1) includes cropping a target area template image and cropping a search area image. The target template image is cropped as follows: the target frame of the template image in target tracking is known; a square area is cut out centered on the tracked target, with the center of the target area representing the target position and the four sides of the target frame each expanded by q pixels; finally, the cropped target image block is scaled. The search area image is cropped as follows: centered on the target area, the four sides of the target frame are each expanded by 2q pixels, and the cropped search area image block is then scaled. Here q = (w + h)/4, where w is the width of the target frame and h is the height of the target frame.
Further, the feature extraction networks of the different branches of the twin network in step (2) are all an adjusted ResNet-50, through which the features of the input image are extracted.
Further, the positive sample pairs are image pairs with similar visual features and high reference contrast, and the negative sample pairs are image pairs with similar visual features and low reference contrast; the difficult samples in the dataset are defined as:
$$P = \{(i, j) \mid S_v(x_i, x_j) \ge \alpha,\ S_c(y_i, y_j) \ge \beta\}$$
$$N = \{(m, n) \mid S_v(x_m, x_n) \ge \alpha,\ S_c(y_m, y_n) < \beta\}$$
wherein $S_v$ represents the visual feature similarity, $S_c$ represents the reference contrast similarity, $\alpha$ represents the threshold of the visual feature similarity, and $\beta$ represents the threshold of the reference contrast similarity;
when pictures are selected from the training set for training, for each picture the most dissimilar positive sample and the most similar negative sample are selected to form a triplet, and the difficult sample triplet loss is calculated; the difficult sample triplet loss is defined as:
Figure BDA0003287674180000031
wherein M represents the M targets selected from each sample, N represents the N random images of each target, $(z)_+$ represents $\max(z, 0)$, and $z = \max d_{A,P} - \min d_{A,N} + \theta$; $\theta$ is a threshold parameter set according to actual needs, representing the margin between the positive and negative samples; $d_{A,P}$ represents the distance between the template sample and the positive sample, and $d_{A,N}$ represents the distance between the template sample and the negative sample;
by optimizing the loss $L_{hard}$, the model continuously mines positive sample pairs and difficult negative samples during training and learns discriminative features.
Further, step (3) operates as follows: after feature extraction, features from different layers are fused, since the lower-layer features carry more target position information and the higher-layer features carry more semantic information. The higher-layer features are first up-sampled and then fused with the lower-layer features, and this is iterated to generate, for each branch, a feature map in which the multi-layer features are fused. The target template image feature map is cross-correlated with the search area positive sample image feature map and with the negative sample image feature map respectively to obtain response maps, and the response maps are expanded to the size of the original image, thereby determining the position of the target on the image to be searched.
Further, the specific operation of step (4) is as follows:
1) training with the initial positive and negative samples, so that Z is pulled close to P and pushed away from N, to obtain a trained classifier;
2) classifying the samples with the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and continuing to train the classifier;
3) repeating the above steps until the performance of the classifier no longer improves.
Further, the online tracking process in step (5) includes the following steps:
1) reading the first frame of the video sequence to be tracked, acquiring its bounding box information, cutting out the target template image Z of the first frame according to the target template cropping method of step (1), inputting Z into the template branch of the twin network trained to convergence in step (4), extracting and fusing the multi-layer features of the template image, and then setting t = 2;
2) reading the t-th frame of the video to be tracked, cutting out the search area image of the t-th frame according to the target position determined in the (t-1)-th frame and the search area cropping method of step (1), inputting the cropped search area image into the search branch of the twin network trained to convergence in step (4), and extracting the features of the t-th frame search image;
3) performing a cross-correlation operation on the multi-layer fused feature map obtained in step 1) and the feature map obtained in step 2);
4) setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence to be tracked; if t ≤ T, executing steps 2) to 3) again, otherwise ending the tracking of the video sequence.
Compared with the prior art, the invention has the advantages that:
Aiming at the problem that existing twin network target tracking methods do not consider the effect of difficult samples on the model, a twin network target tracking method based on difficult sample mining is designed. Difficult sample mining is introduced into the target tracking twin network structure, difficult negative samples are mined as training data during training, and the difficult sample triplet loss is selected as the loss function and continuously optimized, so that the model learns discriminative features and the target tracking effect is improved.
Specifically, training starts with the initial positive and negative samples; the trained classifier is then used to classify the samples, the misclassified samples are put into the negative sample subset as difficult negative samples, and training continues and is repeated until the performance of the classifier no longer improves. Unlike traditional triplet training, in which most sampled triplets are simple and easily distinguished, the method selects difficult sample triplets and updates the network parameters with difficult samples during training: for each picture, the most dissimilar positive sample and the most similar negative sample are selected to calculate the difficult triplet loss. By optimizing this loss, the model keeps mining difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, problems such as local changes and background interference in the picture are addressed, and the generalization capability is stronger.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of a difficult sample mining strategy according to the present invention;
FIG. 3 is a graph illustrating the tracking effect of target tracking on a first video sequence using the method of the present invention;
FIG. 4 is a graph illustrating the tracking effect of target tracking on a second video sequence using the method of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
With reference to the overall flow of the present invention shown in FIG. 1, a twin network single-target tracking method based on difficult sample mining includes the following steps:
Step (1), constructing a training set.
According to the target position and size in each image, a target template image Z and a search area image X are cut out from all images in the image sequence training set, and the search area image X is divided into a positive example image P and a negative example image N. The image Z and the image P form a positive sample pair, the image Z and the image N form a negative sample pair, and the (Z, P, N) triplets formed by the target template image Z, the positive example image P and the negative example image N form the training data set.
Specifically, the operation of step (1) includes cropping a target area template image and cropping a search area image. The target template image is cropped as follows: the target frame of the template image in target tracking is known; a square area is cut out centered on the tracked target, with the center of the target area representing the target position and the four sides of the target frame each expanded by q pixels; finally, the cropped target image block is scaled to 127 × 127. The search area image is cropped as follows: centered on the target area, the four sides of the target frame are each expanded by 2q pixels, and the cropped search area image block is then scaled to 255 × 255. Here q = (w + h)/4, where w is the width of the target frame and h is the height of the target frame.
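As an illustration only, the cropping geometry described above can be sketched in Python as follows. The function name `crop_boxes` and the (left, top, right, bottom) box format are assumptions made for this sketch; how the expanded region is made square, and the resizing to 127 × 127 and 255 × 255, are not shown and would normally be delegated to an image library.

```python
def crop_boxes(cx, cy, w, h):
    """Return (template_box, search_box) for a target frame centered at (cx, cy).

    Each box is (left, top, right, bottom) in pixels. The template box expands
    the target frame by q pixels on every side and the search box by 2q pixels,
    with q = (w + h) / 4 as in the description.
    """
    q = (w + h) / 4.0

    def expand(margin):
        # Grow the w x h target frame by `margin` pixels on all four sides.
        return (cx - w / 2.0 - margin, cy - h / 2.0 - margin,
                cx + w / 2.0 + margin, cy + h / 2.0 + margin)

    return expand(q), expand(2.0 * q)


# Example: a 40 x 20 target frame centered at (100, 100) gives q = 15.
template_box, search_box = crop_boxes(100, 100, 40, 20)
```

Both boxes share the target center, so the center of each crop continues to represent the target position.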
Step (2), constructing a convolutional twin network based on difficult sample mining to obtain the feature maps of the different branches.
The network comprises three branches that share the weights of the feature extraction network. The three branches are respectively used for obtaining a feature map of the target template image, a feature map of the search area positive sample image and a feature map of the negative sample image; during feature extraction, difficult samples are defined and difficult sample mining is introduced to learn discriminative features.
Specifically, in step (2), the feature extraction networks of the different branches of the twin network are all an adjusted ResNet-50, through which the features of the input image are extracted.
Difficult sample mining is introduced to learn discriminative features. Specifically, with reference to the difficult sample mining strategy shown in FIG. 2, the invention obtains valid difficult sample pairs by considering both visual feature similarity and reference contrast similarity. Image pairs with similar visual features and high reference contrast are defined as positive sample pairs, and image pairs with similar visual features and low reference contrast are defined as negative sample pairs.
The difficult samples in the dataset are defined as:
$$P = \{(i, j) \mid S_v(x_i, x_j) \ge \alpha,\ S_c(y_i, y_j) \ge \beta\}$$
$$N = \{(m, n) \mid S_v(x_m, x_n) \ge \alpha,\ S_c(y_m, y_n) < \beta\}$$
wherein $S_v$ represents the visual feature similarity, $S_c$ represents the reference contrast similarity, $\alpha$ represents the threshold of the visual feature similarity, and $\beta$ represents the threshold of the reference contrast similarity.
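The two set definitions above can be illustrated with a short Python sketch. The function name `mine_pairs`, the toy similarity functions in the example, and the restriction to unordered pairs with i < j are assumptions made here for illustration; the description does not specify how the two similarities are computed.

```python
def mine_pairs(xs, ys, s_v, s_c, alpha, beta):
    """Split visually similar pairs into the positive set P and negative set N.

    xs, ys : per-sample inputs to the two similarity functions
    s_v    : visual feature similarity, s_v(x_i, x_j)
    s_c    : reference contrast similarity, s_c(y_i, y_j)
    """
    positives, negatives = [], []
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):       # assumption: unordered pairs only
            if s_v(xs[i], xs[j]) < alpha:
                continue                      # not visually similar: not a difficult pair
            if s_c(ys[i], ys[j]) >= beta:
                positives.append((i, j))      # similar look, high reference contrast
            else:
                negatives.append((i, j))      # similar look, low reference contrast
    return positives, negatives


# Toy example: scalar "features" and string references (both hypothetical).
xs = [0.0, 0.1, 0.15, 0.9]
ys = ["a", "a", "b", "b"]
P, N = mine_pairs(xs, ys,
                  s_v=lambda a, b: 1.0 - abs(a - b),
                  s_c=lambda a, b: 1.0 if a == b else 0.0,
                  alpha=0.8, beta=0.5)
```

In this toy data, samples 0, 1 and 2 look alike, so the pair (0, 1) lands in P while (0, 2) and (1, 2) land in N; sample 3 is visually dissimilar to all others and contributes no difficult pair.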
Traditional triplet training samples three pictures from the training data. This is simple, but most of the sampled pairs are simple, easily distinguished sample pairs, and if a large proportion of the training pairs are simple, the network is hindered from learning better features. Therefore, when pictures are selected from the training set for training, for each picture the most dissimilar positive sample and the most similar negative sample are selected to form a triplet, and the difficult sample triplet loss is calculated.
The difficult sample triplet loss is defined as:
Figure BDA0003287674180000061
wherein M represents the M targets selected from each sample, N represents the N random images of each target, $(z)_+$ represents $\max(z, 0)$, and $z = \max d_{A,P} - \min d_{A,N} + \theta$; $\theta$ is a threshold parameter set according to actual needs, representing the margin between the positive and negative samples; $d_{A,P}$ represents the distance between the template sample and the positive sample, and $d_{A,N}$ represents the distance between the template sample and the negative sample.
By optimizing the loss $L_{hard}$, the model continuously mines positive sample pairs and difficult negative samples during training and learns discriminative features.
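A minimal Python sketch of a loss of this kind is given below. It follows the definitions in the text (hardest positive, hardest negative, margin θ, hinge at zero), but several details are assumptions of the sketch: the Euclidean distance, the plain sum over anchors without normalization, and the names `euclidean` and `hard_triplet_loss`. The exact form of the formula in the figure is not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_triplet_loss(embeddings, target_ids, theta):
    """Sum over anchors of (max d(A,P) - min d(A,N) + theta)_+ within one batch."""
    total = 0.0
    for a, anchor in enumerate(embeddings):
        pos = [euclidean(anchor, e) for k, e in enumerate(embeddings)
               if k != a and target_ids[k] == target_ids[a]]
        neg = [euclidean(anchor, e) for k, e in enumerate(embeddings)
               if target_ids[k] != target_ids[a]]
        if not pos or not neg:
            continue                           # anchor has no valid triplet
        z = max(pos) - min(neg) + theta        # hardest positive vs. hardest negative
        total += max(z, 0.0)                   # (z)_+ = max(z, 0)
    return total


# Two targets with two images each, embedded on a line for easy checking.
batch = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
ids = [0, 0, 1, 1]
loss_small_margin = hard_triplet_loss(batch, ids, theta=0.5)
loss_large_margin = hard_triplet_loss(batch, ids, theta=2.5)
```

With the small margin every anchor already satisfies the constraint and the loss vanishes; with the larger margin each anchor contributes a positive term, which is what drives further training.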
Step (3), performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest value in the response map is regarded as the most similar position of the target object, so that the position of the target is determined.
Specifically, step (3) operates as follows: after feature extraction, features from different layers are fused, since the lower-layer features carry more target position information and the higher-layer features carry more semantic information. The higher-layer features are first up-sampled and then fused with the lower-layer features, and this is iterated to generate, for each branch, a feature map in which the multi-layer features are fused. The target template image feature map is cross-correlated with the search area positive sample image feature map and with the search area negative sample image feature map respectively to obtain response maps. The response map is then expanded to the size of the original image so as to determine the position of the target on the image to be searched.
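The cross-correlation and peak search of step (3) can be sketched on plain 2-D lists as follows. This is a single-channel toy under stated assumptions: the function names `cross_correlate` and `peak_position` are hypothetical, and the multi-layer feature fusion and the expansion of the response map to the original image size are omitted.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (2-D lists) and return the response map."""
    th, tw = len(template), len(template[0])
    rows = len(search) - th + 1
    cols = len(search[0]) - tw + 1
    return [
        [
            sum(search[r + i][c + j] * template[i][j]
                for i in range(th) for j in range(tw))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def peak_position(response):
    """Return (row, col) of the highest value in the response map."""
    best_r, best_c = 0, 0
    for r, row in enumerate(response):
        for c, value in enumerate(row):
            if value > response[best_r][best_c]:
                best_r, best_c = r, c
    return best_r, best_c


# A 2 x 2 template hidden in a 4 x 4 search map at row 1, column 2.
template = [[1, 2], [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]
response = cross_correlate(search, template)
location = peak_position(response)
```

The response map is smaller than the search map, which is why the method expands it back to the image size before reading off the target position.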
Step (4), training the twin network based on difficult sample mining on the training set of step (1) to obtain a twin network trained to convergence.
Specifically, step (4) operates as follows:
1) training with the initial positive and negative samples, so that Z is pulled close to P and pushed away from N, to obtain a trained classifier;
2) classifying the samples with the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and continuing to train the classifier;
3) repeating the above steps until the performance of the classifier no longer improves.
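The three training steps above form a bootstrapping loop, which can be sketched in Python as below. The sketch is generic and hypothetical: `fit` and `predict` stand in for training and applying the real network, the threshold classifier in the example is a toy, and stopping when no new misclassified negatives are found is used here as a simple proxy for "performance no longer improves".

```python
def train_with_hard_negative_mining(positives, negatives, negative_pool,
                                    fit, predict, max_rounds=10):
    """Retrain repeatedly, each round adding misclassified negatives to the training set."""
    negatives = list(negatives)
    model = fit(positives, negatives)                  # 1) initial training
    for _ in range(max_rounds):
        # 2) negatives that the current model wrongly accepts are the difficult ones
        hard = [x for x in negative_pool
                if predict(model, x) and x not in negatives]
        if not hard:                                   # 3) stop: nothing new to learn from
            break
        negatives.extend(hard)
        model = fit(positives, negatives)
    return model, negatives


# Toy stand-in: samples are scores, the "model" is a decision threshold.
def fit(pos, neg):
    return (min(pos) + max(neg)) / 2.0

def predict(threshold, x):
    return x >= threshold                              # True means "classified as target"

model, used_negatives = train_with_hard_negative_mining(
    positives=[0.9, 0.8], negatives=[0.1],
    negative_pool=[0.2, 0.5, 0.6], fit=fit, predict=predict)
```

In the toy run the first model accepts the negatives 0.5 and 0.6; after they are added as difficult negatives, the retrained threshold rejects the whole pool and the loop stops.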
Step (5), performing online target tracking by using the trained twin network.
Specifically, the online tracking process in step (5) includes the following steps:
1) Reading the first frame of the video sequence to be tracked, acquiring its bounding box information, cutting out the target template image Z of the first frame according to the target template cropping method of step (1), inputting Z into the template branch of the twin network trained to convergence in step (4), extracting and fusing the multi-layer features of the template image, and then setting t = 2.
2) Reading the t-th frame of the video to be tracked, cutting out the search area image of the t-th frame according to the target position determined in the (t-1)-th frame and the search area cropping method of step (1), inputting the cropped search area image into the search branch of the twin network trained to convergence in step (4), and extracting the features of the t-th frame search image.
3) Performing a cross-correlation operation on the multi-layer fused feature map obtained in step 1) and the feature map obtained in step 2).
4) Setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence to be tracked; if so, executing steps 2) to 3) again, otherwise ending the tracking of the video sequence.
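The control flow of the online tracking steps can be illustrated with a one-dimensional toy in Python. Everything specific in it is an assumption of the sketch: frames are 1-D signals, raw values stand in for the network features, the search window of one target width on each side is arbitrary, and the names `correlate_1d` and `track` are hypothetical.

```python
def correlate_1d(search, template):
    """Cross-correlate a 1-D template with a 1-D search window."""
    n = len(search) - len(template) + 1
    return [sum(search[i + k] * template[k] for k in range(len(template)))
            for i in range(n)]


def track(frames, first_pos, width):
    """Follow a target of `width` samples through `frames`, starting at `first_pos`."""
    template = frames[0][first_pos:first_pos + width]   # 1) template from the first frame
    positions = [first_pos]
    for t in range(1, len(frames)):                     # 4) loop over frames 2..T
        prev = positions[-1]
        lo = max(0, prev - width)                       # 2) search region around the
        hi = min(len(frames[t]), prev + 2 * width)      #    previous target position
        response = correlate_1d(frames[t][lo:hi], template)   # 3) cross-correlation
        best = max(range(len(response)), key=response.__getitem__)
        positions.append(lo + best)                     # map the peak back to frame coordinates
    return positions


frames = [[0, 1, 2, 0, 0, 0, 0, 0],
          [0, 0, 1, 2, 0, 0, 0, 0],
          [0, 0, 0, 0, 1, 2, 0, 0]]
positions = track(frames, first_pos=1, width=2)
```

As in the described method, the template is computed once from the first frame and reused, while the search region is re-centered on the position found in the previous frame.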
FIG. 3 shows the tracking effect of target tracking on the first video sequence using the method of the present invention. It can be seen that the target tracking method provided by the invention can effectively track a target subject to interference from a similar background.
FIG. 4 shows the tracking effect of target tracking on the second video sequence using the method of the present invention. It can be seen that the target tracking method provided by the invention can effectively track a target undergoing posture change and rapid movement.
In conclusion, the method introduces difficult sample mining into the target tracking twin network structure and designs a difficult triplet loss, so that the network can be fully trained and the distinguishing capability of the classifier is enhanced. Similar targets are better distinguished, problems such as local changes and background interference in the image are addressed, and the learned model has stronger generalization capability.
It is understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.

Claims (7)

1. A twin network single target tracking method based on difficult sample mining is characterized by comprising the following steps:
step (1), constructing a training set: according to the target position and size in each image, cutting out a target template image Z and a search area image X from all images in the image sequence training set, and dividing the search area image X into a positive example image P and a negative example image N, wherein the image Z and the image P form a positive sample pair, the image Z and the image N form a negative sample pair, and the (Z, P, N) triplets formed by the target template image Z, the positive example image P and the negative example image N form the training data set;
step (2), constructing a convolutional twin network based on difficult sample mining, wherein the network comprises three branches that share the weights of the feature extraction network; the three branches are respectively used for obtaining a feature map of the target template image, a feature map of the search area positive sample image and a feature map of the negative sample image, and during feature extraction, difficult samples are defined and difficult sample mining is introduced to learn discriminative features;
step (3), performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest value in the response map is taken as the most similar position of the target object, and the response map is expanded to the size of the original image, so that the position of the target on the image to be searched is determined;
step (4), training the twin network based on difficult sample mining on the training set of step (1) to obtain a twin network trained to convergence;
step (5), performing online target tracking by using the trained twin network.
2. The twin network single-target tracking method based on difficult sample mining according to claim 1, wherein the operation of step (1) includes cropping a target area template image and cropping a search area image; the target template image is cropped as follows: the target frame of the template image in target tracking is known, a square area is cut out centered on the tracked target, the center of the target area represents the target position, the four sides of the target frame are each expanded by q pixels, and finally the cropped target image block is scaled; the search area image is cropped as follows: centered on the target area, the four sides of the target frame are each expanded by 2q pixels, and the cropped search area image block is then scaled; wherein q = (w + h)/4, w is the width of the target frame, and h is the height of the target frame.
3. The twin network single-target tracking method based on difficult sample mining as claimed in claim 1, wherein the feature extraction networks of the different branches of the twin network in step (2) are all an adjusted ResNet-50, through which the features of the input image are extracted.
4. The twin network single target tracking method based on difficult sample mining as claimed in claim 1, wherein the positive sample pairs are image pairs with similar visual features and high reference contrast, and the negative sample pairs are image pairs with similar visual features and low reference contrast; the difficult samples in the dataset are defined as:
$$P = \{(i, j) \mid S_v(x_i, x_j) \ge \alpha,\ S_c(y_i, y_j) \ge \beta\}$$
$$N = \{(m, n) \mid S_v(x_m, x_n) \ge \alpha,\ S_c(y_m, y_n) < \beta\}$$
wherein $S_v$ represents the visual feature similarity, $S_c$ represents the reference contrast similarity, $\alpha$ represents the threshold of the visual feature similarity, and $\beta$ represents the threshold of the reference contrast similarity;
when pictures are selected from the training set for training, the most dissimilar positive sample and the most similar negative sample are selected for each picture to form a triplet, and the difficult sample triplet loss is calculated; the difficult sample triplet loss is defined as:
Figure FDA0003287674170000021
wherein M denotes the number of targets selected from each sample, N denotes the number of random images of each target, $(z)_+$ denotes $\max(z, 0)$ with $z = \max d_{A,P} - \min d_{A,N} + \theta$, $\theta$ is a threshold parameter set according to actual needs that represents the margin between the positive and negative sample similarities, $d_{A,P}$ denotes the distance between the template sample and the positive sample, and $d_{A,N}$ denotes the distance between the template sample and the negative sample;
by optimizing the loss $L_{hard}$, the model continuously mines positive sample pairs and difficult negative samples during training and learns discriminative features.
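The hinge term described in claim 4 can be illustrated with a short sketch. The loss formula itself survives only as an image reference, so the following is an assumption-laden reconstruction: each sample in a batch acts as the template (anchor), the hardest positive is the same-target sample at the largest distance, the hardest negative is the other-target sample at the smallest distance, and the hinge values are averaged over the anchors. The averaging and the function name `hard_triplet_loss` are assumptions, not taken from the patent.

```python
def hard_triplet_loss(dist, labels, theta):
    """Hardest-positive / hardest-negative triplet hinge over a batch.

    dist[a][b] : distance between samples a and b
    labels[a]  : target identity of sample a
    theta      : margin between positive and negative distances
    Averaging over anchors is an assumption; the patent's exact
    normalization is not recoverable from the text.
    """
    n = len(labels)
    total, count = 0.0, 0
    for a in range(n):
        pos = [dist[a][b] for b in range(n) if b != a and labels[b] == labels[a]]
        neg = [dist[a][b] for b in range(n) if labels[b] != labels[a]]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet
        z = max(pos) - min(neg) + theta   # z = max d_AP - min d_AN + theta
        total += max(z, 0.0)              # (z)_+ = max(z, 0)
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    dist = [
        [0.0, 1.0, 4.0, 5.0],
        [1.0, 0.0, 2.0, 6.0],
        [4.0, 2.0, 0.0, 3.0],
        [5.0, 6.0, 3.0, 0.0],
    ]
    print(hard_triplet_loss(dist, labels, theta=0.5))
```

In the toy batch only the third sample violates the margin (its positive lies at distance 3 while a negative lies at distance 2), so it alone contributes to the loss.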
5. The twin network single-target tracking method based on difficult sample mining as claimed in claim 1 or 4, wherein step (3) operates as follows: after feature extraction, features of different layers are fused, the lower-layer features carrying more target position information and the higher-layer features carrying more semantic information; the higher-layer features are up-sampled and then fused with the lower-layer features, and this is iterated to generate, for each branch, a feature map fused from the multi-layer features; a cross-correlation operation is performed between the target template image feature map and, respectively, the positive sample image feature map and the negative sample image feature map in the search area to obtain response maps; and the response maps are enlarged to the size of the original image, thereby determining the position of the target on the image to be searched.
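The cross-correlation step of claim 5 can be sketched on single-channel maps. This is a simplified illustration under stated assumptions: real feature maps are multi-channel tensors produced by the network, the multi-layer fusion is omitted, and the helper names `cross_correlate` and `peak` are hypothetical. The template map slides over the search map, and the location of the largest response indicates the target.

```python
def cross_correlate(search, template):
    """Valid-mode 2D cross-correlation of two single-channel maps.

    Both arguments are lists of rows; the result has shape
    (sh - th + 1) x (sw - tw + 1).
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            s = 0.0
            for u in range(th):
                for v in range(tw):
                    s += search[i + u][j + v] * template[u][v]
            row.append(s)
        response.append(row)
    return response


def peak(response):
    """Return (row, col) of the first maximum in the response map."""
    best_val, best_pos = None, None
    for i, row in enumerate(response):
        for j, val in enumerate(row):
            if best_val is None or val > best_val:
                best_val, best_pos = val, (i, j)
    return best_pos


if __name__ == "__main__":
    template = [[1, 2], [3, 4]]
    search = [
        [0, 0, 0, 0],
        [0, 0, 1, 2],
        [0, 0, 3, 4],
        [0, 0, 0, 0],
    ]
    print(peak(cross_correlate(search, template)))
```

Mapping the peak back to original image coordinates depends on the network stride and the crop scale, which the claim does not give, so that step is left out.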
6. The twin network single target tracking method based on difficult sample mining as claimed in claim 1, wherein the specific operation of step (4) is as follows:
1) training with the initial positive and negative samples so that Z is drawn close to P and pushed away from N, to obtain a trained classifier;
2) classifying the samples with the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and continuing to train the classifier;
3) repeating the above steps until the performance of the classifier no longer improves.
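The loop in steps 1) to 3) can be sketched with a toy classifier. Everything concrete here is an assumption for illustration: samples are scalar scores, the "classifier" is a threshold halfway between the class means (a stand-in for the twin network), and "performance" is the number of misclassified negatives in a candidate pool. Only the control flow (train, collect misclassified negatives, retrain, stop when nothing improves) follows the claim.

```python
def train_threshold(pos, neg):
    """Toy stand-in classifier: threshold halfway between the class means."""
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def mine_hard_negatives(pos, neg_init, neg_pool, max_rounds=10):
    """Iteratively add misclassified negatives to the negative subset."""
    neg = list(neg_init)
    used = set()                                   # pool indices already added
    threshold = train_threshold(pos, neg)          # step 1)
    best_errors = None
    for _ in range(max_rounds):
        # Step 2): a negative scored at or above the threshold is misclassified.
        wrong = [k for k, x in enumerate(neg_pool) if x >= threshold]
        if best_errors is not None and len(wrong) >= best_errors:
            break                                  # step 3): no further improvement
        best_errors = len(wrong)
        new = [k for k in wrong if k not in used]
        if not new:
            break                                  # nothing left to mine
        used.update(new)
        neg.extend(neg_pool[k] for k in new)
        threshold = train_threshold(pos, neg)      # continue training
    return threshold, neg


if __name__ == "__main__":
    thr, negatives = mine_hard_negatives(
        pos=[2.0, 2.0], neg_init=[0.0, 0.0], neg_pool=[0.0, 1.25, 1.5, 0.5]
    )
    print(thr, negatives)
```

In the example the initial threshold of 1.0 lets two pool negatives through; adding them as difficult negatives raises the threshold, after which only one remains misclassified.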
7. The twin network single-target tracking method based on difficult sample mining as claimed in claim 2, wherein the online tracking process in step (5) comprises the steps of:
1) reading the first frame of the video sequence to be tracked, acquiring the bounding box information of the first frame, cropping the target template image Z of the first frame according to the target template image cropping method in step (1), inputting Z into the template branch of the twin network trained to convergence in step (4), extracting and fusing the multi-layer features of the template image, and then setting t = 2;
2) reading the t-th frame of the video to be tracked, cropping the search area image of the t-th frame according to the target position determined in the (t-1)-th frame and the search area image cropping method in step (1), inputting the cropped search area image into the search branch of the twin network trained to convergence in step (4), and extracting the features of the t-th frame search image;
3) performing a cross-correlation operation between the multi-layer fused feature map obtained in step 1) and the feature map obtained in step 2);
4) setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence to be tracked; if t ≤ T, executing steps 2) to 3) again; otherwise, ending the tracking process of the video sequence.
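The control flow of the online tracking steps can be sketched as a loop over frames. The sketch is deliberately abstract: cropping, feature extraction, cross-correlation and localization are passed in as callables, all of whose names are hypothetical, and the `locate` step (turning a response map into an updated box) is an assumption, since the claim lists the cross-correlation but does not spell out the box update.

```python
def track(frames, init_box, crop_template, crop_search,
          embed_template, embed_search, correlate, locate):
    """Run the frame loop of steps 1) to 4).

    frames   : list of images; frames[0] is the first frame
    init_box : bounding box of the target in the first frame
    Returns one box per frame.
    """
    # Step 1): the template is cropped and embedded once, from the first frame.
    z_feat = embed_template(crop_template(frames[0], init_box))
    box = init_box
    boxes = [init_box]
    t, T = 2, len(frames)              # frames are 1-indexed in the claim
    while t <= T:                      # step 4): continue while t <= T
        # Step 2): crop around the position found in frame t-1 and embed it.
        x_feat = embed_search(crop_search(frames[t - 1], box))
        # Step 3): cross-correlate template and search features.
        response = correlate(z_feat, x_feat)
        box = locate(response, box)    # assumed update from the response map
        boxes.append(box)
        t += 1
    return boxes


if __name__ == "__main__":
    # Stub components: frames are numbers and the "response" is a plain offset.
    result = track(
        frames=[0, 1, 2, 3],
        init_box=(0, 0, 10, 10),
        crop_template=lambda frame, box: frame,
        crop_search=lambda frame, box: frame,
        embed_template=lambda img: img,
        embed_search=lambda img: img,
        correlate=lambda z, x: x - z,
        locate=lambda r, box: (box[0] + r, box[1], box[2], box[3]),
    )
    print(result)
```

Computing the template features once and reusing them for every frame is what keeps the per-frame cost down to a single pass through the search branch plus one cross-correlation.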
CN202111152770.4A 2021-09-29 2021-09-29 Twin network single-target visual tracking method based on difficult sample mining Active CN113888595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111152770.4A CN113888595B (en) 2021-09-29 2021-09-29 Twin network single-target visual tracking method based on difficult sample mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111152770.4A CN113888595B (en) 2021-09-29 2021-09-29 Twin network single-target visual tracking method based on difficult sample mining

Publications (2)

Publication Number Publication Date
CN113888595A true CN113888595A (en) 2022-01-04
CN113888595B CN113888595B (en) 2024-05-14

Family

ID=79008165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111152770.4A Active CN113888595B (en) 2021-09-29 2021-09-29 Twin network single-target visual tracking method based on difficult sample mining

Country Status (1)

Country Link
CN (1) CN113888595B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340850A (en) * 2020-03-20 2020-06-26 Academy of Military Sciences *** Engineering Research Institute *** General Research Institute Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss
CN111354017A (en) * 2020-03-04 2020-06-30 Jiangnan University Target tracking method based on twin neural network and parallel attention module
WO2020181685A1 (en) * 2019-03-12 2020-09-17 Nanjing University of Posts and Telecommunications Vehicle-mounted video target detection method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
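张博言; 钟勇: "A single-target tracking algorithm based on diverse positive instances", Journal of Harbin Institute of Technology, no. 10, 25 September 2020 (2020-09-25) *
熊昌镇; 李言: "A survey of tracking algorithms based on twin networks", Industrial Control Computer, no. 03, 25 March 2020 (2020-03-25) *
纪筱鹏; 魏志强: "Research on a vehicle tracking method based on contour features and extended Kalman filtering", Journal of Image and Graphics, no. 02, 16 February 2011 (2011-02-16) *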

Also Published As

Publication number Publication date
CN113888595B (en) 2024-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant