CN113888595A - Twin network single-target visual tracking method based on difficult sample mining - Google Patents
Twin network single-target visual tracking method based on difficult sample mining
- Publication number
- CN113888595A (application CN202111152770.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- target
- sample
- difficult
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Abstract
The invention discloses a twin network single-target tracking method based on difficult sample mining, which comprises the steps of constructing a training set, constructing a convolutional twin network based on difficult sample mining, and so on. The method introduces difficult sample mining into target tracking: difficult negative samples are mined during training and used as training data to update the network parameters, and the difficult-sample triplet loss is selected as the loss function and continuously optimized. By optimizing this loss, the model continuously mines difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, the model learns features with distinguishing capability, and the target tracking effect is improved.
Description
Technical Field
The invention belongs to the technical field of computer vision, relates to an image processing technology, and particularly relates to a twin network single-target tracking method based on difficult sample mining.
Background
Single-target visual tracking is one of the more popular research topics in computer vision. It has wide applications in intelligent video surveillance, robot visual navigation, medical diagnosis, and the localization and tracking of underwater organisms, and it has broad development prospects. Visual target tracking means that, given a video sequence, the target to be tracked is specified in the first frame and its initial position is calibrated, and the position and size of the target are then predicted in subsequent frames so as to track it accurately.
Early classical algorithms all performed their processing in the time domain; these algorithms involve complex computations, and the large amount of operations results in poor real-time tracking. Algorithms based on correlation filtering appeared later; compared with the earlier algorithms, the introduction of correlation filtering converts the target tracking computation into the frequency domain, which greatly reduces the amount of computation and greatly improves speed. With the development of deep learning, researchers have introduced deep learning techniques into target tracking and proposed a series of methods that achieve good results.
In recent years, methods that perform target tracking based on a twin network have received unprecedented attention. Existing methods adopt a convolutional neural network to extract features for target modeling. In the target tracking process, offline training on the tracked target is one of the keys to the performance of the tracking model, and the selection of training data during offline training is particularly important. Existing twin network-based methods only use the target area and directly correlate the features extracted from the target area with the features of a test frame image; they have poor robustness, cannot handle complex scenes such as similar objects, and have insufficient discrimination capability. When existing methods perform target tracking, an object is usually labeled positive when its coordinate distance to the exemplar is smaller than a threshold and negative otherwise, and a logistic loss maximizes the similarity score of positive sample pairs and minimizes the similarity score of negative sample pairs.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a twin network single-target tracking method based on difficult sample mining. Difficult sample mining is introduced into the target tracking method: difficult negative samples are mined during training and used as training data to update the network parameters, and the difficult-sample triplet loss is selected as the loss function and continuously optimized. By optimizing this loss, the model continuously mines difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, and the model learns features with distinguishing capability.
In order to solve the technical problems, the invention adopts the technical scheme that:
a twin network single target tracking method based on difficult sample mining comprises the following steps:
step (1), constructing a training set: cutting out a target template image Z and a search area image X of all images in an image sequence training set according to the target position and the size of the images, dividing the search area image X into a positive example image P and a negative example image N, forming a pair of positive sample pairs by the image Z and the image P, forming a pair of negative sample pairs by the image Z and the image N, and forming a training data set by (Z, P, N) triples formed by the target template image Z, the positive example image P and the negative example image N;
step (2), a convolution twin network based on difficult sample mining is constructed, wherein the network comprises three branches, and the three branches share the weight of the feature extraction network; the three branches are respectively used for obtaining a feature map of a target template image, a feature map of a search area positive sample image and a feature map of a negative sample image, wherein in feature extraction, a difficult sample is defined, and difficult sample mining is introduced to learn features with distinguishing capability;
step (3), performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest value in the response map is taken as the most similar position of the target object, and the response map is expanded to the size of the original image, thereby determining the position of the target on the image to be searched;
step (4), training a twin network based on difficult sample mining based on the training set in the step (1) to obtain a twin network with training convergence;
and (5) performing online target tracking by using the trained twin network.
Further, the operation of step (1) includes cropping a target area template image and cropping a search area image; the target template image cropping method is as follows: the target frame of the template image in target tracking is known; a square area is cut out with the tracked target as the center, the center position of the target area representing the target position; q pixels are added on each of the four sides of the target frame, and finally the cropped target image block is scaled in size; the search area image cropping method is as follows: with the target area as the center, 2q pixels are added on each of the four sides of the target frame, and the cropped search area image block is then scaled in size; where q = (w + h)/4, w is the width of the target frame, and h is the height of the target frame.
Further, the feature extraction networks of different branches of the twin network in the step (2) are all adjusted ResNet-50, and features of the input image are extracted through the ResNet-50.
Further, the positive sample pairs are image pairs with similar visual features and high reference contrast, and the negative sample pairs are image pairs with similar visual features and low reference contrast; the difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
where S_v denotes the visual feature similarity, S_c denotes the reference contrast similarity, α denotes the threshold on the visual feature similarity, and β denotes the threshold on the reference contrast similarity;
when selecting pictures from the training set for training, for each picture a most dissimilar positive sample and a most similar negative sample are selected to form a triplet, and the difficult-sample triplet loss is calculated; the difficult-sample triplet loss is defined as:
L_hard = Σ_{i=1}^{M} Σ_{a=1}^{N} ( max d_{A,P} − min d_{A,N} + θ )_+
where M denotes the M targets selected from each sample, N denotes the N random images of each target, (z)_+ denotes max(z, 0), with z = max d_{A,P} − min d_{A,N} + θ; θ is a threshold parameter set according to actual needs, representing the margin between the positive and negative sample similarities; d_{A,P} denotes the distance between the template sample and a positive sample, and d_{A,N} denotes the distance between the template sample and a negative sample;
through LhardOptimizing loss, continuously mining positive sample pairs and difficult negative samples by the model in the training process, and learning the characteristics with distinguishing capability.
Further, step (3) operates as follows: after feature extraction, features from different layers are fused; the lower-layer features carry more target position information and the higher-layer features carry more semantic information, so the higher-layer features are first up-sampled and then fused with the lower-layer features, iteratively generating, for each branch, a feature map in which the multi-layer features are fused; the target template image feature map is cross-correlated with the search area positive sample image feature map and the negative sample image feature map respectively to obtain response maps, and the response maps are expanded to the size of the original image, thereby determining the position of the target on the image to be searched.
Further, the specific operation of step (4) is as follows:
1) training with the initial positive and negative samples, so that Z is pulled close to P and pushed away from N, obtaining a trained classifier;
2) classifying the samples by using a trained classifier, putting the misclassified samples into a negative sample subset as difficult negative samples, and continuing to train the classifier;
3) repeating the above steps until the performance of the classifier no longer improves.
Further, the online tracking process in step (5) includes the following steps:
1) reading the first frame of the video sequence to be tracked, obtaining its bounding box information, cutting out the target template image Z of the first frame according to the method for cutting the target template image in step (1), inputting Z into the template branch of the twin network converged by the training in step (4), extracting and fusing the multi-layer features of the template image, and then setting t = 2;
2) reading the t-th frame of the video to be tracked, cutting out the search area image of the t-th frame according to the target position determined in the (t-1)-th frame and the method for cutting the search area image in step (1), inputting the cut t-th frame search area image into the search branch of the twin network converged by the training in step (4), and extracting the features of the t-th frame search image;
3) performing cross-correlation operation on the characteristic diagram obtained in the step 1) after multi-layer fusion and the characteristic diagram obtained in the step 2);
4) setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence to be tracked; if t ≤ T, executing steps 2)-3), otherwise ending the tracking process of the video sequence.
Compared with the prior art, the invention has the advantages that:
aiming at the problem that the existing twin network target tracking method does not consider the effect of a difficult sample on a model, the twin network target tracking method based on difficult sample mining is designed, the difficult sample mining is introduced into a target tracking twin network structure, a difficult negative sample is mined as training data in the training process, the difficult sample triple loss is selected as a loss function, the loss function is continuously optimized, the model is made to learn the characteristic with distinguishing capability, and the target tracking effect is better.
Specifically, in the training process, the initial positive and negative samples are used for training; the trained classifier is then used to classify the samples, the misclassified samples are put into the negative sample subset as difficult negative samples, and training continues; this is repeated until the performance of the classifier no longer improves. Unlike traditional triplet training, whose samples are mostly simple and easily distinguishable, the method selects difficult-sample triplets and uses the difficult samples to update the network parameters during training: for each picture, the most dissimilar positive sample and the most similar negative sample are selected to compute the difficult triplet loss. By optimizing this loss, the model continuously mines difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, problems such as local change and background interference in the picture are handled, and the generalization capability is stronger.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of a difficult sample mining strategy according to the present invention;
FIG. 3 is a graph illustrating the tracking effect of target tracking on a first video sequence using the method of the present invention;
FIG. 4 is a graph illustrating the tracking effect of target tracking on a second video sequence using the method of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
With reference to the overall flow of the present invention shown in fig. 1, a twin network single target tracking method based on difficult sample mining includes the following steps:
and (1) constructing a training set.
According to the target position and size of the image, cutting out a target template image Z and a search area image X of all images in the image sequence training set, dividing the search area image X into a positive example image P and a negative example image N, forming a pair of positive sample pairs by the image Z and the image P, forming a pair of negative sample pairs by the image Z and the image N, and forming a training data set by (Z, P, N) triples formed by the target template image Z, the positive example image P and the negative example image N.
Specifically, the operation of step (1) includes cropping a target area template image and cropping a search area image. The target template image cropping method is as follows: the target frame of the template image in target tracking is known; a square area is cut out with the tracked target as the center, the center position of the target area representing the target position; q pixels are added on each of the four sides of the target frame, and finally the cropped target image block is scaled to 127×127. The search area image cropping method is as follows: with the target area as the center, 2q pixels are added on each of the four sides of the target frame, and the cropped search area image block is then scaled to 255×255, where q = (w + h)/4, w is the width of the target frame, and h is the height of the target frame.
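The cropping geometry above can be sketched as follows. This is an illustrative sketch: the patent states only that a square area is cut out, so taking the square's side as the geometric mean of the padded width and height is an assumption, and all names are hypothetical.

```python
import numpy as np

def crop_sizes(w, h):
    # Margin q = (w + h) / 4 as defined in step (1).
    q = (w + h) / 4.0
    # Template crop: the target box padded by q pixels on every side.
    # Using the geometric mean of the padded width and height as the
    # square's side is an assumption, not stated in the patent.
    template_side = np.sqrt((w + 2 * q) * (h + 2 * q))
    # Search crop: the target box padded by 2q pixels on every side.
    search_side = np.sqrt((w + 4 * q) * (h + 4 * q))
    return q, template_side, search_side

q, tpl, srch = crop_sizes(64.0, 32.0)
# q = 24.0; the cropped patches are then rescaled to 127x127 (template)
# and 255x255 (search area)
```

Since the search crop pads by twice the margin, the search patch always covers a strictly larger region than the template patch before both are rescaled.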
And (2) constructing a convolution twin network based on difficult sample mining to obtain characteristic diagrams of different branches.
The network comprises three branches and the three branches share the weight of the feature extraction network; the three branches are respectively used for obtaining a feature map of a target template image, a feature map of a search area positive sample image and a feature map of a negative sample image, wherein in feature extraction, a difficult sample is defined, and difficult sample mining is introduced to learn features with distinguishing capability.
Specifically, in step (2), the feature extraction networks of the different branches of the twin network are all an adjusted ResNet-50, and feature extraction is performed on the input image through ResNet-50.
Difficult sample mining is introduced to learn features with discriminative power. In conjunction with the difficult sample mining strategy of the present invention shown in fig. 2, in particular, the present invention considers obtaining valid pairs of difficult samples in terms of both visual feature similarity and reference contrast similarity. Image pairs possessing similar visual features and high reference contrast are defined as positive sample pairs, and image pairs possessing similar visual features and low reference contrast are defined as negative sample pairs.
The difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
where S_v denotes the visual feature similarity, S_c denotes the reference contrast similarity, α denotes the threshold on the visual feature similarity, and β denotes the threshold on the reference contrast similarity.
The traditional triplet samples three pictures from the training data, which is simple, but most of the resulting samples are easily distinguishable sample pairs; if a large fraction of the training sample pairs are simple pairs, the network cannot learn good features. Therefore, when the method selects pictures from the training set for training, for each picture a most dissimilar positive sample and a most similar negative sample are selected to form a triplet, and the difficult-sample triplet loss is calculated.
The difficult-sample triplet loss is defined as:
L_hard = Σ_{i=1}^{M} Σ_{a=1}^{N} ( max d_{A,P} − min d_{A,N} + θ )_+
where M denotes the M targets selected from each sample, N denotes the N random images of each target, (z)_+ denotes max(z, 0), with z = max d_{A,P} − min d_{A,N} + θ; θ is a threshold parameter set according to actual needs, representing the margin between the positive and negative sample similarities; d_{A,P} denotes the distance between the template sample and a positive sample, and d_{A,N} denotes the distance between the template sample and a negative sample.
By optimizing the loss L_hard, the model continuously mines positive sample pairs and difficult negative samples during training, and learns features with distinguishing capability.
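A minimal sketch of the difficult-sample triplet loss described above: the array shapes and the averaging over targets are assumptions, since the patent does not fix the normalization.

```python
import numpy as np

def hard_triplet_loss(d_pos, d_neg, theta=0.5):
    # d_pos[i, :]: distances from template (anchor) i to its N positives;
    # d_neg[i, :]: distances from template i to its N negatives.
    hardest_pos = d_pos.max(axis=1)   # max d_{A,P}: most dissimilar positive
    hardest_neg = d_neg.min(axis=1)   # min d_{A,N}: most similar negative
    z = hardest_pos - hardest_neg + theta
    # (z)_+ = max(z, 0), averaged over the M targets (normalization assumed)
    return np.maximum(z, 0.0).mean()

d_pos = np.array([[0.2, 0.4], [0.1, 0.3]])  # M=2 targets, N=2 positives each
d_neg = np.array([[0.9, 0.7], [0.8, 0.6]])
loss = hard_triplet_loss(d_pos, d_neg, theta=0.5)
# loss = mean(max(0.4-0.7+0.5, 0), max(0.3-0.6+0.5, 0)) = 0.2
```

When every hardest negative is farther than its hardest positive by more than the margin θ, the hinge term is zero and the loss vanishes, so only hard triplets contribute gradients.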
And (3) performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest value in the response map is regarded as the most similar position of the target object, so that the position of the target is determined.
Specifically, step (3) operates as follows: after feature extraction, features from different layers are fused. The lower-layer features carry more target position information and the higher-layer features carry more semantic information, so the higher-layer features are first up-sampled and then fused with the lower-layer features, iteratively generating, for each branch, a feature map in which the multi-layer features are fused. The target template image feature map is cross-correlated with the search area positive sample image feature map and the negative sample image feature map respectively to obtain response maps. The response maps are expanded to the size of the original image, thereby determining the position of the target on the image to be searched.
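The cross-correlation of step (3) can be illustrated with a naive sliding-window sketch. Feature extraction, multi-layer fusion, and the upscaling of the response map are omitted; the shapes and names are assumptions for illustration only.

```python
import numpy as np

def cross_correlate(search_feat, template_feat):
    # Slide the template feature map over the search-region feature map;
    # each response value is the inner product over all channels.
    c, hs, ws = search_feat.shape
    _, ht, wt = template_feat.shape
    resp = np.empty((hs - ht + 1, ws - wt + 1))
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            window = search_feat[:, i:i + ht, j:j + wt]
            resp[i, j] = float(np.sum(window * template_feat))
    return resp

rng = np.random.default_rng(0)
search = rng.standard_normal((4, 8, 8))   # C x H x W search-region features
template = search[:, 2:5, 3:6].copy()     # plant the template at offset (2, 3)
resp = cross_correlate(search, template)
# resp[2, 3] equals ||template||^2, the self-match response
```

The argmax of the response map gives the most similar location in feature-map coordinates; in the method above, the map is then expanded to the original image size to read off the target position.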
And (4) training a twin network based on difficult sample mining based on the training set in the step (1) to obtain a twin network with training convergence.
Specifically, the specific operation of step (4) is as follows:
1) training with the initial positive and negative samples, so that Z is pulled close to P and pushed away from N, obtaining a trained classifier;
2) classifying the samples by using a trained classifier, putting the misclassified samples into a negative sample subset as difficult negative samples, and continuing to train the classifier;
3) Repeating the above steps until the performance of the classifier no longer improves.
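One round of the mining loop in step (4) can be sketched as follows, with the classifier reduced to a vector of similarity scores. This is a hypothetical simplification; any real classifier and decision threshold could stand in.

```python
import numpy as np

def mine_hard_negatives(scores, labels, threshold=0.0):
    # A sample is a hard negative when the current classifier scores a
    # true negative (label -1) above the decision threshold, i.e. it is
    # misclassified; these indices join the negative subset for retraining.
    misclassified = (labels == -1) & (scores > threshold)
    return np.flatnonzero(misclassified)

scores = np.array([0.9, 0.4, -0.3, 0.6, -0.8])   # classifier outputs
labels = np.array([+1, -1, -1, -1, +1])          # ground truth
hard = mine_hard_negatives(scores, labels)
# hard -> indices 1 and 3: negatives the classifier mistook for positives
```

Repeating this step between training rounds grows the negative subset with exactly the samples the current classifier gets wrong, which is what drives the "until performance no longer improves" stopping rule.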
And (5) performing online target tracking by using the trained twin network.
Specifically, the online tracking process in step (5) includes the steps of:
1) reading the first frame of the video sequence to be tracked, obtaining its bounding box information, cutting out the target template image Z of the first frame according to the method for cutting the target template image in step (1), inputting Z into the template branch of the twin network converged by the training in step (4), extracting and fusing the multi-layer features of the template image, and then setting t = 2.
2) Reading the t-th frame of the video to be tracked, cutting out the search area image of the t-th frame according to the target position determined in the (t-1)-th frame and the method for cutting the search area image in step (1), inputting the cut t-th frame search area image into the search branch of the twin network converged by the training in step (4), and extracting the features of the t-th frame search image.
3) Performing cross-correlation operation on the characteristic diagram obtained in the step 1) after multi-layer fusion and the characteristic diagram obtained in the step 2).
4) Setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence to be tracked; if so, executing steps 2)-3), otherwise ending the tracking process of the video sequence.
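The frame loop of the online tracking procedure can be sketched as follows. The three callables stand in for the trained template branch, search branch, and correlation step; all names and the toy 1-D "feature maps" are assumptions for illustration.

```python
import numpy as np

def track(frames_total, extract_template, extract_search, correlate):
    # Step 1): template features from the first frame; t starts at 2.
    template_feat = extract_template(1)
    positions = {1: None}  # the first-frame position is given, not predicted
    t = 2
    while t <= frames_total:             # step 4): loop while t <= T
        search_feat = extract_search(t)  # step 2): crop + search-branch features
        response = correlate(template_feat, search_feat)  # step 3)
        positions[t] = int(np.argmax(response))  # peak of the response map
        t += 1
    return positions

# Toy run: 1-D "feature maps"; the response peak stays at index 1.
feats = {t: np.array([0.1 * t, 1.0, 0.2]) for t in range(1, 6)}
pos = track(5, lambda t: feats[t], lambda t: feats[t], lambda z, x: z * x)
```

Note that the template features are computed once from the first frame and reused for every subsequent frame, which is what makes this family of trackers fast online.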
Fig. 3 shows the tracking effect of target tracking on the first video sequence using the method of the present invention. It can be seen that the target tracking method provided by the invention can effectively track the target under interference from similar backgrounds.
Fig. 4 shows the tracking effect of target tracking on the second video sequence using the method of the present invention. It can be seen that the target tracking method provided by the invention can effectively track the target under posture change and rapid movement.
In conclusion, the method introduces difficult sample mining into the target tracking twin network structure and designs a difficult triplet loss, so the network can be fully trained, the distinguishing capability of the classifier is enhanced, similar targets can be better distinguished, problems such as local change and background interference in the image are handled, and the learned model has stronger generalization capability.
It is understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.
Claims (7)
1. A twin network single target tracking method based on difficult sample mining is characterized by comprising the following steps:
step (1), constructing a training set: cutting out a target template image Z and a search area image X of all images in an image sequence training set according to the target position and the size of the images, dividing the search area image X into a positive example image P and a negative example image N, forming a pair of positive sample pairs by the image Z and the image P, forming a pair of negative sample pairs by the image Z and the image N, and forming a training data set by (Z, P, N) triples formed by the target template image Z, the positive example image P and the negative example image N;
step (2), a convolution twin network based on difficult sample mining is constructed, wherein the network comprises three branches, and the three branches share the weight of the feature extraction network; the three branches are respectively used for obtaining a feature map of a target template image, a feature map of a search area positive sample image and a feature map of a negative sample image, wherein in feature extraction, a difficult sample is defined, and difficult sample mining is introduced to learn features with distinguishing capability;
step (3), performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest value in the response map is taken as the most similar position of the target object, and the response map is expanded to the size of the original image, thereby determining the position of the target on the image to be searched;
step (4), training a twin network based on difficult sample mining based on the training set in the step (1) to obtain a twin network with training convergence;
and (5) performing online target tracking by using the trained twin network.
2. The twin network single target tracking method based on difficult sample mining according to claim 1, wherein the operation of step (1) includes cropping a target area template image and cropping a search area image; the target template image cropping method is as follows: the target frame of the template image in target tracking is known; a square area is cut out with the tracked target as the center, the center position of the target area representing the target position; q pixels are added on each of the four sides of the target frame, and finally the cropped target image block is scaled in size; the search area image cropping method is as follows: with the target area as the center, 2q pixels are added on each of the four sides of the target frame, and the cropped search area image block is then scaled in size; where q = (w + h)/4, w is the width of the target frame, and h is the height of the target frame.
3. The twin network single-target tracking method based on difficult sample mining as claimed in claim 1, wherein the feature extraction networks of different branches of the twin network in step (2) are adjusted ResNet-50, and the input image is feature extracted through ResNet-50.
4. The twin network single target tracking method based on difficult sample mining as claimed in claim 1, wherein the positive sample pairs are image pairs with similar visual features and high reference contrast, and the negative sample pairs are image pairs with similar visual features and low reference contrast; the difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
wherein S_v denotes the visual feature similarity, S_c denotes the reference contrast similarity, α is the threshold on visual feature similarity, and β is the threshold on reference contrast similarity;
when selecting pictures from the training set for training, for each picture the most dissimilar positive sample and the most similar negative sample are selected to form a triplet, and the difficult-sample triplet loss is calculated; the difficult-sample triplet loss is defined as:
L_hard = (1/(M·N)) Σ_{i=1}^{M} Σ_{j=1}^{N} (max d_{A,P} − min d_{A,N} + θ)_+
wherein M denotes the number of targets selected from each sample, N denotes the number of random images of each target, (z)_+ denotes max(z, 0), z = max d_{A,P} − min d_{A,N} + θ, θ is a margin parameter set according to actual needs that represents the required gap between positive and negative sample similarities, d_{A,P} denotes the distance between the template sample and a positive sample, and d_{A,N} denotes the distance between the template sample and a negative sample;
by optimizing the loss L_hard, the model continuously mines difficult positive and negative sample pairs during training and learns features with discriminative ability.
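The batch-hard form of the loss above can be sketched in a few lines: for each anchor, take the farthest positive (max d_{A,P}) and nearest negative (min d_{A,N}), then apply the hinge (z)_+ with margin θ. The embeddings, labels, and θ = 0.3 below are illustrative, not values from the patent.

```python
import numpy as np

def hard_triplet_loss(emb, labels, theta=0.3):
    """emb: (B, D) embeddings; labels: (B,) target ids; theta: margin."""
    # Pairwise Euclidean distance matrix.
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    idx = np.arange(len(emb))
    losses = []
    for a in range(len(emb)):
        pos = d[a][same[a] & (idx != a)]   # distances to positives of this anchor
        neg = d[a][~same[a]]               # distances to negatives
        if len(pos) == 0 or len(neg) == 0:
            continue
        z = pos.max() - neg.min() + theta  # hardest positive vs hardest negative
        losses.append(max(z, 0.0))
    return float(np.mean(losses))

# Two well-separated clusters: every hinge term is clipped to zero.
emb = np.array([[0.0], [0.1], [5.0], [5.1]])
labels = np.array([0, 0, 1, 1])
loss = hard_triplet_loss(emb, labels)
```

When the clusters overlap, the hardest positive exceeds the nearest negative and the loss becomes positive, which is what drives the mining.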
5. The twin network single-target tracking method based on difficult sample mining according to claim 1 or 4, wherein step (3) operates as follows: after feature extraction, features from different layers are fused; the lower-layer features carry more target position information while the higher-layer features carry more semantic information, so the higher-layer features are up-sampled and then fused with the lower-layer features, iteratively producing the multi-layer fused feature map of each branch. The target template image feature map is cross-correlated with the positive sample and negative sample feature maps of the search area respectively to obtain response maps, and the response maps are upscaled to the size of the original image, thereby determining the position of the target in the image to be searched.
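The top-down fusion step can be sketched as below, assuming the patent's qualitative description only: a coarse high-level map is up-sampled and combined with the finer low-level map, iterating from the top of the pyramid down. Nearest-neighbour 2x up-sampling and element-wise addition are illustrative choices; the patent does not fix the fusion operator.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_top_down(features):
    """features: list of (C, H, W) maps ordered fine -> coarse.

    Returns the fused finest-level map: each coarser map is up-sampled
    and added into the next finer one, iteratively.
    """
    fused = features[-1]
    for f in reversed(features[:-1]):
        up = upsample2x(fused)[:, :f.shape[1], :f.shape[2]]  # crop to match
        fused = f + up
    return fused

# Three pyramid levels of constant 1s: the fused fine map sums all levels.
low = np.ones((8, 16, 16))
mid = np.ones((8, 8, 8))
high = np.ones((8, 4, 4))
out = fuse_top_down([low, mid, high])
```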
6. The twin network single-target tracking method based on difficult sample mining according to claim 1, wherein the specific operation of step (4) is as follows:
1) training with the initial positive and negative samples, so that the template Z is drawn close to the positive samples P and pushed away from the negative samples N, yielding a trained classifier;
2) classifying the samples with the trained classifier, adding the misclassified samples to the negative sample subset as difficult negative samples, and continuing to train the classifier;
3) repeating steps 1) and 2) until the performance of the classifier no longer improves.
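The mining loop of steps 1)-3) can be sketched with a stand-in classifier: train, re-classify the pool, fold the misclassified negatives into the negative subset, retrain, and stop when accuracy no longer improves. The nearest-centroid classifier and synthetic 2-D data below are illustrative stand-ins for the twin network, not the patent's model.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_step(pos, neg):
    """'Train' a nearest-centroid classifier: just the two class means."""
    return pos.mean(axis=0), neg.mean(axis=0)

def classify(x, mu_pos, mu_neg):
    """True where a sample is predicted positive (closer to mu_pos)."""
    return np.linalg.norm(x - mu_pos, axis=1) < np.linalg.norm(x - mu_neg, axis=1)

pos = rng.normal(2.0, 1.0, (200, 2))        # positive training samples
neg_pool = rng.normal(-2.0, 1.0, (1000, 2)) # full pool of negatives
neg = neg_pool[:100]                        # initial negative subset

best_acc = 0.0
while True:
    mu_p, mu_n = train_step(pos, neg)            # step 1): train
    pred = classify(neg_pool, mu_p, mu_n)        # step 2): re-classify the pool
    hard = neg_pool[pred]                        # negatives predicted positive = hard
    acc = 1.0 - pred.mean()                      # accuracy on the negative pool
    if acc <= best_acc:
        break                                    # step 3): stop when no improvement
    best_acc = acc
    neg = np.concatenate([neg, hard])            # grow the negative subset, retrain
```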
7. The twin network single-target tracking method based on difficult sample mining according to claim 2, wherein the online tracking process of step (5) comprises the following steps:
1) reading the first frame of the video sequence to be tracked and obtaining its bounding box information; cropping the target template image Z of the first frame according to the template cropping method of step (1); inputting Z into the template branch of the twin network trained to convergence in step (4), and extracting and fusing the multi-layer features of the template image; then setting t = 2;
2) reading the t-th frame of the video to be tracked; cropping the search area image of the t-th frame around the target position determined in frame t−1, using the search area cropping method of step (1); inputting the cropped search area image into the search branch of the converged twin network and extracting the features of the t-th frame search image;
3) performing a cross-correlation operation between the fused multi-layer feature map obtained in step 1) and the feature map obtained in step 2);
4) setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence; if t ≤ T, repeating steps 2) and 3); otherwise, ending the tracking process for this video sequence.
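The online loop of claim 7 can be sketched on raw pixels: the template is cropped once from frame 1, and each later frame is correlated against it, with the response peak giving the new position. For brevity this toy searches the whole frame rather than a crop around the previous position, and it correlates pixels directly instead of network features; both are simplifications of the claim.

```python
import numpy as np

def track(frames, first_pos, half=2):
    """frames: list of 2-D arrays; first_pos: (row, col) target centre in frame 1."""
    r, c = first_pos
    z = frames[0][r - half:r + half + 1, c - half:c + half + 1]  # step 1): template
    positions = [first_pos]
    for t in range(1, len(frames)):                              # t = 2 .. T
        frame = frames[t]
        H, W = frame.shape
        h, w = z.shape
        resp = np.zeros((H - h + 1, W - w + 1))                  # step 3): response map
        for i in range(resp.shape[0]):
            for j in range(resp.shape[1]):
                resp[i, j] = np.sum(frame[i:i + h, j:j + w] * z)
        i, j = np.unravel_index(np.argmax(resp), resp.shape)
        positions.append((i + half, j + half))                   # peak -> new centre
    return positions

# Toy sequence: a bright 5x5 patch drifting one pixel per frame on noise.
rng = np.random.default_rng(2)
patch = rng.uniform(5, 6, (5, 5))
frames = []
for t in range(4):
    f = rng.uniform(0, 1, (30, 30))
    f[8 + t:13 + t, 10 + t:15 + t] = patch
    frames.append(f)
positions = track(frames, (10, 12))   # tracked centres, one per frame
```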
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111152770.4A CN113888595B (en) | 2021-09-29 | 2021-09-29 | Twin network single-target visual tracking method based on difficult sample mining |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113888595A true CN113888595A (en) | 2022-01-04 |
CN113888595B CN113888595B (en) | 2024-05-14 |
Family
ID=79008165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111152770.4A Active CN113888595B (en) | 2021-09-29 | 2021-09-29 | Twin network single-target visual tracking method based on difficult sample mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113888595B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111340850A (en) * | 2020-03-20 | 2020-06-26 | 军事科学院***工程研究院***总体研究所 | Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss |
CN111354017A (en) * | 2020-03-04 | 2020-06-30 | 江南大学 | Target tracking method based on twin neural network and parallel attention module |
WO2020181685A1 (en) * | 2019-03-12 | 2020-09-17 | 南京邮电大学 | Vehicle-mounted video target detection method based on deep learning |
Non-Patent Citations (3)
Title |
---|
Zhang Boyan; Zhong Yong: "A single-object tracking algorithm based on diverse positive instances", Journal of Harbin Institute of Technology, no. 10, 25 September 2020 (2020-09-25) * |
Xiong Changzhen; Li Yan: "A survey of tracking algorithms based on Siamese networks", Industrial Control Computer, no. 03, 25 March 2020 (2020-03-25) * |
Ji Xiaopeng; Wei Zhiqiang: "Vehicle tracking based on contour features and extended Kalman filtering", Journal of Image and Graphics, no. 02, 16 February 2011 (2011-02-16) * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112184752A (en) | Video target tracking method based on pyramid convolution | |
CN110674866A (en) | Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network | |
EP1934941B1 (en) | Bi-directional tracking using trajectory segment analysis | |
CN109086777B (en) | Saliency map refining method based on global pixel characteristics | |
CN112489081B (en) | Visual target tracking method and device | |
CN112651998B (en) | Human body tracking algorithm based on attention mechanism and double-flow multi-domain convolutional neural network | |
CN111598876B (en) | Method, system and equipment for constructing thyroid nodule automatic identification model | |
CN111368759B (en) | Monocular vision-based mobile robot semantic map construction system | |
CN108595558B (en) | Image annotation method based on data equalization strategy and multi-feature fusion | |
CN112819806A (en) | Ship weld defect detection method based on deep convolutional neural network model | |
CN113592894B (en) | Image segmentation method based on boundary box and co-occurrence feature prediction | |
CN116524197B (en) | Point cloud segmentation method, device and equipment combining edge points and depth network | |
Li et al. | Dictionary optimization and constraint neighbor embedding-based dictionary mapping for superdimension reconstruction of porous media | |
JP2022082493A (en) | Pedestrian re-identification method for random shielding recovery based on noise channel | |
CN110544267B (en) | Correlation filtering tracking method for self-adaptive selection characteristics | |
CN115564801A (en) | Attention-based single target tracking method | |
CN114861761A (en) | Loop detection method based on twin network characteristics and geometric verification | |
CN114495170A (en) | Pedestrian re-identification method and system based on local self-attention inhibition | |
CN117576753A (en) | Micro-expression recognition method based on attention feature fusion of facial key points | |
CN117351078A (en) | Target size and 6D gesture estimation method based on shape priori | |
Gong et al. | Research on an improved KCF target tracking algorithm based on CNN feature extraction | |
CN113888595B (en) | Twin network single-target visual tracking method based on difficult sample mining | |
CN108765384B (en) | Significance detection method for joint manifold sequencing and improved convex hull | |
CN115294371B (en) | Complementary feature reliable description and matching method based on deep learning | |
CN116051601A (en) | Depth space-time associated video target tracking method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||