CN111932582A - Target tracking method and device in video image - Google Patents
Target tracking method and device in video image
- Publication number
- CN111932582A (application number CN202010501281.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- video frame
- tracked
- frame sequence
- target area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 63
- 238000004364 calculation method Methods 0.000 claims abstract description 81
- 238000001514 detection method Methods 0.000 claims abstract description 29
- 230000006870 function Effects 0.000 claims description 28
- 239000013598 vector Substances 0.000 claims description 12
- 230000004807 localization Effects 0.000 claims description 6
- 238000010606 normalization Methods 0.000 claims description 6
- 238000002372 labelling Methods 0.000 claims description 4
- 230000009466 transformation Effects 0.000 description 4
- 238000010586 diagram Methods 0.000 description 2
- 238000005286 illumination Methods 0.000 description 2
- 238000000844 transformation Methods 0.000 description 2
- 230000007547 defect Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a target tracking method and a target tracking device for video images. The method comprises the following steps: performing frame splitting processing on a video to obtain a video frame sequence; detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked; performing coordinate positioning of the target area in the corresponding video frames and recording the coordinate position information; performing a correlation calculation between the target area of the target to be tracked in the current video frame and that in the previous video frame based on the video frame sequence to obtain a correlation calculation result; and obtaining the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result. In a specific implementation of the method, the target in the video image can be tracked accurately, thereby meeting user requirements.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a target tracking method and device in a video image.
Background
Target tracking is an important branch of computer vision research and is widely applied in security-monitoring fields. Existing algorithms for tracking a target in video are overly complex: the target must undergo complicated tracking calculations during the tracking process, errors occur easily, tracking precision tends to be low, and higher-end equipment must be configured to complete the tracking, so the tracking cost is high.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a method and a device for tracking a target in a video image, which can realize accurate tracking of the target in the video image and meet the requirements of users.
In order to solve the above technical problem, an embodiment of the present invention provides a method for tracking a target in a video image, where the method includes:
performing frame splitting processing on a video to obtain a video frame sequence;
detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
Optionally, the video frame sequence is obtained by labeling according to a time sequence and removing redundant video frames.
Optionally, the target detection network model is a YOLOv3 network model;
the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Optionally, before the target-detection-based network model detects the target to be tracked in the sequence of video frames, the method further includes:
and carrying out size normalization processing on the video frame sequence, and normalizing the picture size in the video frame sequence to 416 x 416.
Optionally, the performing coordinate positioning on the target region of the target to be tracked in the corresponding video frame sequence includes:
constructing pixel coordinates of the video frame sequence based on pixel points;
and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Optionally, the performing, based on the sequence of video frames, a correlation calculation between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes:
and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Optionally, the performing, based on the sequence of video frames, correlation calculation of similarity between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes:
and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Optionally, the obtaining a motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result includes:
judging whether the target area of the target to be tracked of the previous video frame appears in the target area of the target to be tracked in the current video frame or not according to the correlation calculation result;
when the judgment is affirmative, respectively obtaining the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame;
and obtaining the motion trail of the target to be tracked in the video frame sequence according to the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame.
Optionally, the method further includes:
predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
In addition, the embodiment of the present invention further provides an apparatus for tracking a target in a video image, the apparatus including:
a frame splitting module, configured to perform frame splitting processing on a video to obtain a video frame sequence;
a detection module, configured to detect a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
a coordinate positioning module, configured to perform coordinate positioning of the target area of the target to be tracked in the corresponding video frame sequence and record the coordinate position information;
a correlation calculation module, configured to perform correlation calculation between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result;
a motion trajectory acquisition module, configured to acquire the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
In the embodiment of the invention, a video frame sequence is obtained by performing frame splitting processing on a video; a target to be tracked is detected in the video frame sequence based on a target detection network model to obtain its target area; coordinate positioning of the target area is performed in the corresponding video frames and the coordinate position information is recorded; a correlation calculation is performed between the target area of the target to be tracked in the current video frame and that in the previous video frame to obtain a correlation calculation result; and the motion trail of the target to be tracked in the video frame sequence is obtained based on this result. The calculation process is simple: target tracking is achieved without complex calculation, errors in the calculation process are reduced, tracking precision is improved, tracking calculation can run on lower-specification equipment, and tracking cost is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for tracking a target in a video image according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a target tracking apparatus in a video image according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for tracking a target in a video image according to an embodiment of the present invention.
As shown in fig. 1, a method for tracking a target in a video image, the method comprising:
s11: performing frame splitting processing on a video to obtain a video frame sequence;
in the specific implementation process of the invention, the video frame sequence is obtained by labeling according to the time sequence and removing redundant video frames.
Specifically, after the video acquired by the video acquisition device is obtained, it must be split into frames: the video frames are labeled according to their time sequence, and the video frame sequence is obtained after the related redundant frames are removed, which reduces subsequent computational redundancy and the amount of calculation.
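The patent does not specify how redundant frames are identified; a minimal sketch, assuming (hypothetically) that a frame is redundant when its mean absolute pixel difference from the last kept frame falls below a threshold:

```python
def remove_redundant_frames(frames, threshold=8.0):
    """Label frames in time order and keep only those that differ
    enough from the last kept frame (illustrative redundancy test).

    `frames` is a list of 2-D grayscale images given as nested lists.
    Returns (labels, kept) where labels are the time-order indices
    of the kept frames.
    """
    kept, labels, last = [], [], None
    for idx, frame in enumerate(frames):
        if last is None:
            keep = True
        else:
            # mean absolute difference between corresponding pixels
            diff = sum(
                abs(a - b)
                for row_a, row_b in zip(frame, last)
                for a, b in zip(row_a, row_b)
            )
            keep = diff / (len(frame) * len(frame[0])) >= threshold
        if keep:
            kept.append(frame)
            labels.append(idx)
            last = frame
    return labels, kept
```

In practice the frames would come from a video reader such as OpenCV's `VideoCapture`; the nested-list representation here only keeps the sketch dependency-free.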
S12: detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
in the specific implementation process of the invention, the target detection network model is a YOLOv3 network model; the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Further, before the target detection network model detects the target to be tracked in the sequence of video frames, the method further includes: and carrying out size normalization processing on the video frame sequence, and normalizing the picture size in the video frame sequence to 416 x 416.
Specifically, the target detection network model is a YOLOv3 network model; the loss function of the YOLOv3 network model is as follows:
L(O, o, C, c, l, g) = λ1·L_conf(o, c) + λ2·L_cla(O, C) + λ3·L_loc(l, g);
wherein λ1, λ2, λ3 are balancing coefficients; L_conf(o, c) is the target confidence loss function; L_cla(O, C) is the target class loss function; and L_loc(l, g) is the target localization loss function.
Here o_i ∈ {0, 1} indicates whether a target actually exists in predicted bounding box i (0 means absent, 1 means present), and ĉ_i denotes the Sigmoid probability that a target exists in predicted box i.
O_ij ∈ {0, 1} indicates whether a target of class j actually exists in predicted bounding box i (0 means absent, 1 means present), and Ĉ_ij denotes the Sigmoid probability of a class-j target in predicted box i.
The target localization loss uses the sum of squared differences between the true offsets and the predicted offsets, where l̂ denotes the predicted box coordinate offsets and ĝ denotes the coordinate offsets between the ground-truth box (GT box) and its matched default box; (b_x, b_y, b_w, b_h) are the predicted target box parameters, (c_x, c_y, c_w, c_h) are the default box parameters, and (g_x, g_y, g_w, g_h) are the matched ground-truth box parameters.
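The overall loss above is a weighted sum of three components. A minimal sketch of that combination, with a Sigmoid-based binary cross-entropy standing in for the confidence term (the component losses here are illustrative placeholders, not the full YOLOv3 terms):

```python
import math

def sigmoid(x):
    """Map a raw logit to a probability, mirroring ĉ_i above."""
    return 1.0 / (1.0 + math.exp(-x))

def bce(logit, y):
    """Binary cross-entropy for one box: y ∈ {0, 1}, logit mapped
    through a Sigmoid (placeholder for the confidence loss term)."""
    q = sigmoid(logit)
    return -(y * math.log(q) + (1 - y) * math.log(1 - q))

def total_loss(conf_loss, cla_loss, loc_loss, lambdas=(1.0, 1.0, 1.0)):
    """Weighted sum L = λ1·L_conf + λ2·L_cla + λ3·L_loc."""
    l1, l2, l3 = lambdas
    return l1 * conf_loss + l2 * cla_loss + l3 * loc_loss
```

The balancing coefficients λ1, λ2, λ3 trade off the three objectives; the patent does not state their values, so the defaults here are arbitrary.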
Before the target to be tracked is detected in the video frame sequence by the target detection network model, size normalization needs to be carried out on the video frame sequence; in the invention, the picture size in the video frame sequence is normalized to 416 × 416.
Normalizing the picture size in the video frame sequence to 416 × 416 makes the video frames conform to the input picture size of the target detection network model, so that the target area of the target to be tracked is easier to detect in the video frame sequence.
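A rough sketch of the size-normalization step, using nearest-neighbor resampling purely for illustration (in practice a library routine such as OpenCV's `cv2.resize` would typically be used, and YOLOv3 inputs are letterboxed rather than stretched):

```python
def resize_nearest(image, out_h=416, out_w=416):
    """Resize a 2-D image (nested lists) to out_h × out_w by
    nearest-neighbor sampling — a stand-in for a library resize."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w]
         for x in range(out_w)]
        for y in range(out_h)
    ]
```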
S13: carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
in a specific implementation process of the present invention, the coordinate positioning of the target region of the target to be tracked in the corresponding video frame sequence includes: constructing pixel coordinates of the video frame sequence based on pixel points; and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Specifically, the pixel points of each frame image in the video frame sequence are obtained, pixel coordinates of each frame image are constructed from these pixel points, and the coordinate position of the target area of the target to be tracked is then read off from the pixel coordinates of the corresponding frame image. In this way, the position of the target area of the target to be tracked within the image of the corresponding frame can be determined accurately.
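The coordinate positioning and recording step can be sketched as follows, assuming the detector returns an axis-aligned box (x, y, w, h) in pixel coordinates with the origin at the top-left pixel (the record format is illustrative, not specified by the patent):

```python
def locate_target(frame_index, box):
    """Record the pixel-coordinate position of a detected target
    area within its frame.

    `box` is (x, y, w, h): top-left corner plus width and height,
    in the frame's pixel coordinate system.
    """
    x, y, w, h = box
    center = (x + w / 2.0, y + h / 2.0)
    return {
        "frame": frame_index,
        "box": box,
        "center": center,  # later joined across frames into a trail
    }
```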
S14: performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
in a specific implementation process of the present invention, the performing correlation calculation on a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame based on a video frame sequence includes: and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Further, the performing, based on the sequence of video frames, correlation calculation of similarity between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes: and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Specifically, according to the video frame sequence, a similarity correlation calculation is performed between the target region of the target to be tracked in the current video frame and that in the previous video frame, and a correlation calculation result is then obtained; the correlation calculation of SIFT feature vector similarity is further adopted. SIFT features are based on locally salient interest points on the object and are independent of the image's scale and rotation; their tolerance to illumination changes, noise, and slight viewpoint changes is also quite high.
The correlation calculation of SIFT feature vector similarity proceeds in the following steps. Scale-space extremum detection: image locations are searched over all scales, and potential interest points invariant to scale and rotation are identified by a difference-of-Gaussians function. Keypoint localization: at each candidate location, the position and scale are determined by fitting a fine model; keypoints are selected according to their degree of stability. Orientation assignment: one or more orientations are assigned to each keypoint location based on the local image gradient directions, and all subsequent operations on the image data are performed relative to the orientation, scale, and location of the keypoints, thereby providing invariance to these transformations. Keypoint description: local image gradients are measured at the selected scale in a neighborhood around each keypoint and transformed into a representation that tolerates relatively large local shape deformations and illumination variations.
The SIFT algorithm has the following characteristics: SIFT features are local image features that remain invariant to rotation, scale change, and brightness change, and retain a degree of stability under viewpoint change, affine transformation, and noise; they are highly distinctive and information-rich, making them suitable for fast and accurate matching against massive feature databases; they are plentiful, since even a few objects can generate a large number of SIFT feature vectors; they are fast, as an optimized SIFT matching algorithm can even meet real-time requirements; and they are extensible, since they can conveniently be combined with feature vectors of other forms.
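A simplified sketch of matching SIFT-style descriptor vectors between two target regions, using Euclidean distance with Lowe's ratio test (real SIFT descriptors are 128-dimensional and would come from a library such as OpenCV; the short vectors and the 0.8 ratio here are illustrative):

```python
import math

def euclid(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """For each descriptor in desc_a, accept its nearest neighbour
    in desc_b only if it is clearly closer than the second nearest
    (Lowe's ratio test)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = sorted((euclid(d, e), j) for j, e in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

def similarity(desc_a, desc_b):
    """Fraction of current-region descriptors that found a confident
    match in the previous region — one possible correlation result."""
    if not desc_a:
        return 0.0
    return len(ratio_test_matches(desc_a, desc_b)) / len(desc_a)
```

A high `similarity` value suggests the two regions contain the same target; the threshold for deciding so is an implementation choice the patent leaves open.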
S15: and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
In a specific implementation process of the present invention, the obtaining a motion trajectory of the target to be tracked in the sequence of video frames based on the correlation calculation result includes: judging, according to the correlation calculation result, whether the target area of the target to be tracked in the previous video frame appears in the target area of the target to be tracked in the current video frame; if it does, obtaining the coordinate positioning of the target area of the target to be tracked in the previous video frame and in the current video frame respectively; and obtaining the motion trail of the target to be tracked in the video frame sequence from these coordinate positions.
Specifically, whether the target area of the target to be tracked in the previous video frame appears in the target area in the current video frame is judged according to the correlation calculation result; if it does, the coordinate positioning of the target area in the previous video frame and in the current video frame is obtained respectively, and the motion trail of the target to be tracked in the video frame sequence is then obtained from these coordinate positions.
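The trail-building logic can be sketched as follows: per-frame target centers are appended whenever the correlation result says the previous frame's target reappears in the current frame (the similarity threshold is an assumption, not given in the patent):

```python
def build_trail(centers, similarities, threshold=0.5):
    """Link per-frame target-area centers into a motion trail.

    `centers[k]` is the target-area center in frame k;
    `similarities[k]` is the correlation result between frame k and
    frame k-1 (similarities[0] is ignored — the first frame starts
    the trail unconditionally).
    """
    trail = [centers[0]]
    for k in range(1, len(centers)):
        # the target of the previous frame appears in this frame
        if similarities[k] >= threshold:
            trail.append(centers[k])
    return trail
```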
In the specific implementation process of the invention, the method further comprises the following steps: predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
Specifically, the position at which the target to be tracked will appear in the next video frame of the video frame sequence is predicted from its motion track in the video frame sequence, so that the position of the target to be tracked can be anticipated accurately.
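The patent does not state a prediction model; a minimal sketch assuming roughly constant velocity (linear extrapolation from the last two trail points):

```python
def predict_next_position(trail):
    """Extrapolate the target's position in the next frame from its
    motion trail, assuming constant velocity between frames."""
    if len(trail) < 2:
        # not enough history: assume the target stays put
        return trail[-1]
    (x1, y1), (x2, y2) = trail[-2], trail[-1]
    return (2 * x2 - x1, 2 * y2 - y1)
```

A fuller implementation might use a Kalman filter over the trail instead; the constant-velocity step here is only the simplest choice consistent with the text.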
In the embodiment of the invention, a video frame sequence is obtained by carrying out frame splitting processing on a video; detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked; carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information; performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result; obtaining the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result; the target tracking method and the target tracking device can realize a simpler calculation process, can realize the target tracking without complex calculation, reduce errors in the calculation process, improve the tracking precision of the target, can perform tracking calculation on equipment with lower configuration, and reduce the tracking cost.
Examples
Referring to fig. 2, fig. 2 is a schematic structural diagram of a target tracking device in a video image according to an embodiment of the present invention.
As shown in fig. 2, an apparatus for tracking an object in a video image, the apparatus comprising:
the frame splitting module 21: the video processing device is used for carrying out frame splitting processing on a video to obtain a video frame sequence;
in the specific implementation process of the invention, the video frame sequence is obtained by labeling according to the time sequence and removing redundant video frames.
Specifically, after the video acquired by the video acquisition device is obtained, it must be split into frames: the video frames are labeled according to their time sequence, and the video frame sequence is obtained after the related redundant frames are removed, which reduces subsequent computational redundancy and the amount of calculation.
The detection module 22: configured to detect a target to be tracked in the video frame sequence based on a target detection network model, to obtain a target area of the target to be tracked;
in the specific implementation process of the invention, the target detection network model is a YOLOv3 network model; the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Further, before the target detection network model detects the target to be tracked in the sequence of video frames, the method further includes: and carrying out size normalization processing on the video frame sequence, and normalizing the picture size in the video frame sequence to 416 x 416.
Specifically, the target detection network model is a YOLOv3 network model; the loss function of the YOLOv3 network model is as follows:
L(O, o, C, c, l, g) = λ1·L_conf(o, c) + λ2·L_cla(O, C) + λ3·L_loc(l, g);
wherein λ1, λ2, λ3 are balancing coefficients; L_conf(o, c) is the target confidence loss function; L_cla(O, C) is the target class loss function; and L_loc(l, g) is the target localization loss function.
Here o_i ∈ {0, 1} indicates whether a target actually exists in predicted bounding box i (0 means absent, 1 means present), and ĉ_i denotes the Sigmoid probability that a target exists in predicted box i.
O_ij ∈ {0, 1} indicates whether a target of class j actually exists in predicted bounding box i (0 means absent, 1 means present), and Ĉ_ij denotes the Sigmoid probability of a class-j target in predicted box i.
The target localization loss uses the sum of squared differences between the true offsets and the predicted offsets, where l̂ denotes the predicted box coordinate offsets and ĝ denotes the coordinate offsets between the ground-truth box (GT box) and its matched default box; (b_x, b_y, b_w, b_h) are the predicted target box parameters, (c_x, c_y, c_w, c_h) are the default box parameters, and (g_x, g_y, g_w, g_h) are the matched ground-truth box parameters.
Before the target to be tracked is detected in the video frame sequence by the target detection network model, size normalization needs to be carried out on the video frame sequence; in the invention, the picture size in the video frame sequence is normalized to 416 × 416.
Normalizing the picture size in the video frame sequence to 416 × 416 makes the video frames conform to the input picture size of the target detection network model, so that the target area of the target to be tracked is easier to detect in the video frame sequence.
The coordinate positioning module 23: configured to perform coordinate positioning of the target area of the target to be tracked in the corresponding video frame sequence and record the coordinate position information;
in a specific implementation process of the present invention, the coordinate positioning of the target region of the target to be tracked in the corresponding video frame sequence includes: constructing pixel coordinates of the video frame sequence based on pixel points; and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Specifically, the pixel points of each frame image in the video frame sequence are obtained, pixel coordinates of each frame image are constructed from these pixel points, and the coordinate position of the target area of the target to be tracked is then read off from the pixel coordinates of the corresponding frame image. In this way, the position of the target area of the target to be tracked within the image of the corresponding frame can be determined accurately.
The correlation calculation module 24: configured to perform correlation calculation between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result;
In a specific implementation of the present invention, performing the correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence includes: performing a similarity correlation calculation between those two target areas based on the video frame sequence.
Further, performing the similarity correlation calculation between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence includes: performing a correlation calculation of SIFT feature vector similarity between the two target areas based on the video frame sequence.
Specifically, a similarity correlation calculation is performed, according to the video frame sequence, between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame, yielding a correlation calculation result; the similarity is further computed as a correlation of SIFT feature vectors. SIFT features are based on locally salient interest points on the object and are independent of image size and rotation; their tolerance to illumination, noise, and slight viewpoint changes is also quite high.
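The SIFT-similarity association can be sketched as below. The patent does not specify the matching rule, so this sketch assumes Lowe's standard nearest-neighbor ratio test, which keeps a match only when the closest descriptor is clearly closer than the second closest; real 128-dimensional SIFT descriptors would come from a library such as OpenCV, and here plain NumPy arrays stand in so only the matching logic is shown.

```python
import numpy as np

def ratio_test_matches(desc_prev, desc_curr, ratio=0.75):
    """Match descriptors of the previous frame's target area against the
    current frame's, returning index pairs that pass the ratio test."""
    matches = []
    for i, d in enumerate(desc_prev):
        dists = np.linalg.norm(desc_curr - d, axis=1)  # L2 distance to all
        order = np.argsort(dists)
        if len(order) >= 2 and dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches

def similarity(desc_prev, desc_curr):
    """Fraction of previous-frame descriptors with a confident match."""
    if len(desc_prev) == 0:
        return 0.0
    return len(ratio_test_matches(desc_prev, desc_curr)) / len(desc_prev)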
The correlation calculation of SIFT feature vector similarity proceeds in the following steps. (1) Scale-space extrema detection: image locations are searched over all scales, and potential interest points invariant to scale and rotation are identified with a difference-of-Gaussians function. (2) Keypoint localization: at each candidate location, the position and scale are determined by fitting a fine model; keypoints are selected according to their stability. (3) Orientation assignment: one or more orientations are assigned to each keypoint location based on the local image gradient directions, and all subsequent operations on the image data are performed relative to the orientation, scale, and location of the keypoints, providing invariance to these transformations. (4) Keypoint description: local image gradients are measured at the selected scale in a neighborhood around each keypoint and transformed into a representation that tolerates relatively large local shape deformation and illumination variation.
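Step (1) can be illustrated with a minimal difference-of-Gaussians (DoG) response. This is only a sketch of the idea: the image is blurred at two nearby scales and the results subtracted, and local extrema of that response are SIFT keypoint candidates. A real implementation builds a full pyramid over several octaves; the scale factor 1.6 and the separable-blur helper here are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of half-width `radius`."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding; output keeps shape."""
    radius = int(3 * sigma)
    k = gaussian_kernel1d(sigma, radius)
    pad = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def dog(img, sigma=1.0, k=1.6):
    """Difference of Gaussians between two nearby scales."""
    return gaussian_blur(img, k * sigma) - gaussian_blur(img, sigma)
```

On a featureless (constant) image the DoG response is zero everywhere; structure in the image produces the extrema that step (2) then localizes.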
The SIFT algorithm has the following characteristics: SIFT features are local image features that are invariant to rotation, scale change, and brightness change, and remain stable to some degree under viewpoint change, affine transformation, and noise; they are highly distinctive and information-rich, and suitable for fast, accurate matching against a massive feature database; they are plentiful, as even a few objects can generate a large number of SIFT feature vectors; they are fast, as an optimized SIFT matching algorithm can even meet real-time requirements; and they are extensible, combining conveniently with feature vectors of other forms.
The motion trajectory acquisition module 25: configured to obtain the motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result.
In a specific implementation of the present invention, obtaining the motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result includes: judging, according to the correlation calculation result, whether the target area of the target to be tracked in the previous video frame appears in the target area of the target to be tracked in the current video frame; when it does, obtaining respectively the coordinate positions of the target area of the target to be tracked in the previous video frame and in the current video frame; and obtaining the motion trajectory of the target to be tracked in the video frame sequence from those two coordinate positions.
Specifically, whether the target area of the target to be tracked in the previous video frame appears in the target area of the target to be tracked in the current video frame is judged according to the correlation calculation result; when it does, the coordinate positions of the target area in the previous and current video frames are obtained respectively, and the motion trajectory of the target to be tracked in the video frame sequence is assembled from those coordinate positions.
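The trajectory step just described can be sketched as follows. The similarity threshold, the record layout, and the choice to store center coordinates are illustrative assumptions; the patent only specifies that associated target areas' coordinate positions form the motion trail.

```python
# Sketch: when the correlation result says the previous frame's target
# appears in the current frame, append the two recorded center
# coordinates to the target's track.

def update_trajectory(trajectory, prev_record, curr_record,
                      similarity, threshold=0.5):
    """Extend the motion trail if the two target areas are associated."""
    if similarity >= threshold:          # target judged to reappear
        if not trajectory:               # seed the track with the first point
            trajectory.append(prev_record["center"])
        trajectory.append(curr_record["center"])
    return trajectory

track = []
prev = {"frame": 0, "center": (130.0, 90.0)}
curr = {"frame": 1, "center": (134.0, 93.0)}
track = update_trajectory(track, prev, curr, similarity=0.8)
```

Repeating this over consecutive frame pairs yields the full motion trail of the target through the video frame sequence.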
In a specific implementation of the present invention, the method further includes: predicting the position at which the target to be tracked will appear in the next video frame of the video frame sequence according to its motion trajectory in the video frame sequence.
Specifically, the position at which the target to be tracked will appear in the next video frame is predicted from its motion trajectory in the video frame sequence, so that the position of the target to be tracked can be predicted accurately.
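The prediction step can be sketched with the simplest model consistent with the text, a constant-velocity extrapolation of the last displacement; the patent does not specify a prediction model, so this choice is an assumption.

```python
# Sketch: predict where the target appears in the next frame by
# extrapolating the last displacement of its trajectory one frame forward.

def predict_next(trajectory):
    """Constant-velocity prediction from the last two trajectory points."""
    if len(trajectory) < 2:
        return trajectory[-1] if trajectory else None
    (x0, y0), (x1, y1) = trajectory[-2], trajectory[-1]
    return (2 * x1 - x0, 2 * y1 - y0)

pred = predict_next([(130.0, 90.0), (134.0, 93.0)])  # (138.0, 96.0)
```

A more elaborate implementation might fit a motion model (e.g. a Kalman filter) over the whole trail, but the extrapolation above already narrows the search region for the next frame's detection.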
In the embodiment of the invention, a video is split into frames to obtain a video frame sequence; a target to be tracked is detected in the video frame sequence with a target detection network model to obtain the target area of the target to be tracked; the target area is coordinate-positioned in the corresponding video frame sequence and the coordinate position information is recorded; a correlation calculation is performed between the target area of the target to be tracked in the current video frame and that in the previous video frame based on the video frame sequence, obtaining a correlation calculation result; and the motion trajectory of the target to be tracked in the video frame sequence is obtained from that result. The method and apparatus achieve target tracking with a simpler calculation process and without complex computation, reduce errors in the calculation, improve tracking precision, allow tracking to run on lower-end equipment, and reduce tracking cost.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware under the instruction of a program, and the program may be stored in a computer-readable storage medium, which may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
In addition, the method and apparatus for tracking a target in a video image provided by the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. A method for tracking a target in a video image, the method comprising:
performing frame splitting processing on a video to obtain a video frame sequence;
detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
2. The target tracking method according to claim 1, wherein the video frame sequence is obtained by labeling the video frames in chronological order and removing redundant video frames.
3. The target tracking method of claim 1, wherein the target detection network model is a YOLOv3 network model;
the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
4. The target tracking method of claim 1, wherein before the target detection network model detects the target to be tracked in the video frame sequence, the method further comprises:
and carrying out size normalization processing on the video frame sequence, and normalizing the picture size in the video frame sequence to 416 x 416.
5. The target tracking method according to claim 1, wherein the coordinate locating of the target region of the target to be tracked in the corresponding video frame sequence comprises:
constructing pixel coordinates of the video frame sequence based on pixel points;
and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
6. The target tracking method according to claim 1, wherein the performing the correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence comprises:
and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
7. The target tracking method according to claim 6, wherein the performing the correlation calculation of the similarity between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence comprises:
and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
8. The target tracking method according to claim 1, wherein the obtaining of the motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result comprises:
judging, according to the correlation calculation result, whether the target area of the target to be tracked in the previous video frame appears in the target area of the target to be tracked in the current video frame;
when it does, obtaining respectively the coordinate positions of the target area of the target to be tracked in the previous video frame and of the target area of the target to be tracked in the current video frame; and
obtaining the motion trajectory of the target to be tracked in the video frame sequence according to the coordinate positions of the target area of the target to be tracked in the previous video frame and in the current video frame.
9. The target tracking method of any one of claims 1 to 8, further comprising:
predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
10. An apparatus for tracking objects in video images, the apparatus comprising:
a frame splitting module, configured to perform frame splitting processing on a video to obtain a video frame sequence;
a detection module, configured to detect a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
a coordinate positioning module, configured to perform coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence and to record the coordinate position information;
an association calculation module, configured to perform a correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence to obtain a correlation calculation result; and
a motion trajectory acquisition module, configured to obtain the motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010501281.4A CN111932582A (en) | 2020-06-04 | 2020-06-04 | Target tracking method and device in video image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010501281.4A CN111932582A (en) | 2020-06-04 | 2020-06-04 | Target tracking method and device in video image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111932582A true CN111932582A (en) | 2020-11-13 |
Family
ID=73317763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010501281.4A Pending CN111932582A (en) | 2020-06-04 | 2020-06-04 | Target tracking method and device in video image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111932582A (en) |
- 2020-06-04: CN application CN202010501281.4A filed, published as CN111932582A (en), status Pending
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113849687A (en) * | 2020-11-23 | 2021-12-28 | 阿里巴巴集团控股有限公司 | Video processing method and device |
CN113849687B (en) * | 2020-11-23 | 2022-10-28 | 阿里巴巴集团控股有限公司 | Video processing method and device |
CN112418136A (en) * | 2020-12-02 | 2021-02-26 | 云南电网有限责任公司电力科学研究院 | Target area detection tracking method and device for field operating personnel |
CN112418136B (en) * | 2020-12-02 | 2023-11-24 | 云南电网有限责任公司电力科学研究院 | Method and device for detecting and tracking target area of field operator |
CN112907622A (en) * | 2021-01-20 | 2021-06-04 | 厦门市七星通联科技有限公司 | Method, device, equipment and storage medium for identifying track of target object in video |
CN113343795A (en) * | 2021-05-24 | 2021-09-03 | 广州智慧城市发展研究院 | Target associated video tracking processing method and device |
CN113343795B (en) * | 2021-05-24 | 2024-04-26 | 广州智慧城市发展研究院 | Target associated video tracking processing method |
CN114511591A (en) * | 2021-12-31 | 2022-05-17 | 中国科学院自动化研究所 | Trajectory tracking method and device, electronic equipment and storage medium |
CN114511591B (en) * | 2021-12-31 | 2023-08-04 | 中国科学院自动化研究所 | Track tracking method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111932582A (en) | Target tracking method and device in video image | |
CN111932579A (en) | Method and device for adjusting equipment angle based on motion trail of tracked target | |
Tian et al. | Scene Text Detection in Video by Learning Locally and Globally. | |
Molina-Moreno et al. | Efficient scale-adaptive license plate detection system | |
CN107240130B (en) | Remote sensing image registration method, device and system | |
Wang et al. | An improved ORB image feature matching algorithm based on SURF | |
US11830218B2 (en) | Visual-inertial localisation in an existing map | |
Son et al. | A multi-vision sensor-based fast localization system with image matching for challenging outdoor environments | |
CN115375917B (en) | Target edge feature extraction method, device, terminal and storage medium | |
CN113298146A (en) | Image matching method, device, equipment and medium based on feature detection | |
CN111950370A (en) | Dynamic environment offline visual milemeter expansion method | |
Chowdhary et al. | Video surveillance for the crime detection using features | |
CN111105436B (en) | Target tracking method, computer device and storage medium | |
CN114494373A (en) | High-precision rail alignment method and system based on target detection and image registration | |
Gao et al. | Occluded person re-identification based on feature fusion and sparse reconstruction | |
Guler et al. | A new object tracking framework for interest point based feature extraction algorithms | |
Li et al. | TextSLAM: Visual SLAM With Semantic Planar Text Features | |
CN113129332A (en) | Method and apparatus for performing target object tracking | |
Jabnoun et al. | Visual scene prediction for blind people based on object recognition | |
Sun et al. | Convolutional neural network-based coarse initial position estimation of a monocular camera in large-scale 3D light detection and ranging maps | |
Li et al. | A novel automatic image stitching algorithm for ceramic microscopic images | |
CN111401286A (en) | Pedestrian retrieval method based on component weight generation network | |
Aswin et al. | Stereo-Vision Based System For Object Detection And Recognition | |
Yang et al. | Image copy–move forgery detection based on sped-up robust features descriptor and adaptive minimal–maximal suppression | |
Kalaiyarasi et al. | Enhancing logo matching and recognition using local features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20201113 |