CN111932582A - Target tracking method and device in video image - Google Patents

Target tracking method and device in video image Download PDF

Info

Publication number
CN111932582A
CN111932582A (application CN202010501281.4A)
Authority
CN
China
Prior art keywords
target
video frame
tracked
frame sequence
target area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010501281.4A
Other languages
Chinese (zh)
Inventor
詹瑾
郑伟俊
谢桂园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN202010501281.4A priority Critical patent/CN111932582A/en
Publication of CN111932582A publication Critical patent/CN111932582A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method and device in a video image. The method comprises the following steps: performing frame splitting processing on a video to obtain a video frame sequence; detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked; carrying out coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence, and recording the coordinate position information; performing correlation calculation between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result; and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result. When implemented, the method tracks the target in the video image accurately, meeting the requirements of users.

Description

Target tracking method and device in video image
Technical Field
The invention relates to the technical field of image processing, in particular to a target tracking method and device in a video image.
Background
Research on target tracking and its applications form an important branch of computer vision and are widely deployed across security monitoring. Existing algorithms for tracking a target in video, however, are overly complex: the tracking process requires elaborate computation, errors arise easily, tracking precision tends to be low, and higher-end equipment must be configured to complete the tracking, which raises the tracking cost.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a method and a device for tracking a target in a video image, which can realize accurate tracking of the target in the video image and meet the requirements of users.
In order to solve the above technical problem, an embodiment of the present invention provides a method for tracking a target in a video image, where the method includes:
performing frame splitting processing on a video to obtain a video frame sequence;
detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
Optionally, the video frame sequence is obtained by labeling according to a time sequence and removing redundant video frames.
Optionally, the target detection network model is a YOLOv3 network model;
the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Optionally, before detecting the target to be tracked in the video frame sequence based on the target detection network model, the method further includes:
and carrying out size normalization processing on the video frame sequence, and normalizing the picture size in the video frame sequence to 416 x 416.
Optionally, the performing coordinate positioning on the target region of the target to be tracked in the corresponding video frame sequence includes:
constructing pixel coordinates of the video frame sequence based on pixel points;
and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Optionally, the performing, based on the sequence of video frames, a correlation calculation between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes:
and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Optionally, the performing, based on the sequence of video frames, correlation calculation of similarity between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes:
and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Optionally, the obtaining a motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result includes:
judging whether the target area of the target to be tracked of the previous video frame appears in the target area of the target to be tracked in the current video frame or not according to the correlation calculation result;
when the judgment is carried out, respectively obtaining the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame;
and obtaining the motion trail of the target to be tracked in the video frame sequence according to the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame.
Optionally, the method further includes:
predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
In addition, the embodiment of the present invention further provides an apparatus for tracking a target in a video image, the apparatus including:
a frame splitting module: used for performing frame splitting processing on a video to obtain a video frame sequence;
a detection module: used for detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
a coordinate positioning module: used for carrying out coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence and recording the coordinate position information;
an association calculation module: used for performing correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result;
a motion trajectory acquisition module: used for acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
In the embodiment of the invention, a video frame sequence is obtained by frame-splitting a video; a target to be tracked is detected in the video frame sequence based on a target detection network model to obtain its target area; the target area is coordinate-positioned in the corresponding video frames and the coordinate position information is recorded; a correlation is calculated between the target area in the current video frame and the target area in the previous video frame, yielding a correlation calculation result; and the motion trail of the target in the video frame sequence is obtained from that result. The calculation process is simple, so the target can be tracked without complex computation; errors in the calculation are reduced and tracking precision improves; and the tracking calculation can run on lower-configuration equipment, reducing the tracking cost.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for tracking a target in a video image according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a target tracking apparatus in a video image according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for tracking a target in a video image according to an embodiment of the present invention.
As shown in fig. 1, a method for tracking a target in a video image, the method comprising:
s11: performing frame splitting processing on a video to obtain a video frame sequence;
in the specific implementation process of the invention, the video frame sequence is obtained by labeling according to the time sequence and removing redundant video frames.
Specifically, after the video captured by the video acquisition device is obtained, it needs to be split into frames; the frames are labeled in chronological order, and the relevant redundant frames are removed to yield the video frame sequence, which reduces redundancy in the subsequent calculation and cuts the amount of computation.
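By way of illustration only, the following is a minimal sketch of this step in Python with OpenCV; the mean-absolute-difference test and its threshold are assumptions chosen here to drop redundant frames, not specifics given by the patent:

    import cv2
    import numpy as np

    def split_video(path, diff_threshold=2.0):
        """Split a video into a chronologically labeled frame sequence,
        dropping frames nearly identical to the last kept frame."""
        cap = cv2.VideoCapture(path)
        frames, prev_gray, index = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
            # Keep the frame only if it differs enough from the last kept frame
            if prev_gray is None or np.abs(gray - prev_gray).mean() > diff_threshold:
                frames.append((index, frame))  # label by time order
                prev_gray = gray
            index += 1
        cap.release()
        return frames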
S12: detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
in the specific implementation process of the invention, the target detection network model is a YOLOv3 network model; the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Further, before detecting the target to be tracked in the video frame sequence with the target detection network model, the method further includes: performing size normalization processing on the video frame sequence, normalizing the picture size in the video frame sequence to 416 × 416.
Specifically, the target detection network model is a YOLOv3 network model; the loss function of the YOLOv3 network model is as follows:
L(O,o,C,c,l,g) = λ1·Lconf(o,c) + λ2·Lcla(O,C) + λ3·Lloc(l,g);

wherein λ1, λ2, λ3 are balance coefficients; Lconf(o,c) is the target confidence loss function; Lcla(O,C) is the target class loss function; Lloc(l,g) is the target localization loss function.

The target confidence loss is a binary cross-entropy:

Lconf(o,c) = -Σ_i [ o_i·ln(ĉ_i) + (1 - o_i)·ln(1 - ĉ_i) ],  with ĉ_i = Sigmoid(c_i);

wherein o_i ∈ {0,1} indicates whether a target actually exists in predicted target bounding box i (0 for absent, 1 for present), and ĉ_i is the Sigmoid probability that a target exists in predicted target rectangular box i.

The target class loss is likewise a binary cross-entropy, taken over the positive boxes:

Lcla(O,C) = -Σ_{i∈Pos} Σ_{j∈cla} [ O_ij·ln(Ĉ_ij) + (1 - O_ij)·ln(1 - Ĉ_ij) ],  with Ĉ_ij = Sigmoid(C_ij);

wherein O_ij ∈ {0,1} indicates whether a target of class j really exists in predicted target bounding box i (0 for absent, 1 for present), and Ĉ_ij is the Sigmoid probability of a class-j target in predicted target rectangular box i.

The target localization loss uses the sum of squares of the true offset values and the predicted offset values:

Lloc(l,g) = Σ_{i∈Pos} Σ_{m∈{x,y,w,h}} ( l̂_i^m - ĝ_i^m )²;

wherein l̂_i^m denotes the predicted rectangular-box coordinate offsets,

l̂_i^x = b_i^x - c_i^x,  l̂_i^y = b_i^y - c_i^y,  l̂_i^w = log(b_i^w / c_i^w),  l̂_i^h = log(b_i^h / c_i^h);

and ĝ_i^m denotes the coordinate offsets between the GT box and the default box matched to it,

ĝ_i^x = g_i^x - c_i^x,  ĝ_i^y = g_i^y - c_i^y,  ĝ_i^w = log(g_i^w / c_i^w),  ĝ_i^h = log(g_i^h / c_i^h);

wherein (b_x, b_y, b_w, b_h) are the predicted target rectangular-box parameters; (c_x, c_y, c_w, c_h) are the default rectangular-box parameters; and (g_x, g_y, g_w, g_h) are the matched real target rectangular-box parameters.
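For concreteness, a small NumPy sketch of the composite loss above follows; the array shapes and helper names are assumptions made for illustration, not a framework implementation prescribed by the patent:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def bce(target, logit):
        # Binary cross-entropy between 0/1 targets and Sigmoid probabilities
        p = sigmoid(logit)
        return -(target * np.log(p + 1e-9) + (1 - target) * np.log(1 - p + 1e-9)).sum()

    def yolov3_loss(o, c, O, C, l_hat, g_hat, lambdas=(1.0, 1.0, 1.0)):
        """o: (N,) 0/1 objectness labels; c: (N,) objectness logits;
        O: (P, J) 0/1 class labels for positive boxes; C: (P, J) class logits;
        l_hat, g_hat: (P, 4) predicted and ground-truth offsets (x, y, w, h)."""
        lam1, lam2, lam3 = lambdas
        L_conf = bce(o, c)                    # target confidence loss
        L_cla = bce(O, C)                     # target class loss (positives only)
        L_loc = ((l_hat - g_hat) ** 2).sum()  # sum-of-squares localization loss
        return lam1 * L_conf + lam2 * L_cla + lam3 * L_loc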
Before detecting the target to be tracked in the video frame sequence with the target detection network model, size normalization needs to be performed on the video frame sequence; in the invention, the picture size in the video frame sequence is normalized to 416 × 416.
Normalizing the picture size in the video frame sequence to 416 × 416 matches the input picture format of the target detection network model, so the target area of the target to be tracked is easier to detect in the video frame sequence.
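A minimal sketch of this normalization with OpenCV is given below; the plain bilinear resize is one possible choice, since the patent only fixes the 416 × 416 output size (a letterbox resize that pads to preserve aspect ratio would be an alternative):

    import cv2

    def normalize_frame(frame, size=416):
        """Resize a video frame to the 416 x 416 input size expected by the
        YOLOv3-style detection network."""
        return cv2.resize(frame, (size, size), interpolation=cv2.INTER_LINEAR)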
S13: carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
in a specific implementation process of the present invention, the coordinate positioning of the target region of the target to be tracked in the corresponding video frame sequence includes: constructing pixel coordinates of the video frame sequence based on pixel points; and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Specifically, the pixel points of each frame image in the video frame sequence are obtained and used to construct a pixel coordinate system for that frame; the coordinate position of the target area of the target to be tracked is then read off from its pixel coordinates in the corresponding frame image. In this way, the pixel-coordinate position of the target area in each corresponding frame image can be determined accurately.
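As an illustration, the sketch below records a target area's pixel coordinates frame by frame; the helper name, the (x_min, y_min, x_max, y_max) box format, and the center calculation are assumptions, with the origin taken at the top-left pixel:

    def locate_target(frame_index, box, log):
        """Record the pixel-coordinate position of a detected target area and
        return its center point."""
        x_min, y_min, x_max, y_max = box
        center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
        log[frame_index] = {"box": box, "center": center}  # coordinate position info
        return center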
S14: performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
in a specific implementation process of the present invention, the performing correlation calculation on a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame based on a video frame sequence includes: and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Further, the performing, based on the sequence of video frames, correlation calculation of similarity between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes: and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Specifically, a similarity correlation calculation is performed, over the video frame sequence, between the target region of the target to be tracked in the current video frame and the target region of the target to be tracked in the previous video frame, which yields the correlation calculation result; the correlation is computed as SIFT feature-vector similarity. SIFT features are based on salient local interest points on the object and are independent of the size and rotation of the image; their tolerance to illumination, noise, and slight viewpoint changes is also quite high.
The correlation calculation over SIFT feature vectors proceeds in four steps. Scale-space extremum detection: image locations are searched over all scales, and potential interest points invariant to scale and rotation are identified with a difference-of-Gaussians function. Keypoint localization: at each candidate location, position and scale are determined by fitting a fine model, and keypoints are selected according to their stability. Orientation assignment: one or more orientations are assigned to each keypoint location based on the local image gradient directions; all subsequent operations on the image data are performed relative to the orientation, scale, and location of the keypoints, providing invariance to these transformations. Keypoint description: local image gradients are measured at the selected scale in a neighborhood around each keypoint and transformed into a representation that tolerates relatively large local shape deformation and illumination change.
The SIFT algorithm has the following characteristics: SIFT features are local features of the image that remain invariant to rotation, scale, and brightness changes and retain a degree of stability under viewpoint change, affine transformation, and noise; they are highly distinctive and information-rich, suited to fast and accurate matching in massive feature databases; they are plentiful, as even a few objects generate large numbers of SIFT feature vectors; they are fast, as an optimized SIFT matching algorithm can even meet real-time requirements; and they are extensible, combining conveniently with feature vectors of other forms.
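A hedged sketch of the SIFT feature-vector similarity correlation with OpenCV follows (cv2.SIFT_create requires OpenCV 4.4 or later; the ratio-test threshold and the use of the good-match count as the similarity score are assumptions):

    import cv2

    def sift_similarity(region_prev, region_curr, ratio=0.75):
        """Match SIFT descriptors between two target regions (BGR images) and
        return the number of good matches as a similarity score."""
        sift = cv2.SIFT_create()
        gray_prev = cv2.cvtColor(region_prev, cv2.COLOR_BGR2GRAY)
        gray_curr = cv2.cvtColor(region_curr, cv2.COLOR_BGR2GRAY)
        _, des1 = sift.detectAndCompute(gray_prev, None)
        _, des2 = sift.detectAndCompute(gray_curr, None)
        if des1 is None or des2 is None:
            return 0
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        matches = matcher.knnMatch(des1, des2, k=2)
        # Lowe's ratio test filters ambiguous matches
        good = [p for p in matches if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        return len(good)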
S15: and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
In a specific implementation process of the present invention, the obtaining of the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result includes: judging, according to the correlation calculation result, whether the target area of the target to be tracked in the previous video frame appears in the target area of the target to be tracked in the current video frame; when it is judged to appear, obtaining respectively the coordinate positions of the target area of the target to be tracked in the previous video frame and in the current video frame; and obtaining the motion trail of the target to be tracked in the video frame sequence from those coordinate positions.
Specifically, whether the target area of the target to be tracked in the previous video frame appears in the target area in the current video frame is judged according to the correlation calculation result; when it does, the coordinate positions of the target area in the previous and current video frames are obtained respectively, and the motion trail of the target to be tracked in the video frame sequence is derived from them.
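Continuing the sketch, the judgment and trajectory update might look like the following, reusing sift_similarity from above; the match-count threshold is an assumption:

    def update_trajectory(trajectory, prev_region, curr_region, curr_center, threshold=10):
        """Extend the motion trail when the target of the previous frame is
        judged, via SIFT similarity, to appear in the current frame."""
        if sift_similarity(prev_region, curr_region) >= threshold:
            trajectory.append(curr_center)  # coordinate positions in time order
        return trajectory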
In the specific implementation process of the invention, the method further comprises the following steps: predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
Specifically, the position where the target to be tracked will appear in the next video frame of the video frame sequence is predicted from its motion trail, so that the upcoming position of the target can be anticipated accurately.
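One simple way to realize this prediction is linear extrapolation of the last two trajectory points, sketched below; the patent does not fix a particular prediction model, so this choice is an assumption:

    def predict_next_position(trajectory):
        """Predict where the target will appear in the next video frame by
        linearly extrapolating the last two trajectory points."""
        if len(trajectory) < 2:
            return trajectory[-1] if trajectory else None
        (x1, y1), (x2, y2) = trajectory[-2], trajectory[-1]
        return (2 * x2 - x1, 2 * y2 - y1)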
In the embodiment of the invention, a video frame sequence is obtained by frame-splitting a video; a target to be tracked is detected in the video frame sequence based on a target detection network model to obtain its target area; the target area is coordinate-positioned in the corresponding video frames and the coordinate position information is recorded; a correlation is calculated between the target area in the current video frame and the target area in the previous video frame, yielding a correlation calculation result; and the motion trail of the target in the video frame sequence is obtained from that result. The calculation process is simple, so the target can be tracked without complex computation; errors in the calculation are reduced and tracking precision improves; and the tracking calculation can run on lower-configuration equipment, reducing the tracking cost.
Examples
Referring to fig. 2, fig. 2 is a schematic structural diagram of a target tracking device in a video image according to an embodiment of the present invention.
As shown in fig. 2, an apparatus for tracking an object in a video image, the apparatus comprising:
the frame splitting module 21: the video processing device is used for carrying out frame splitting processing on a video to obtain a video frame sequence;
in the specific implementation process of the invention, the video frame sequence is obtained by labeling according to the time sequence and removing redundant video frames.
Specifically, after the video captured by the video acquisition device is obtained, it needs to be split into frames; the frames are labeled in chronological order, and the relevant redundant frames are removed to yield the video frame sequence, which reduces redundancy in the subsequent calculation and cuts the amount of computation.
The detection module 22: used for detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
in the specific implementation process of the invention, the target detection network model is a YOLOv3 network model; the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Further, before detecting the target to be tracked in the video frame sequence with the target detection network model, the method further includes: performing size normalization processing on the video frame sequence, normalizing the picture size in the video frame sequence to 416 × 416.
Specifically, the target detection network model is a YOLOv3 network model; the loss function of the YOLOv3 network model is as follows:
L(O,o,C,c,l,g) = λ1·Lconf(o,c) + λ2·Lcla(O,C) + λ3·Lloc(l,g);

wherein λ1, λ2, λ3 are balance coefficients; Lconf(o,c) is the target confidence loss function; Lcla(O,C) is the target class loss function; Lloc(l,g) is the target localization loss function.

The target confidence loss is a binary cross-entropy:

Lconf(o,c) = -Σ_i [ o_i·ln(ĉ_i) + (1 - o_i)·ln(1 - ĉ_i) ],  with ĉ_i = Sigmoid(c_i);

wherein o_i ∈ {0,1} indicates whether a target actually exists in predicted target bounding box i (0 for absent, 1 for present), and ĉ_i is the Sigmoid probability that a target exists in predicted target rectangular box i.

The target class loss is likewise a binary cross-entropy, taken over the positive boxes:

Lcla(O,C) = -Σ_{i∈Pos} Σ_{j∈cla} [ O_ij·ln(Ĉ_ij) + (1 - O_ij)·ln(1 - Ĉ_ij) ],  with Ĉ_ij = Sigmoid(C_ij);

wherein O_ij ∈ {0,1} indicates whether a target of class j really exists in predicted target bounding box i (0 for absent, 1 for present), and Ĉ_ij is the Sigmoid probability of a class-j target in predicted target rectangular box i.

The target localization loss uses the sum of squares of the true offset values and the predicted offset values:

Lloc(l,g) = Σ_{i∈Pos} Σ_{m∈{x,y,w,h}} ( l̂_i^m - ĝ_i^m )²;

wherein l̂_i^m denotes the predicted rectangular-box coordinate offsets,

l̂_i^x = b_i^x - c_i^x,  l̂_i^y = b_i^y - c_i^y,  l̂_i^w = log(b_i^w / c_i^w),  l̂_i^h = log(b_i^h / c_i^h);

and ĝ_i^m denotes the coordinate offsets between the GT box and the default box matched to it,

ĝ_i^x = g_i^x - c_i^x,  ĝ_i^y = g_i^y - c_i^y,  ĝ_i^w = log(g_i^w / c_i^w),  ĝ_i^h = log(g_i^h / c_i^h);

wherein (b_x, b_y, b_w, b_h) are the predicted target rectangular-box parameters; (c_x, c_y, c_w, c_h) are the default rectangular-box parameters; and (g_x, g_y, g_w, g_h) are the matched real target rectangular-box parameters.
Before detecting the target to be tracked in the video frame sequence with the target detection network model, size normalization needs to be performed on the video frame sequence; in the invention, the picture size in the video frame sequence is normalized to 416 × 416.
Normalizing the picture size in the video frame sequence to 416 × 416 matches the input picture format of the target detection network model, so the target area of the target to be tracked is easier to detect in the video frame sequence.
The coordinate positioning module 23: used for carrying out coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence and recording the coordinate position information;
in a specific implementation process of the present invention, the coordinate positioning of the target region of the target to be tracked in the corresponding video frame sequence includes: constructing pixel coordinates of the video frame sequence based on pixel points; and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Specifically, the pixel points of each frame image in the video frame sequence are obtained and used to construct a pixel coordinate system for that frame; the coordinate position of the target area of the target to be tracked is then read off from its pixel coordinates in the corresponding frame image. In this way, the pixel-coordinate position of the target area in each corresponding frame image can be determined accurately.
The association calculation module 24: used for performing correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result;
in a specific implementation process of the present invention, the performing correlation calculation on a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame based on a video frame sequence includes: and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Further, the performing, based on the sequence of video frames, correlation calculation of similarity between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes: and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Specifically, a similarity correlation calculation is performed, over the video frame sequence, between the target region of the target to be tracked in the current video frame and the target region of the target to be tracked in the previous video frame, which yields the correlation calculation result; the correlation is computed as SIFT feature-vector similarity. SIFT features are based on salient local interest points on the object and are independent of the size and rotation of the image; their tolerance to illumination, noise, and slight viewpoint changes is also quite high.
The correlation calculation over SIFT feature vectors proceeds in four steps. Scale-space extremum detection: image locations are searched over all scales, and potential interest points invariant to scale and rotation are identified with a difference-of-Gaussians function. Keypoint localization: at each candidate location, position and scale are determined by fitting a fine model, and keypoints are selected according to their stability. Orientation assignment: one or more orientations are assigned to each keypoint location based on the local image gradient directions; all subsequent operations on the image data are performed relative to the orientation, scale, and location of the keypoints, providing invariance to these transformations. Keypoint description: local image gradients are measured at the selected scale in a neighborhood around each keypoint and transformed into a representation that tolerates relatively large local shape deformation and illumination change.
The SIFT algorithm has the following characteristics: SIFT features are local features of the image that remain invariant to rotation, scale, and brightness changes and retain a degree of stability under viewpoint change, affine transformation, and noise; they are highly distinctive and information-rich, suited to fast and accurate matching in massive feature databases; they are plentiful, as even a few objects generate large numbers of SIFT feature vectors; they are fast, as an optimized SIFT matching algorithm can even meet real-time requirements; and they are extensible, combining conveniently with feature vectors of other forms.
The motion trajectory acquisition module 25: used for acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
In a specific implementation process of the present invention, the obtaining of the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result includes: judging, according to the correlation calculation result, whether the target area of the target to be tracked in the previous video frame appears in the target area of the target to be tracked in the current video frame; when it is judged to appear, obtaining respectively the coordinate positions of the target area of the target to be tracked in the previous video frame and in the current video frame; and obtaining the motion trail of the target to be tracked in the video frame sequence from those coordinate positions.
Specifically, whether the target area of the target to be tracked in the previous video frame appears in the target area in the current video frame is judged according to the correlation calculation result; when it does, the coordinate positions of the target area in the previous and current video frames are obtained respectively, and the motion trail of the target to be tracked in the video frame sequence is derived from them.
In the specific implementation process of the invention, the method further comprises the following steps: predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
Specifically, the position where the target to be tracked will appear in the next video frame of the video frame sequence is predicted from its motion trail, so that the upcoming position of the target can be anticipated accurately.
In the embodiment of the invention, a video frame sequence is obtained by frame-splitting a video; a target to be tracked is detected in the video frame sequence based on a target detection network model to obtain its target area; the target area is coordinate-positioned in the corresponding video frames and the coordinate position information is recorded; a correlation is calculated between the target area in the current video frame and the target area in the previous video frame, yielding a correlation calculation result; and the motion trail of the target in the video frame sequence is obtained from that result. The calculation process is simple, so the target can be tracked without complex computation; errors in the calculation are reduced and tracking precision improves; and the tracking calculation can run on lower-configuration equipment, reducing the tracking cost.
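Bringing the modules together, a hedged end-to-end sketch is given below, reusing the helper functions from the method embodiment; detect stands in for the YOLOv3 detector and, like the class and method names, is illustrative rather than taken from the patent (box coordinates are assumed to be integer pixels in the normalized frame):

    class TargetTracker:
        """Minimal sketch wiring the five modules of the apparatus together."""

        def __init__(self, video_path):
            self.frames = split_video(video_path)  # frame splitting module
            self.log = {}                          # coordinate positioning records
            self.trajectory = []                   # motion trajectory output

        def run(self, detect):
            prev_region = None
            for index, frame in self.frames:
                frame = normalize_frame(frame)     # detection module input
                box = detect(frame)                # target area, or None if absent
                if box is None:
                    continue
                center = locate_target(index, box, self.log)
                x_min, y_min, x_max, y_max = box
                region = frame[y_min:y_max, x_min:x_max]
                if prev_region is None:
                    self.trajectory.append(center)
                else:                              # association calculation module
                    update_trajectory(self.trajectory, prev_region, region, center)
                prev_region = region
            return self.trajectory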
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
In addition, the method and apparatus for tracking a target in a video image provided by the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present invention, vary the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for tracking a target in a video image, the method comprising:
performing frame splitting processing on a video to obtain a video frame sequence;
detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
2. The target tracking method according to claim 1, wherein the video frame sequence is obtained by labeling in chronological order and removing redundant video frames.
3. The target tracking method of claim 1, wherein the target detection network model is a YOLOv3 network model;
the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
4. The target tracking method of claim 1, wherein before the target detection network model detects the target to be tracked in the video frame sequence, the method further comprises:
and carrying out size normalization processing on the video frame sequence, and normalizing the picture size in the video frame sequence to 416 x 416.
5. The target tracking method according to claim 1, wherein the coordinate locating of the target region of the target to be tracked in the corresponding video frame sequence comprises:
constructing pixel coordinates of the video frame sequence based on pixel points;
and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
6. The target tracking method according to claim 1, wherein the performing the correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence comprises:
and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
7. The target tracking method according to claim 6, wherein the performing the correlation calculation of the similarity between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence comprises:
and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
8. The target tracking method according to claim 1, wherein the obtaining of the motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result comprises:
judging whether the target area of the target to be tracked of the previous video frame appears in the target area of the target to be tracked in the current video frame or not according to the correlation calculation result;
when the judgment is carried out, respectively obtaining the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame;
and obtaining the motion trail of the target to be tracked in the video frame sequence according to the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame.
9. The target tracking method of any one of claims 1 to 8, further comprising:
predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
10. An apparatus for tracking objects in video images, the apparatus comprising:
a frame splitting module: used for performing frame splitting processing on a video to obtain a video frame sequence;
a detection module: used for detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
a coordinate positioning module: used for carrying out coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence and recording the coordinate position information;
an association calculation module: used for performing correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result;
a motion trajectory acquisition module: used for acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
CN202010501281.4A 2020-06-04 2020-06-04 Target tracking method and device in video image Pending CN111932582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010501281.4A CN111932582A (en) 2020-06-04 2020-06-04 Target tracking method and device in video image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010501281.4A CN111932582A (en) 2020-06-04 2020-06-04 Target tracking method and device in video image

Publications (1)

Publication Number Publication Date
CN111932582A (en) 2020-11-13

Family

ID=73317763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010501281.4A Pending CN111932582A (en) 2020-06-04 2020-06-04 Target tracking method and device in video image

Country Status (1)

Country Link
CN (1) CN111932582A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849687A (en) * 2020-11-23 2021-12-28 阿里巴巴集团控股有限公司 Video processing method and device
CN113849687B (en) * 2020-11-23 2022-10-28 阿里巴巴集团控股有限公司 Video processing method and device
CN112418136A (en) * 2020-12-02 2021-02-26 云南电网有限责任公司电力科学研究院 Target area detection tracking method and device for field operating personnel
CN112418136B (en) * 2020-12-02 2023-11-24 云南电网有限责任公司电力科学研究院 Method and device for detecting and tracking target area of field operator
CN112907622A (en) * 2021-01-20 2021-06-04 厦门市七星通联科技有限公司 Method, device, equipment and storage medium for identifying track of target object in video
CN113343795A (en) * 2021-05-24 2021-09-03 广州智慧城市发展研究院 Target associated video tracking processing method and device
CN113343795B (en) * 2021-05-24 2024-04-26 广州智慧城市发展研究院 Target associated video tracking processing method
CN114511591A (en) * 2021-12-31 2022-05-17 中国科学院自动化研究所 Trajectory tracking method and device, electronic equipment and storage medium
CN114511591B (en) * 2021-12-31 2023-08-04 中国科学院自动化研究所 Track tracking method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111932582A (en) Target tracking method and device in video image
CN111932579A (en) Method and device for adjusting equipment angle based on motion trail of tracked target
Tian et al. Scene Text Detection in Video by Learning Locally and Globally.
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
CN107240130B (en) Remote sensing image registration method, device and system
Wang et al. An improved ORB image feature matching algorithm based on SURF
US11830218B2 (en) Visual-inertial localisation in an existing map
Son et al. A multi-vision sensor-based fast localization system with image matching for challenging outdoor environments
CN115375917B (en) Target edge feature extraction method, device, terminal and storage medium
CN113298146A (en) Image matching method, device, equipment and medium based on feature detection
CN111950370A (en) Dynamic environment offline visual milemeter expansion method
Chowdhary et al. Video surveillance for the crime detection using features
CN111105436B (en) Target tracking method, computer device and storage medium
CN114494373A (en) High-precision rail alignment method and system based on target detection and image registration
Gao et al. Occluded person re-identification based on feature fusion and sparse reconstruction
Guler et al. A new object tracking framework for interest point based feature extraction algorithms
Li et al. TextSLAM: Visual SLAM With Semantic Planar Text Features
CN113129332A (en) Method and apparatus for performing target object tracking
Jabnoun et al. Visual scene prediction for blind people based on object recognition
Sun et al. Convolutional neural network-based coarse initial position estimation of a monocular camera in large-scale 3D light detection and ranging maps
Li et al. A novel automatic image stitching algorithm for ceramic microscopic images
CN111401286A (en) Pedestrian retrieval method based on component weight generation network
Aswin et al. Stereo-Vision Based System For Object Detection And Recognition
Yang et al. Image copy–move forgery detection based on sped-up robust features descriptor and adaptive minimal–maximal suppression
Kalaiyarasi et al. Enhancing logo matching and recognition using local features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201113