CN111932582A - Target tracking method and device in video image - Google Patents

Target tracking method and device in video image Download PDF

Info

Publication number
CN111932582A
CN111932582A (application CN202010501281.4A)
Authority
CN
China
Prior art keywords
target
video frame
tracked
frame sequence
target area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010501281.4A
Other languages
Chinese (zh)
Inventor
詹瑾
郑伟俊
谢桂园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN202010501281.4A priority Critical patent/CN111932582A/en
Publication of CN111932582A publication Critical patent/CN111932582A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method and device in a video image. The method comprises the following steps: performing frame splitting processing on a video to obtain a video frame sequence; detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked; carrying out coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence, and recording the coordinate position information; performing correlation calculation between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result; and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result. When implemented, the method tracks the target in the video image accurately, meeting the requirements of users.

Description

Target tracking method and device in video image
Technical Field
The invention relates to the technical field of image processing, in particular to a target tracking method and device in a video image.
Background
Research on target tracking and its applications form an important branch of computer vision and are widely deployed across security monitoring. Existing algorithms for tracking a target in video, however, are overly complex: the tracking process requires elaborate computation, errors arise easily, tracking precision tends to be low, and higher-end equipment must be configured to complete the tracking, which raises the tracking cost.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a method and a device for tracking a target in a video image, which can realize accurate tracking of the target in the video image and meet the requirements of users.
In order to solve the above technical problem, an embodiment of the present invention provides a method for tracking a target in a video image, where the method includes:
performing frame splitting processing on a video to obtain a video frame sequence;
detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
Optionally, the video frame sequence is obtained by labeling according to a time sequence and removing redundant video frames.
Optionally, the target detection network model is a YOLOv3 network model;
the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Optionally, before detecting the target to be tracked in the video frame sequence based on the target detection network model, the method further includes:
and carrying out size normalization processing on the video frame sequence, and normalizing the picture size in the video frame sequence to 416 x 416.
Optionally, the performing coordinate positioning on the target region of the target to be tracked in the corresponding video frame sequence includes:
constructing pixel coordinates of the video frame sequence based on pixel points;
and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Optionally, the performing, based on the sequence of video frames, a correlation calculation between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes:
and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Optionally, the performing, based on the sequence of video frames, correlation calculation of similarity between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes:
and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Optionally, the obtaining a motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result includes:
judging whether the target area of the target to be tracked of the previous video frame appears in the target area of the target to be tracked in the current video frame or not according to the correlation calculation result;
when the judgment is carried out, respectively obtaining the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame;
and obtaining the motion trail of the target to be tracked in the video frame sequence according to the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame.
Optionally, the method further includes:
predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
In addition, the embodiment of the present invention further provides an apparatus for tracking a target in a video image, the apparatus including:
a frame splitting module: used for performing frame splitting processing on a video to obtain a video frame sequence;
a detection module: used for detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
a coordinate positioning module: used for carrying out coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence and recording the coordinate position information;
an association calculation module: used for performing correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result;
a motion trajectory acquisition module: used for acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
In the embodiment of the invention, a video frame sequence is obtained by frame-splitting a video; a target to be tracked is detected in the video frame sequence based on a target detection network model to obtain its target area; the target area is coordinate-positioned in the corresponding video frames and the coordinate position information is recorded; a correlation is calculated between the target area in the current video frame and the target area in the previous video frame, yielding a correlation calculation result; and the motion trail of the target in the video frame sequence is obtained from that result. The calculation process is simple, so the target can be tracked without complex computation; errors in the calculation are reduced and tracking precision improves; and the tracking calculation can run on lower-configuration equipment, reducing the tracking cost.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a method for tracking a target in a video image according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a target tracking apparatus in a video image according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for tracking a target in a video image according to an embodiment of the present invention.
As shown in fig. 1, a method for tracking a target in a video image, the method comprising:
s11: performing frame splitting processing on a video to obtain a video frame sequence;
in the specific implementation process of the invention, the video frame sequence is obtained by labeling according to the time sequence and removing redundant video frames.
Specifically, after the video captured by the video acquisition device is obtained, it needs to be split into frames; the frames are labeled in chronological order, and the relevant redundant frames are removed to yield the video frame sequence, which reduces redundancy in the subsequent calculation and cuts the amount of computation.
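By way of illustration only, the following is a minimal sketch of this step in Python with OpenCV; the mean-absolute-difference test and its threshold are assumptions chosen here to drop redundant frames, not specifics given by the patent:

    import cv2
    import numpy as np

    def split_video(path, diff_threshold=2.0):
        """Split a video into a chronologically labeled frame sequence,
        dropping frames nearly identical to the last kept frame."""
        cap = cv2.VideoCapture(path)
        frames, prev_gray, index = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
            # Keep the frame only if it differs enough from the last kept frame
            if prev_gray is None or np.abs(gray - prev_gray).mean() > diff_threshold:
                frames.append((index, frame))  # label by time order
                prev_gray = gray
            index += 1
        cap.release()
        return frames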
S12: detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
in the specific implementation process of the invention, the target detection network model is a YOLOv3 network model; the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Further, before detecting the target to be tracked in the video frame sequence with the target detection network model, the method further includes: performing size normalization processing on the video frame sequence, normalizing the picture size in the video frame sequence to 416 × 416.
Specifically, the target detection network model is a YOLOv3 network model; the loss function of the YOLOv3 network model is as follows:
L(O,o,C,c,l,g) = λ1·Lconf(o,c) + λ2·Lcla(O,C) + λ3·Lloc(l,g);

wherein λ1, λ2, λ3 are balance coefficients; Lconf(o,c) is the target confidence loss function; Lcla(O,C) is the target class loss function; Lloc(l,g) is the target localization loss function.

The target confidence loss is a binary cross-entropy:

Lconf(o,c) = -Σ_i [ o_i·ln(ĉ_i) + (1 - o_i)·ln(1 - ĉ_i) ],  with ĉ_i = Sigmoid(c_i);

wherein o_i ∈ {0,1} indicates whether a target actually exists in predicted target bounding box i (0 for absent, 1 for present), and ĉ_i is the Sigmoid probability that a target exists in predicted target rectangular box i.

The target class loss is likewise a binary cross-entropy, taken over the positive boxes:

Lcla(O,C) = -Σ_{i∈Pos} Σ_{j∈cla} [ O_ij·ln(Ĉ_ij) + (1 - O_ij)·ln(1 - Ĉ_ij) ],  with Ĉ_ij = Sigmoid(C_ij);

wherein O_ij ∈ {0,1} indicates whether a target of class j really exists in predicted target bounding box i (0 for absent, 1 for present), and Ĉ_ij is the Sigmoid probability of a class-j target in predicted target rectangular box i.

The target localization loss uses the sum of squares of the true offset values and the predicted offset values:

Lloc(l,g) = Σ_{i∈Pos} Σ_{m∈{x,y,w,h}} ( l̂_i^m - ĝ_i^m )²;

wherein l̂_i^m denotes the predicted rectangular-box coordinate offsets,

l̂_i^x = b_i^x - c_i^x,  l̂_i^y = b_i^y - c_i^y,  l̂_i^w = log(b_i^w / c_i^w),  l̂_i^h = log(b_i^h / c_i^h);

and ĝ_i^m denotes the coordinate offsets between the GT box and the default box matched to it,

ĝ_i^x = g_i^x - c_i^x,  ĝ_i^y = g_i^y - c_i^y,  ĝ_i^w = log(g_i^w / c_i^w),  ĝ_i^h = log(g_i^h / c_i^h);

wherein (b_x, b_y, b_w, b_h) are the predicted target rectangular-box parameters; (c_x, c_y, c_w, c_h) are the default rectangular-box parameters; and (g_x, g_y, g_w, g_h) are the matched real target rectangular-box parameters.
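For concreteness, a small NumPy sketch of the composite loss above follows; the array shapes and helper names are assumptions made for illustration, not a framework implementation prescribed by the patent:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def bce(target, logit):
        # Binary cross-entropy between 0/1 targets and Sigmoid probabilities
        p = sigmoid(logit)
        return -(target * np.log(p + 1e-9) + (1 - target) * np.log(1 - p + 1e-9)).sum()

    def yolov3_loss(o, c, O, C, l_hat, g_hat, lambdas=(1.0, 1.0, 1.0)):
        """o: (N,) 0/1 objectness labels; c: (N,) objectness logits;
        O: (P, J) 0/1 class labels for positive boxes; C: (P, J) class logits;
        l_hat, g_hat: (P, 4) predicted and ground-truth offsets (x, y, w, h)."""
        lam1, lam2, lam3 = lambdas
        L_conf = bce(o, c)                    # target confidence loss
        L_cla = bce(O, C)                     # target class loss (positives only)
        L_loc = ((l_hat - g_hat) ** 2).sum()  # sum-of-squares localization loss
        return lam1 * L_conf + lam2 * L_cla + lam3 * L_loc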
Before detecting the target to be tracked in the video frame sequence with the target detection network model, size normalization needs to be performed on the video frame sequence; in the invention, the picture size in the video frame sequence is normalized to 416 × 416.
Normalizing the picture size in the video frame sequence to 416 × 416 matches the input picture format of the target detection network model, so the target area of the target to be tracked is easier to detect in the video frame sequence.
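A minimal sketch of this normalization with OpenCV is given below; the plain bilinear resize is one possible choice, since the patent only fixes the 416 × 416 output size (a letterbox resize that pads to preserve aspect ratio would be an alternative):

    import cv2

    def normalize_frame(frame, size=416):
        """Resize a video frame to the 416 x 416 input size expected by the
        YOLOv3-style detection network."""
        return cv2.resize(frame, (size, size), interpolation=cv2.INTER_LINEAR)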
S13: carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
in a specific implementation process of the present invention, the coordinate positioning of the target region of the target to be tracked in the corresponding video frame sequence includes: constructing pixel coordinates of the video frame sequence based on pixel points; and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Specifically, the pixel points of each frame image in the video frame sequence are obtained and used to construct a pixel coordinate system for that frame; the coordinate position of the target area of the target to be tracked is then read off from its pixel coordinates in the corresponding frame image. In this way, the pixel-coordinate position of the target area in each corresponding frame image can be determined accurately.
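As an illustration, the sketch below records a target area's pixel coordinates frame by frame; the helper name, the (x_min, y_min, x_max, y_max) box format, and the center calculation are assumptions, with the origin taken at the top-left pixel:

    def locate_target(frame_index, box, log):
        """Record the pixel-coordinate position of a detected target area and
        return its center point."""
        x_min, y_min, x_max, y_max = box
        center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
        log[frame_index] = {"box": box, "center": center}  # coordinate position info
        return center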
S14: performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
in a specific implementation process of the present invention, the performing correlation calculation on a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame based on a video frame sequence includes: and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Further, the performing, based on the sequence of video frames, correlation calculation of similarity between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes: and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Specifically, a similarity correlation calculation is performed, over the video frame sequence, between the target region of the target to be tracked in the current video frame and the target region of the target to be tracked in the previous video frame, which yields the correlation calculation result; the correlation is computed as SIFT feature-vector similarity. SIFT features are based on salient local interest points on the object and are independent of the size and rotation of the image; their tolerance to illumination, noise, and slight viewpoint changes is also quite high.
The correlation calculation over SIFT feature vectors proceeds in four steps. Scale-space extremum detection: image locations are searched over all scales, and potential interest points invariant to scale and rotation are identified with a difference-of-Gaussians function. Keypoint localization: at each candidate location, position and scale are determined by fitting a fine model, and keypoints are selected according to their stability. Orientation assignment: one or more orientations are assigned to each keypoint location based on the local image gradient directions; all subsequent operations on the image data are performed relative to the orientation, scale, and location of the keypoints, providing invariance to these transformations. Keypoint description: local image gradients are measured at the selected scale in a neighborhood around each keypoint and transformed into a representation that tolerates relatively large local shape deformation and illumination change.
The SIFT algorithm has the following characteristics: SIFT features are local features of the image that remain invariant to rotation, scale, and brightness changes and retain a degree of stability under viewpoint change, affine transformation, and noise; they are highly distinctive and information-rich, suited to fast and accurate matching in massive feature databases; they are plentiful, as even a few objects generate large numbers of SIFT feature vectors; they are fast, as an optimized SIFT matching algorithm can even meet real-time requirements; and they are extensible, combining conveniently with feature vectors of other forms.
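A hedged sketch of the SIFT feature-vector similarity correlation with OpenCV follows (cv2.SIFT_create requires OpenCV 4.4 or later; the ratio-test threshold and the use of the good-match count as the similarity score are assumptions):

    import cv2

    def sift_similarity(region_prev, region_curr, ratio=0.75):
        """Match SIFT descriptors between two target regions (BGR images) and
        return the number of good matches as a similarity score."""
        sift = cv2.SIFT_create()
        gray_prev = cv2.cvtColor(region_prev, cv2.COLOR_BGR2GRAY)
        gray_curr = cv2.cvtColor(region_curr, cv2.COLOR_BGR2GRAY)
        _, des1 = sift.detectAndCompute(gray_prev, None)
        _, des2 = sift.detectAndCompute(gray_curr, None)
        if des1 is None or des2 is None:
            return 0
        matcher = cv2.BFMatcher(cv2.NORM_L2)
        matches = matcher.knnMatch(des1, des2, k=2)
        # Lowe's ratio test filters ambiguous matches
        good = [p for p in matches if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        return len(good)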
S15: and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
In a specific implementation process of the present invention, the obtaining of the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result includes: judging, according to the correlation calculation result, whether the target area of the target to be tracked in the previous video frame appears in the target area of the target to be tracked in the current video frame; when it is judged to appear, obtaining respectively the coordinate positions of the target area of the target to be tracked in the previous video frame and in the current video frame; and obtaining the motion trail of the target to be tracked in the video frame sequence from those coordinate positions.
Specifically, whether the target area of the target to be tracked in the previous video frame appears in the target area in the current video frame is judged according to the correlation calculation result; when it does, the coordinate positions of the target area in the previous and current video frames are obtained respectively, and the motion trail of the target to be tracked in the video frame sequence is derived from them.
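Continuing the sketch, the judgment and trajectory update might look like the following, reusing sift_similarity from above; the match-count threshold is an assumption:

    def update_trajectory(trajectory, prev_region, curr_region, curr_center, threshold=10):
        """Extend the motion trail when the target of the previous frame is
        judged, via SIFT similarity, to appear in the current frame."""
        if sift_similarity(prev_region, curr_region) >= threshold:
            trajectory.append(curr_center)  # coordinate positions in time order
        return trajectory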
In the specific implementation process of the invention, the method further comprises the following steps: predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
Specifically, the position where the target to be tracked will appear in the next video frame of the video frame sequence is predicted from its motion trail, so that the upcoming position of the target can be anticipated accurately.
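One simple way to realize this prediction is linear extrapolation of the last two trajectory points, sketched below; the patent does not fix a particular prediction model, so this choice is an assumption:

    def predict_next_position(trajectory):
        """Predict where the target will appear in the next video frame by
        linearly extrapolating the last two trajectory points."""
        if len(trajectory) < 2:
            return trajectory[-1] if trajectory else None
        (x1, y1), (x2, y2) = trajectory[-2], trajectory[-1]
        return (2 * x2 - x1, 2 * y2 - y1)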
In the embodiment of the invention, a video frame sequence is obtained by frame-splitting a video; a target to be tracked is detected in the video frame sequence based on a target detection network model to obtain its target area; the target area is coordinate-positioned in the corresponding video frames and the coordinate position information is recorded; a correlation is calculated between the target area in the current video frame and the target area in the previous video frame, yielding a correlation calculation result; and the motion trail of the target in the video frame sequence is obtained from that result. The calculation process is simple, so the target can be tracked without complex computation; errors in the calculation are reduced and tracking precision improves; and the tracking calculation can run on lower-configuration equipment, reducing the tracking cost.
Examples
Referring to fig. 2, fig. 2 is a schematic structural diagram of a target tracking device in a video image according to an embodiment of the present invention.
As shown in fig. 2, an apparatus for tracking an object in a video image, the apparatus comprising:
the frame splitting module 21: the video processing device is used for carrying out frame splitting processing on a video to obtain a video frame sequence;
in the specific implementation process of the invention, the video frame sequence is obtained by labeling according to the time sequence and removing redundant video frames.
Specifically, after the video captured by the video acquisition device is obtained, it needs to be split into frames; the frames are labeled in chronological order, and the relevant redundant frames are removed to yield the video frame sequence, which reduces redundancy in the subsequent calculation and cuts the amount of computation.
The detection module 22: used for detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
in the specific implementation process of the invention, the target detection network model is a YOLOv3 network model; the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
Further, before detecting the target to be tracked in the video frame sequence with the target detection network model, the method further includes: performing size normalization processing on the video frame sequence, normalizing the picture size in the video frame sequence to 416 × 416.
Specifically, the target detection network model is a YOLOv3 network model; the loss function of the YOLOv3 network model is as follows:
L(O,o,C,c,l,g) = λ1·Lconf(o,c) + λ2·Lcla(O,C) + λ3·Lloc(l,g);

wherein λ1, λ2, λ3 are balance coefficients; Lconf(o,c) is the target confidence loss function; Lcla(O,C) is the target class loss function; Lloc(l,g) is the target localization loss function.

The target confidence loss is a binary cross-entropy:

Lconf(o,c) = -Σ_i [ o_i·ln(ĉ_i) + (1 - o_i)·ln(1 - ĉ_i) ],  with ĉ_i = Sigmoid(c_i);

wherein o_i ∈ {0,1} indicates whether a target actually exists in predicted target bounding box i (0 for absent, 1 for present), and ĉ_i is the Sigmoid probability that a target exists in predicted target rectangular box i.

The target class loss is likewise a binary cross-entropy, taken over the positive boxes:

Lcla(O,C) = -Σ_{i∈Pos} Σ_{j∈cla} [ O_ij·ln(Ĉ_ij) + (1 - O_ij)·ln(1 - Ĉ_ij) ],  with Ĉ_ij = Sigmoid(C_ij);

wherein O_ij ∈ {0,1} indicates whether a target of class j really exists in predicted target bounding box i (0 for absent, 1 for present), and Ĉ_ij is the Sigmoid probability of a class-j target in predicted target rectangular box i.

The target localization loss uses the sum of squares of the true offset values and the predicted offset values:

Lloc(l,g) = Σ_{i∈Pos} Σ_{m∈{x,y,w,h}} ( l̂_i^m - ĝ_i^m )²;

wherein l̂_i^m denotes the predicted rectangular-box coordinate offsets,

l̂_i^x = b_i^x - c_i^x,  l̂_i^y = b_i^y - c_i^y,  l̂_i^w = log(b_i^w / c_i^w),  l̂_i^h = log(b_i^h / c_i^h);

and ĝ_i^m denotes the coordinate offsets between the GT box and the default box matched to it,

ĝ_i^x = g_i^x - c_i^x,  ĝ_i^y = g_i^y - c_i^y,  ĝ_i^w = log(g_i^w / c_i^w),  ĝ_i^h = log(g_i^h / c_i^h);

wherein (b_x, b_y, b_w, b_h) are the predicted target rectangular-box parameters; (c_x, c_y, c_w, c_h) are the default rectangular-box parameters; and (g_x, g_y, g_w, g_h) are the matched real target rectangular-box parameters.
Before detecting the target to be tracked in the video frame sequence with the target detection network model, size normalization needs to be performed on the video frame sequence; in the invention, the picture size in the video frame sequence is normalized to 416 × 416.
Normalizing the picture size in the video frame sequence to 416 × 416 matches the input picture format of the target detection network model, so the target area of the target to be tracked is easier to detect in the video frame sequence.
The coordinate positioning module 23: used for carrying out coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence and recording the coordinate position information;
in a specific implementation process of the present invention, the coordinate positioning of the target region of the target to be tracked in the corresponding video frame sequence includes: constructing pixel coordinates of the video frame sequence based on pixel points; and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
Specifically, the pixel points of each frame image in the video frame sequence are obtained and used to construct a pixel coordinate system for that frame; the coordinate position of the target area of the target to be tracked is then read off from its pixel coordinates in the corresponding frame image. In this way, the pixel-coordinate position of the target area in each corresponding frame image can be determined accurately.
The association calculation module 24: used for performing correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result;
in a specific implementation process of the present invention, the performing correlation calculation on a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame based on a video frame sequence includes: and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Further, the performing, based on the sequence of video frames, correlation calculation of similarity between a target region of a target to be tracked in a current video frame and a target region of a target to be tracked in a previous video frame includes: and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
Specifically, a similarity correlation calculation is performed, over the video frame sequence, between the target region of the target to be tracked in the current video frame and the target region of the target to be tracked in the previous video frame, which yields the correlation calculation result; the correlation is computed as SIFT feature-vector similarity. SIFT features are based on salient local interest points on the object and are independent of the size and rotation of the image; their tolerance to illumination, noise, and slight viewpoint changes is also quite high.
The correlation calculation over SIFT feature vectors proceeds in four steps. Scale-space extremum detection: image locations are searched over all scales, and potential interest points invariant to scale and rotation are identified with a difference-of-Gaussians function. Keypoint localization: at each candidate location, position and scale are determined by fitting a fine model, and keypoints are selected according to their stability. Orientation assignment: one or more orientations are assigned to each keypoint location based on the local image gradient directions; all subsequent operations on the image data are performed relative to the orientation, scale, and location of the keypoints, providing invariance to these transformations. Keypoint description: local image gradients are measured at the selected scale in a neighborhood around each keypoint and transformed into a representation that tolerates relatively large local shape deformation and illumination change.
The SIFT algorithm has the following characteristics: SIFT features are local features of the image that remain invariant to rotation, scale, and brightness changes and retain a degree of stability under viewpoint change, affine transformation, and noise; they are highly distinctive and information-rich, suited to fast and accurate matching in massive feature databases; they are plentiful, as even a few objects generate large numbers of SIFT feature vectors; they are fast, as an optimized SIFT matching algorithm can even meet real-time requirements; and they are extensible, combining conveniently with feature vectors of other forms.
The motion trajectory acquisition module 25: used for acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
In a specific implementation process of the present invention, the obtaining of the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result includes: judging, according to the correlation calculation result, whether the target area of the target to be tracked in the previous video frame appears in the target area of the target to be tracked in the current video frame; when it is judged to appear, obtaining respectively the coordinate positions of the target area of the target to be tracked in the previous video frame and in the current video frame; and obtaining the motion trail of the target to be tracked in the video frame sequence from those coordinate positions.
Specifically, whether the target area of the target to be tracked in the previous video frame appears in the target area in the current video frame is judged according to the correlation calculation result; when it does, the coordinate positions of the target area in the previous and current video frames are obtained respectively, and the motion trail of the target to be tracked in the video frame sequence is derived from them.
In the specific implementation process of the invention, the method further comprises the following steps: predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
Specifically, the position where the target to be tracked will appear in the next video frame of the video frame sequence is predicted from its motion trail, so that the upcoming position of the target can be anticipated accurately.
In the embodiment of the invention, a video frame sequence is obtained by frame-splitting a video; a target to be tracked is detected in the video frame sequence based on a target detection network model to obtain its target area; the target area is coordinate-positioned in the corresponding video frames and the coordinate position information is recorded; a correlation is calculated between the target area in the current video frame and the target area in the previous video frame, yielding a correlation calculation result; and the motion trail of the target in the video frame sequence is obtained from that result. The calculation process is simple, so the target can be tracked without complex computation; errors in the calculation are reduced and tracking precision improves; and the tracking calculation can run on lower-configuration equipment, reducing the tracking cost.
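Bringing the modules together, a hedged end-to-end sketch is given below, reusing the helper functions from the method embodiment; detect stands in for the YOLOv3 detector and, like the class and method names, is illustrative rather than taken from the patent (box coordinates are assumed to be integer pixels in the normalized frame):

    class TargetTracker:
        """Minimal sketch wiring the five modules of the apparatus together."""

        def __init__(self, video_path):
            self.frames = split_video(video_path)  # frame splitting module
            self.log = {}                          # coordinate positioning records
            self.trajectory = []                   # motion trajectory output

        def run(self, detect):
            prev_region = None
            for index, frame in self.frames:
                frame = normalize_frame(frame)     # detection module input
                box = detect(frame)                # target area, or None if absent
                if box is None:
                    continue
                center = locate_target(index, box, self.log)
                x_min, y_min, x_max, y_max = box
                region = frame[y_min:y_max, x_min:x_max]
                if prev_region is None:
                    self.trajectory.append(center)
                else:                              # association calculation module
                    update_trajectory(self.trajectory, prev_region, region, center)
                prev_region = region
            return self.trajectory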
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
In addition, the method and apparatus for tracking a target in a video image provided by the embodiments of the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present invention, vary the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for tracking a target in a video image, the method comprising:
performing frame splitting processing on a video to obtain a video frame sequence;
detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
carrying out coordinate positioning on a target area of the target to be tracked in a corresponding video frame sequence, and recording coordinate position information;
performing correlation calculation on a target area of a target to be tracked in a current video frame and a target area of a target to be tracked in a previous video frame based on a video frame sequence to obtain a correlation calculation result;
and acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
2. The target tracking method according to claim 1, wherein the video frame sequence is obtained by labeling in chronological order and removing redundant video frames.
3. The target tracking method of claim 1, wherein the target detection network model is a YOLOv3 network model;
the loss functions of the YOLOv3 network model include an object confidence loss function, an object class loss function, and an object localization loss function.
4. The target tracking method of claim 1, wherein before the target detection network model detects the target to be tracked in the video frame sequence, the method further comprises:
and carrying out size normalization processing on the video frame sequence, and normalizing the picture size in the video frame sequence to 416 x 416.
5. The target tracking method according to claim 1, wherein the coordinate locating of the target region of the target to be tracked in the corresponding video frame sequence comprises:
constructing pixel coordinates of the video frame sequence based on pixel points;
and acquiring the pixel coordinate position of the target area of the target to be tracked in the corresponding video frame sequence for coordinate positioning.
6. The target tracking method according to claim 1, wherein the performing the correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence comprises:
and performing similarity correlation calculation on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
7. The target tracking method according to claim 6, wherein the performing the correlation calculation of the similarity between the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence comprises:
and performing correlation calculation of SIFT feature vector similarity on a target region of a target to be tracked in the current video frame and a target region of a target to be tracked in the previous video frame based on the video frame sequence.
8. The target tracking method according to claim 1, wherein the obtaining of the motion trajectory of the target to be tracked in the video frame sequence based on the correlation calculation result comprises:
judging whether the target area of the target to be tracked of the previous video frame appears in the target area of the target to be tracked in the current video frame or not according to the correlation calculation result;
when the judgment is carried out, respectively obtaining the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame;
and obtaining the motion trail of the target to be tracked in the video frame sequence according to the coordinate positioning of the target area of the target to be tracked in the previous video frame and the target area of the target to be tracked in the current video frame.
9. The target tracking method of any one of claims 1 to 8, further comprising:
predicting the appearance position of the target to be tracked in the next video frame in the video frame sequence according to the motion track of the target to be tracked in the video frame sequence.
10. An apparatus for tracking objects in video images, the apparatus comprising:
a frame splitting module: used for performing frame splitting processing on a video to obtain a video frame sequence;
a detection module: used for detecting a target to be tracked in the video frame sequence based on a target detection network model to obtain a target area of the target to be tracked;
a coordinate positioning module: used for carrying out coordinate positioning on the target area of the target to be tracked in the corresponding video frame sequence and recording the coordinate position information;
an association calculation module: used for performing correlation calculation on the target area of the target to be tracked in the current video frame and the target area of the target to be tracked in the previous video frame based on the video frame sequence, to obtain a correlation calculation result;
a motion trajectory acquisition module: used for acquiring the motion trail of the target to be tracked in the video frame sequence based on the correlation calculation result.
CN202010501281.4A 2020-06-04 2020-06-04 Target tracking method and device in video image Pending CN111932582A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010501281.4A CN111932582A (en) 2020-06-04 2020-06-04 Target tracking method and device in video image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010501281.4A CN111932582A (en) 2020-06-04 2020-06-04 Target tracking method and device in video image

Publications (1)

Publication Number Publication Date
CN111932582A (en) 2020-11-13

Family

ID=73317763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010501281.4A Pending CN111932582A (en) 2020-06-04 2020-06-04 Target tracking method and device in video image

Country Status (1)

Country Link
CN (1) CN111932582A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849687A (en) * 2020-11-23 2021-12-28 阿里巴巴集团控股有限公司 Video processing method and device
CN113849687B (en) * 2020-11-23 2022-10-28 阿里巴巴集团控股有限公司 Video processing method and device
CN112418136A (en) * 2020-12-02 2021-02-26 云南电网有限责任公司电力科学研究院 Target area detection tracking method and device for field operating personnel
CN112418136B (en) * 2020-12-02 2023-11-24 云南电网有限责任公司电力科学研究院 Method and device for detecting and tracking target area of field operator
CN112907622A (en) * 2021-01-20 2021-06-04 厦门市七星通联科技有限公司 Method, device, equipment and storage medium for identifying track of target object in video
CN113343795A (en) * 2021-05-24 2021-09-03 广州智慧城市发展研究院 Target associated video tracking processing method and device
CN113343795B (en) * 2021-05-24 2024-04-26 广州智慧城市发展研究院 Target associated video tracking processing method
CN114511591A (en) * 2021-12-31 2022-05-17 中国科学院自动化研究所 Trajectory tracking method and device, electronic equipment and storage medium
CN114511591B (en) * 2021-12-31 2023-08-04 中国科学院自动化研究所 Track tracking method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111932582A (en) Target tracking method and device in video image
CN111932579A (en) Method and device for adjusting equipment angle based on motion trail of tracked target
Tian et al. Scene Text Detection in Video by Learning Locally and Globally.
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
CN107240130B (en) Remote sensing image registration method, device and system
Wang et al. An improved ORB image feature matching algorithm based on SURF
US11830218B2 (en) Visual-inertial localisation in an existing map
Son et al. A multi-vision sensor-based fast localization system with image matching for challenging outdoor environments
CN115375917B (en) Target edge feature extraction method, device, terminal and storage medium
CN113298146A (en) Image matching method, device, equipment and medium based on feature detection
CN111950370A (en) Dynamic environment offline visual milemeter expansion method
Chowdhary et al. Video surveillance for the crime detection using features
CN111105436B (en) Target tracking method, computer device and storage medium
CN114494373A (en) High-precision rail alignment method and system based on target detection and image registration
Gao et al. Occluded person re-identification based on feature fusion and sparse reconstruction
Guler et al. A new object tracking framework for interest point based feature extraction algorithms
Li et al. TextSLAM: Visual SLAM With Semantic Planar Text Features
CN113129332A (en) Method and apparatus for performing target object tracking
Jabnoun et al. Visual scene prediction for blind people based on object recognition
Sun et al. Convolutional neural network-based coarse initial position estimation of a monocular camera in large-scale 3D light detection and ranging maps
Li et al. A novel automatic image stitching algorithm for ceramic microscopic images
CN111401286A (en) Pedestrian retrieval method based on component weight generation network
Aswin et al. Stereo-Vision Based System For Object Detection And Recognition
Yang et al. Image copy–move forgery detection based on sped-up robust features descriptor and adaptive minimal–maximal suppression
Kalaiyarasi et al. Enhancing logo matching and recognition using local features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201113