CN115482483A - Traffic video target tracking device, method and storage medium - Google Patents
Traffic video target tracking device, method and storage medium
- Publication number
- CN115482483A CN115482483A CN202210930357.4A CN202210930357A CN115482483A CN 115482483 A CN115482483 A CN 115482483A CN 202210930357 A CN202210930357 A CN 202210930357A CN 115482483 A CN115482483 A CN 115482483A
- Authority
- CN
- China
- Prior art keywords
- target
- frame
- tracking
- image
- target tracking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/54—Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention relates to the field of target detection and tracking, and in particular to a traffic video target tracking device, method and storage medium, mainly comprising the following steps: 1) Install a camera and a data processing device; 2) Start the target tracking program, read the data stream, and process the data frame by frame; 3) When the first frame image is read, start the target detector and initialize the target tracker parameters; 4) Thereafter, use the target detection model every 5 frames and the lightweight target tracking model the rest of the time. The invention fills the gap between two detections with lower-cost tracking, so that the total amount of computation is reduced without losing much accuracy; it can cope with slight shaking of the picture background; and the two association rounds can also associate lower-score detections caused by occlusion or reflection anomalies, enhancing tracking robustness.
Description
Technical Field
The invention relates to the field of target detection and tracking, in particular to a traffic video target tracking device, a method and a storage medium.
Background
With the development of the times, intelligent traffic and intelligent roads have become hot topics in recent years, and with the continuous improvement of living standards, vehicles have become essential tools for participating in traffic. However, as the number of vehicles grows, problems such as road congestion and environmental pollution become more serious, and the development of intelligent highways requires an all-weather, all-time, full-function, full-coverage and highly integrated roadside sensing system to realize scene services such as flow control, accident detection and prevention, and vehicle-road cooperation. Through cameras and matching edge computing units, information such as vehicle position, color, model and traffic flow can be detected, and all target trajectories can be tracked in real time, so that multi-target information such as traffic flow can be collected and emergencies on main roads and ramps can be detected. The structured data produced by the system can be uploaded to an information processing and analysis system for flow control, and can also be reported to an expressway accident early-warning system to support relevant personnel in research and decision-making, event analysis, risk prevention and control, and accident handling, improving the traffic capacity of the expressway; at the same time, it can provide rich data support for future vehicle-road cooperation scenarios.
With the development of deep learning, many learning-based target tracking methods have emerged, generally divided into 4 main steps: 1) Model new objects in each video frame; 2) Detect objects according to the established model; 3) Repeatedly search for the target object in subsequent frames; 4) Obtain a series of object trajectories to realize target tracking. These operations often depend on high-performance GPU resources: high-quality detection and tracking models run slowly on edge devices, and surveillance cameras sometimes shake, causing tracking failures.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a traffic video target tracking device, a method and a storage medium, which aim to solve the problem that a high-quality detection and tracking model in the prior art runs at a low speed on edge equipment.
The invention provides a traffic video target tracking device for solving the technical problems, which comprises an image acquisition module and a data processing device; the data processing device comprises a target detection module and a target tracking module; wherein:
the image acquisition module is used for acquiring image data;
further, the image acquisition module can be any device capable of acquiring image data, such as a USB camera, a web camera, a binocular camera, a depth camera, or a CSI camera;
the target detection module is used for carrying out target detection and classification on each frame of image data;
and the target tracking module is used for tracking the target detected in each frame of image data.
As shown in fig. 1, the present invention provides a traffic video target tracking method, which includes the following steps:
10, installing a camera and a data processing device;
20, starting the target tracking program, reading the data stream, and processing the data frame by frame;
30, starting the target detector and initializing the target tracker parameters when the first frame image is read;
further, for the target detector, detecting the target objects appearing in the first frame image, recording the targets' position information, and drawing bounding boxes around the target objects;
initializing the target tracker parameters, and adjusting the image frame to the required size;
and 40, using the target detection model every 5 frames, and using the lightweight target tracking model the rest of the time:
optionally, as shown in fig. 2, the step 40 specifically includes the following steps:
s1, using target detection and tracking for the image of the mth frame, and using target tracking only for other frames, specifically:
s1.1, when the program traverses the mth frame, the next step is carried out; otherwise, entering S2;
s1.2, asynchronously starting target detection;
s1.3, classifying the detection targets, and dividing according to the types and colors of the vehicles;
s1.4, re-identification pre-processing, extracting feature embeddings from the detection boxes;
s1.5, re-identification post-processing, then classifying the targets by lane-line index, vehicle type, and color;
s2, predicting with the filter of each existing track to obtain the initial position of that track's target box at this moment, using the following formulas:
size = max(mean[2:4] - mean[:2] + 1)
std = std_acc_factor * std_acc_offset
cov_motion = acc_cov * std^2
mean = mat_trans @ mean
cov = mat_trans @ cov @ mat_trans.T + cov_motion
cov = 0.5 * (cov + cov.T)
where @ denotes matrix multiplication;
further, the filter may be a Bayesian filter, a particle filter, a Kalman filter, an unscented Kalman filter, an extended Kalman filter, a square-root Kalman filter, a cubature Kalman filter, a quaternion Kalman filter, a cubature-quadrature Kalman filter, a dual Kalman filter, an adaptive Kalman filter, or the like.
S3, calculating the optical flows of the front frame and the rear frame, dividing the optical flows into a foreground and a background, and removing the foreground when calculating the background:
wherein, the foreground represents the target object, and the specific step of calculating the optical flow is as follows:
s3.1, pre-processing the frames: converting the frame image to grayscale and, to improve running speed, resizing it to the low-resolution image used for the optical-flow calculation;
s3.2, sorting the tracks from near to far;
s3.3, traversing the existing tracks, eliminating the feature points which are not in the prediction frame at the current moment, and detecting the feature points in the corresponding position range of the current frame again when the number of the feature points is lower than a threshold value;
further, the feature points are detected with FAST, a detector that balances speed and accuracy; the specific detection steps are as follows:
1) Select a pixel p in the image and denote its brightness by I_p;
2) Set a suitable threshold t;
3) Consider a discretized Bresenham circle of radius 3 pixels centered on p; its boundary contains 16 pixels;
4) If there are n consecutive pixels on this 16-pixel circle whose values are all greater than I_p + t or all less than I_p - t, then p is a feature point, where n can be set to 12 or 9;
s3.4, carrying out batch processing on the feature points, removing the foreground from the image in a mask mode, and detecting the feature points belonging to the background in the rest image;
s3.5, matching the feature points of the current moment and the previous moment by using an optical flow method;
further, the optical flow method is specifically as follows:
1) Assuming the image varies with time, it can be viewed as a function of time I(t); the gray level of the pixel located at (x, y) at time t can be written as I(x, y, t);
2) Assuming the gray level of the same spatial point is fixed and constant in each image, at time t + dt the gray level of the pixel satisfies: I(x + dx, y + dy, t + dt) = I(x, y, t);
3) Performing a Taylor expansion on the left-hand side and retaining the first-order terms gives:
I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt;
4) Since the gray level is assumed unchanged, the gray level at the next moment equals the previous one, so:
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0;
5) Dividing both sides by dt gives:
(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) = -(∂I/∂t),
where dx/dt and dy/dt are the velocities of the pixel along the x and y axes, denoted u and v respectively; ∂I/∂x and ∂I/∂y are the image gradients in the x and y directions at that point, denoted I_x and I_y; and the temporal change of the image gray level is denoted I_t. Written in matrix form:
[I_x I_y][u v]^T = -I_t;
6) Assuming the pixels within a window share the same motion, a window of size w × w contains w² pixels and therefore yields w² such equations:
[I_x I_y]_k [u v]^T = -I_t,k (k = 1, ..., w²);
recording A = [[I_x I_y]_1; ...; [I_x I_y]_{w²}] and b = [I_t,1; ...; I_t,w²], the whole system is:
A [u v]^T = -b;
7) This is an over-determined system in u and v; solving it by least squares, [u v]^T = (A^T A)^{-1} A^T (-b), gives the motion velocity u, v of the pixel between the images.
S3.6, estimating a homography matrix of the camera motion between the front frame and the rear frame by using the matched background characteristic point pairs;
further, the homography estimation takes the background feature-point pairs matched between the two frames as input, selects a registration method, and computes the homography matrix between the two frame images; the matched background feature-point pairs satisfy the relation:
[x'_i, y'_i, 1]^T ~ H [x_i, y_i, 1]^T,
where H denotes the homography matrix, and (x_i, y_i) and (x'_i, y'_i) denote a pair of feature points matched between the two frames;
optionally, the registration method includes least squares, random sample consensus (RANSAC), least median of squares, and progressive sample consensus (PROSAC).
S4, uniformly correcting the positions and sizes of all targets using the homography matrix of the camera motion between the two frames, and then updating each target's position and size at the current moment using the foreground feature-point pairs matched between the current and previous moments;
s5, data association is carried out:
further, the data association specifically includes:
s5.1, firstly, once associating the higher-score detection frame with all tracked prediction frames;
s5.2, the lower score detection frame is associated with the unmatched tracking of the previous round again;
further, the set D of all target detection results at the current moment is divided into three parts by a high threshold T_high and a low threshold T_low: results with confidence not lower than T_high form the higher-score detection-box set D_high; results with confidence below T_high but not below T_low form the lower-score detection-box set D_low; the remaining results are not processed further.
Further, the association of the detection frame with the tracking frame is based on an IoU method.
S6, updating by using a filter according to the observation obtained by the correlation;
s7, if the confidence coefficient of the detection frame which is not matched in the steps S5.1 and S5.2 is not lower than a preset threshold value, establishing a new temporary tracking based on the detection frame; for the existing temporary tracking, if the matching is obtained for three times continuously, the tracking is judged to be effective; for a track that is not matched, if the number of times it is not matched is above a given threshold, it is marked as lost.
The invention also provides a traffic video target tracking storage medium storing a traffic video target tracking method program which, when executed by the image acquisition module and the data processing device, implements the steps of the traffic video target tracking method.
The beneficial effects of the invention are:
1. The traffic video target tracking device, method and storage medium fill the gap between two detections with lower-cost tracking, so that the total amount of computation is reduced without losing much accuracy.
2. The traffic video target tracking device, method and storage medium can cope with slight shaking of the picture background.
3. In the traffic video target tracking device, method and storage medium, the two association rounds can still associate lower-score detections caused by occlusion or reflection anomalies, enhancing tracking robustness.
Drawings
The invention is further described below with reference to the drawings and the embodiments.
FIG. 1 is a schematic flow chart illustrating the overall steps of the traffic video target tracking method according to the present invention;
FIG. 2 is a flowchart illustrating an embodiment of the traffic video target tracking method according to the present invention.
Detailed Description
The drawbacks of the above prior-art solutions are the result of the applicant's practical and careful study; therefore, the discovery of the above problems and the solutions proposed by the following embodiments should both be regarded as the applicant's contributions to the present application.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The present invention will be further described with reference to the following detailed description so that the technical means, the creation features, the achievement purposes and the effects of the present invention can be easily understood.
The invention relates to a traffic video target tracking device, which comprises an image acquisition module 1 and a data processing device 2; the data processing device comprises a target detection module 3 and a target tracking module 4; wherein:
the image acquisition module 1 is used for acquiring image data;
further, in this embodiment, the image capturing module 1 may be any device capable of capturing image data, such as a USB camera, a webcam, a binocular camera, a depth camera, and a CSI camera;
the target detection module 3 is configured to perform target detection and classification on each frame of image data;
and the target tracking module 4 is used for tracking the target detected in each frame of image data.
As shown in fig. 1, the present invention provides a traffic video target tracking method, which includes the following steps:
10, installing a camera and a data processing device;
20, starting the target tracking program, reading the data stream, and processing the data frame by frame;
30, starting a target detector and initializing parameters of a target tracker when reading the first frame image;
for the target detector, detecting the target objects appearing in the first frame image, recording the targets' position information, and drawing bounding boxes around the target objects;
initializing target tracker parameters, and adjusting the size of an image frame to a required size;
and 40, using a target detection model every 5 frames, and using a lightweight target tracking model in the rest time:
optionally, as shown in fig. 2, the step 40 specifically includes the following steps:
s1, using target detection and tracking for the image of the mth frame, and using target tracking only for other frames, specifically:
s1.1, when the program traverses the mth frame, the next step is carried out; otherwise, entering S2;
s1.2, asynchronously starting target detection;
s1.3, classifying the detection targets, and dividing according to the types and colors of the vehicles;
s1.4, re-identification pre-processing, extracting feature embeddings from the detection boxes;
s1.5, re-identification post-processing, then classifying the targets by lane-line index, vehicle type, and color;
further wherein m represents an integer multiple of 5;
s2, predicting with the filter of each existing track to obtain the initial position of that track's target box at this moment, using the following formulas:
size = max(mean[2:4] - mean[:2] + 1)
std = std_acc_factor * std_acc_offset
cov_motion = acc_cov * std^2
mean = mat_trans @ mean
cov = mat_trans @ cov @ mat_trans.T + cov_motion
cov = 0.5 * (cov + cov.T)
where @ denotes matrix multiplication;
further, the filter may be a Bayesian filter, a particle filter, a Kalman filter, an unscented Kalman filter, an extended Kalman filter, a square-root Kalman filter, a cubature Kalman filter, a quaternion Kalman filter, a cubature-quadrature Kalman filter, a dual Kalman filter, an adaptive Kalman filter, or the like.
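As a concrete illustration of the prediction step s2, the following is a minimal NumPy sketch of one Kalman-style prediction. This is an illustrative assumption, not the patent's implementation: the 8-dimensional box-plus-velocity state layout, the constant-velocity transition, and the noise constants are invented for the example, and because the translated `std` formula is ambiguous, the track size is assumed here to scale the motion noise.

```python
import numpy as np

def predict(mean, cov, dt=1.0,
            std_acc_factor=0.05, std_acc_offset=0.5, acc_cov=None):
    """One Kalman-style prediction step for a track.

    mean: (8,) state [x1, y1, x2, y2, vx1, vy1, vx2, vy2]
    cov:  (8, 8) state covariance
    All constants are illustrative, not taken from the patent.
    """
    # Constant-velocity transition: position += velocity * dt
    mat_trans = np.eye(8)
    mat_trans[:4, 4:] = dt * np.eye(4)

    # Motion-noise scale; assuming the box size scales the noise
    size = np.max(mean[2:4] - mean[:2] + 1)
    std = std_acc_factor * size + std_acc_offset
    if acc_cov is None:
        acc_cov = np.eye(8)
    cov_motion = acc_cov * std ** 2

    mean = mat_trans @ mean
    cov = mat_trans @ cov @ mat_trans.T + cov_motion
    cov = 0.5 * (cov + cov.T)  # keep the covariance symmetric
    return mean, cov
```

The final symmetrization mirrors the patent's `cov = 0.5 * (cov + cov.T)` line, which guards against numerical asymmetry accumulating over many prediction steps.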
S3, calculating the optical flows of the front frame and the rear frame, dividing the optical flows into a foreground and a background, and removing the foreground when calculating the background:
wherein, the foreground represents the target object, and the specific step of calculating the optical flow is as follows:
s3.1, pre-processing the frames: converting the frame image to grayscale and, to improve running speed, resizing it to the low-resolution image used for the optical-flow calculation;
s3.2, sorting the tracks from near to far;
s3.3, traversing the existing tracks, eliminating the feature points which are not in the prediction frame at the current moment, and detecting the feature points in the corresponding position range of the current frame again when the number of the feature points is lower than a threshold value;
further, the feature points are detected with FAST, a detector that balances speed and accuracy; the specific detection steps are as follows:
1) Select a pixel p in the image and denote its brightness by I_p;
2) Set a suitable threshold t;
3) Consider a discretized Bresenham circle of radius 3 pixels centered on p; its boundary contains 16 pixels;
4) If there are n consecutive pixels on this 16-pixel circle whose values are all greater than I_p + t or all less than I_p - t, then p is a feature point, where n can be set to 12 or 9;
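The FAST test of steps 1)-4) can be sketched as a small pure-NumPy check. This is an illustrative sketch of the criterion only; a real system would use an optimized implementation (e.g. OpenCV's detector) with non-maximum suppression.

```python
import numpy as np

# Offsets of the 16 pixels on a radius-3 Bresenham circle (step 3)
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t=20, n=12):
    """Step 4): p is a feature point if n consecutive circle pixels are
    all brighter than I_p + t or all darker than I_p - t."""
    ip = int(img[y, x])
    vals = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    # Duplicate the ring so a 'consecutive' run may wrap around
    brighter = [v > ip + t for v in vals] * 2
    darker = [v < ip - t for v in vals] * 2
    for flags in (brighter, darker):
        run = 0
        for f in flags[:16 + n]:
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False
```

For example, the corner of a bright square on a dark background passes the test with n = 9, while a pixel in the interior of the square does not.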
s3.4, carrying out batch processing on the feature points, removing the foreground from the image in a mask mode, and detecting the feature points belonging to the background in the rest image;
s3.5, matching the feature points of the current moment and the previous moment by using an optical flow method;
further, the optical flow method is specifically as follows:
1) Assuming the image varies with time, it can be viewed as a function of time I(t); the gray level of the pixel located at (x, y) at time t can be written as I(x, y, t);
2) Assuming the gray level of the same spatial point is fixed and constant in each image, at time t + dt the gray level of the pixel satisfies: I(x + dx, y + dy, t + dt) = I(x, y, t);
3) Performing a Taylor expansion on the left-hand side and retaining the first-order terms gives:
I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt;
4) Since the gray level is assumed unchanged, the gray level at the next moment equals the previous one, so:
(∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0;
5) Dividing both sides by dt gives:
(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) = -(∂I/∂t),
where dx/dt and dy/dt are the velocities of the pixel along the x and y axes, denoted u and v respectively; ∂I/∂x and ∂I/∂y are the image gradients in the x and y directions at that point, denoted I_x and I_y; and the temporal change of the image gray level is denoted I_t. Written in matrix form:
[I_x I_y][u v]^T = -I_t;
6) Assuming the pixels within a window share the same motion, a window of size w × w contains w² pixels and therefore yields w² such equations:
[I_x I_y]_k [u v]^T = -I_t,k (k = 1, ..., w²);
recording A = [[I_x I_y]_1; ...; [I_x I_y]_{w²}] and b = [I_t,1; ...; I_t,w²], the whole system is:
A [u v]^T = -b;
7) This is an over-determined system in u and v; solving it by least squares, [u v]^T = (A^T A)^{-1} A^T (-b), gives the motion velocity u, v of the pixel between the images.
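The least-squares solve of steps 6)-7) can be sketched in NumPy as follows, assuming the gradient windows I_x, I_y, I_t have already been computed:

```python
import numpy as np

def lucas_kanade_flow(Ix, Iy, It):
    """Solve the over-determined system A [u v]^T = -b by least squares.

    Ix, Iy, It: (w, w) arrays of spatial/temporal gradients in one window.
    Returns the flow (u, v) for that window.
    """
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # (w*w, 2)
    b = -It.ravel()                                  # (w*w,)
    # Equivalent to solving the normal equations (A^T A) [u v]^T = A^T b
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv  # (u, v)
```

If the gradients in the window are consistent with a single translation, the recovered (u, v) is exact; in real imagery the residual of the fit indicates how well the constant-motion assumption holds.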
S3.6, estimating a homography matrix of the camera motion between the front frame and the rear frame by using the matched background characteristic point pairs;
further, the homography estimation takes the background feature-point pairs matched between the two frames as input, selects a registration method, and computes the homography matrix between the two frame images; the matched background feature-point pairs satisfy the relation:
[x'_i, y'_i, 1]^T ~ H [x_i, y_i, 1]^T,
where H denotes the homography matrix, and (x_i, y_i) and (x'_i, y'_i) denote a pair of feature points matched between the two frames;
optionally, the registration method includes least squares, random sample consensus (RANSAC), least median of squares, and progressive sample consensus (PROSAC).
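As an illustration of s3.6, the following is a minimal least-squares (DLT) homography estimate in NumPy. It assumes clean matched pairs; a robust method such as RANSAC, as listed among the registration options, would additionally resample point subsets and keep the model with the most inliers, which is omitted here.

```python
import numpy as np

def estimate_homography(pts_src, pts_dst):
    """DLT estimate of H such that (x'_i, y'_i, 1) ~ H (x_i, y_i, 1).

    pts_src, pts_dst: (N, 2) matched feature points, N >= 4,
    not all collinear. Returns H normalised so H[2, 2] = 1.
    """
    A = []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        A.append([-x, -y, -1, 0, 0, 0, xp * x, xp * y, xp])
        A.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
    # h is the right singular vector for the smallest singular value
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

With noiseless correspondences the smallest singular value is zero and the recovered H matches the true transform exactly (up to the fixed normalisation).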
S4, uniformly correcting the positions and sizes of all targets using the homography matrix of the camera motion between the two frames, and then updating each target's position and size at the current moment using the foreground feature-point pairs matched between the current and previous moments;
s5, data association is carried out:
wherein the data association specifically comprises:
s5.1, firstly, the higher score detection frame is associated with all the tracked prediction frames once;
s5.2, the lower score detection frame is associated with the unmatched tracking of the previous round again;
further, the set D of all target detection results at the current moment is divided into three parts by a high threshold T_high and a low threshold T_low: results with confidence not lower than T_high form the higher-score detection-box set D_high; results with confidence below T_high but not below T_low form the lower-score detection-box set D_low; the remaining results are not processed further.
Further, the association of the detection frame with the tracking frame is based on an IoU method.
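The two-round association of s5.1-s5.2 can be sketched as follows. This is an illustrative simplification: greedy IoU matching stands in for a full assignment solver, and the thresholds T_high, T_low and the IoU gate are example values, not the patent's.

```python
def iou(a, b):
    """IoU of two boxes [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(dets, scores, tracks, t_high=0.6, t_low=0.1, iou_min=0.3):
    """Round 1: high-score detections vs. all track prediction boxes.
    Round 2: low-score detections vs. the still-unmatched tracks.
    Detections below t_low are discarded."""
    d_high = [i for i, s in enumerate(scores) if s >= t_high]
    d_low = [i for i, s in enumerate(scores) if t_low <= s < t_high]
    matches, unmatched_tracks = [], list(range(len(tracks)))
    for pool in (d_high, d_low):
        for di in pool:
            if not unmatched_tracks:
                break
            best = max(unmatched_tracks,
                       key=lambda ti: iou(dets[di], tracks[ti]))
            if iou(dets[di], tracks[best]) >= iou_min:
                matches.append((di, best))
                unmatched_tracks.remove(best)
    return matches, unmatched_tracks
```

The second round is what lets an occluded or reflective target, detected only with low confidence, still be attached to its existing track rather than dropped.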
S6, updating by using a filter according to the observation obtained by the correlation;
s7, for the detection frame which is not matched in the steps S5.1 and S5.2, if the confidence coefficient is not lower than a preset threshold value, establishing a new temporary tracking based on the detection frame; for the existing temporary tracking, if the matching is obtained for three times continuously, the tracking is judged to be effective; for a trace that is not matched, if its number of times of non-matching is above a given threshold, it is marked as lost.
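The track lifecycle rule of s7 can be sketched as a small state machine. The confirmation count of three matches is from the text; the miss threshold `max_misses` is an illustrative value, since the patent leaves it as "a given threshold".

```python
from dataclasses import dataclass

@dataclass
class Track:
    hits: int = 0          # consecutive matches while tentative
    misses: int = 0        # consecutive frames without a match
    confirmed: bool = False
    lost: bool = False

def update_track(track, matched, confirm_hits=3, max_misses=30):
    """s7: a tentative track becomes valid after three consecutive
    matches; an unmatched track is marked lost once its miss count
    exceeds the threshold."""
    if matched:
        track.misses = 0
        track.hits += 1
        if track.hits >= confirm_hits:
            track.confirmed = True
    else:
        track.hits = 0
        track.misses += 1
        if track.misses > max_misses:
            track.lost = True
    return track
```

Requiring several consecutive matches before confirming suppresses spurious tracks spawned by one-off false detections.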
The invention also provides a traffic video target tracking storage medium, wherein a traffic video target tracking method program is stored on the traffic video target tracking storage medium, and the steps of the traffic video target tracking method are realized when the traffic video target tracking method program is executed by the image acquisition module 1 and the data processing device 2.
The foregoing shows and describes the general principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are given by way of illustration of the principles of the present invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (7)
1. A traffic video target tracking device, characterized by comprising an image acquisition module, a target detection module and a target tracking module; wherein:
the image acquisition module is used for acquiring image data;
the target detection module is used for carrying out target detection and classification on each frame of image data;
and the target tracking module is used for tracking the target detected in each frame of image data.
2. A traffic video target tracking method is characterized by comprising the following steps:
installing a camera and a data processing device;
starting a target tracking program, reading data streams, and processing data frame by frame;
when a first frame image is read, starting a target detector, and initializing parameters of a target tracker;
a target detection model is used every 5 frames, and a lightweight target tracking model is used for the remaining frames.
3. The traffic video target tracking method according to claim 2, wherein the target detector is started when the first frame image is read, and initializing the target tracker parameters comprises:
for the target detector, detecting the target objects appearing in the first frame image, recording the position information of each target, and drawing its bounding box;
and initializing the target tracker parameters and resizing the image frame to the required size.
4. The traffic video target tracking method according to claim 2, wherein using the target detection model every 5 frames and using the lightweight target tracking model for the remaining frames specifically comprises the following steps:
s1, target detection and tracking are used for the mth frame of image, and only target tracking is used for other frames;
s2, predicting according to a filter of the existing track to obtain the initial position of the target frame of the existing track at the moment;
s3, calculating the optical flows of the front frame and the rear frame, dividing the optical flows into a foreground and a background, and removing the foreground when calculating the background;
s4, uniformly correcting the positions and the sizes of all targets by using a homography matrix of the camera motion between the front frame and the rear frame, and then updating the positions and the sizes of all targets at the current time by using foreground characteristic point pairs matched with the current time and the previous time of all targets;
s5, data association is carried out;
s6, updating by using a filter according to the observation obtained by the correlation;
s7, establishing new tracking for the unmatched high score detection frame, and marking the unmatched tracking in a lost state for a long time;
where m represents an integer multiple of 5 and the foreground represents the target.
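The scheduling rule underlying steps S1-S7 can be sketched as follows: frames whose index is a multiple of 5 run the detector alongside the tracker, while all other frames run the lightweight tracker alone. The function and label names are placeholders, not taken from the patent.

```python
DETECT_EVERY = 5  # m is an integer multiple of 5 per the claim

def plan_frames(n_frames):
    """Return, per frame index, whether detection runs alongside tracking."""
    plan = []
    for idx in range(n_frames):
        use_detector = (idx % DETECT_EVERY == 0)
        plan.append(("detect+track" if use_detector else "track", idx))
    return plan
```

Running detection only on every fifth frame amortizes the cost of the heavy model; the tracker bridges the gaps and is re-anchored by fresh detections at each multiple of 5.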
5. The traffic video target tracking method according to claim 4, wherein the specific steps of calculating the optical flow are:
S3.1, preprocessing each frame: converting the frame image to grayscale and, to improve running speed, resizing it to a low-resolution image used for the optical flow calculation;
S3.2, sorting the tracks in order from near to far;
S3.3, traversing the existing tracks, discarding feature points that fall outside the predicted box at the current time, and re-detecting feature points within the corresponding region of the current frame when their number falls below a threshold;
S3.4, processing the feature points in batches, masking the foreground out of the image, and detecting feature points belonging to the background in the remaining image;
S3.5, matching the feature points of the current and previous times using an optical flow method;
and S3.6, estimating the homography matrix of the camera motion between the previous and current frames from the matched background feature point pairs.
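Step S3.6 fits a homography from matched background feature point pairs. In practice this is typically done with OpenCV (`cv2.calcOpticalFlowPyrLK` for S3.5 and `cv2.findHomography` with RANSAC for S3.6); the sketch below instead shows the underlying Direct Linear Transform in NumPy so the math is visible. It assumes clean, outlier-free correspondences, which real optical-flow matches do not guarantee.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: fit the 3x3 homography H with dst ~ H @ src.
    src, dst: sequences of (x, y) matched background points, at least 4 pairs
    in general position. Each pair contributes two rows of the DLT system
    A h = 0; the null vector of A (via SVD) is the flattened H."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so the bottom-right entry is 1

def warp_point(H, pt):
    """Apply H to a 2-D point (used to correct track positions in S4)."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)
```

With the camera-motion homography in hand, step S4 warps every track's predicted box through `warp_point` before the per-target foreground update.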
6. The traffic video target tracking method according to claim 4, wherein the data association specifically comprises:
S5.1, first associating the high-score detection boxes once with the predicted boxes of all tracks;
and S5.2, then associating the low-score detection boxes again with the tracks left unmatched in the previous round.
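The two-round cascade of S5.1/S5.2 mirrors the association scheme popularized by ByteTrack. The sketch below is a simplified stand-in: greedy IoU matching replaces the Hungarian algorithm usually used for this step, and the IoU threshold is an assumed value.

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(dets, tracks, thresh=0.3):
    """Greedily pair detections with tracks in descending IoU order."""
    pairs, used_d, used_t = [], set(), set()
    cands = sorted(((iou(d, t), di, ti) for di, d in enumerate(dets)
                    for ti, t in enumerate(tracks)), reverse=True)
    for score, di, ti in cands:
        if score < thresh or di in used_d or ti in used_t:
            continue
        pairs.append((di, ti))
        used_d.add(di)
        used_t.add(ti)
    return pairs, used_t

def associate(high_dets, low_dets, track_boxes):
    """Round 1 (S5.1): high-score detections vs. all track predictions.
    Round 2 (S5.2): low-score detections vs. tracks left unmatched."""
    pairs_hi, matched_t = greedy_match(high_dets, track_boxes)
    leftover = [ti for ti in range(len(track_boxes)) if ti not in matched_t]
    pairs_lo, _ = greedy_match(low_dets, [track_boxes[ti] for ti in leftover])
    # map second-round track indices back to the original track list
    pairs_lo = [(di, leftover[ti]) for di, ti in pairs_lo]
    return pairs_hi, pairs_lo
```

Giving low-score detections a second round lets partially occluded targets keep their tracks instead of being discarded with the detector's low-confidence output.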
7. A traffic video target tracking storage medium, characterized in that the traffic video target tracking storage medium has stored thereon a traffic video target tracking method program which, when executed by an image acquisition module, a target detection module and a target tracking module, implements the steps of the traffic video target tracking method according to any one of claims 2 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210930357.4A CN115482483A (en) | 2022-08-03 | 2022-08-03 | Traffic video target tracking device, method and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115482483A true CN115482483A (en) | 2022-12-16 |
Family
ID=84422357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210930357.4A Pending CN115482483A (en) | 2022-08-03 | 2022-08-03 | Traffic video target tracking device, method and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115482483A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115880646A (en) * | 2023-02-20 | 2023-03-31 | 中国民航大学 | Method for identifying in-out-of-position state of airplane |
Similar Documents
Publication | Title |
---|---|
CN111368687B | Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation |
CN111209810B | Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images |
CN108320510B | Traffic information statistical method and system based on aerial video shot by unmanned aerial vehicle |
US10261574B2 | Real-time detection system for parked vehicles |
CN108694386B | Lane line detection method based on parallel convolution neural network |
CN100545867C | Aerial shooting traffic video frequency vehicle rapid checking method |
CN110619279B | Road traffic sign instance segmentation method based on tracking |
CN106683119B | Moving vehicle detection method based on aerial video image |
WO2013186662A1 | Multi-cue object detection and analysis |
US10984263B2 | Detection and validation of objects from sequential images of a camera by using homographies |
CN111340855A | Road moving target detection method based on track prediction |
CN105574488A | Low-altitude aerial infrared image based pedestrian detection method |
Hinz et al. | Car detection in aerial thermal images by local and global evidence accumulation |
CN108416798B | A kind of vehicle distances estimation method based on light stream |
CN106372619B | A kind of detection of vehicle robust and divided lane reach summation curve estimation method |
Naufal et al. | Preprocessed mask RCNN for parking space detection in smart parking systems |
CN112464933B | Intelligent identification method for weak and small target through foundation staring infrared imaging |
US20190213427A1 | Detection and Validation of Objects from Sequential Images of a Camera |
Su et al. | A new local-main-gradient-orientation HOG and contour differences based algorithm for object classification |
CN107506753B | Multi-vehicle tracking method for dynamic video monitoring |
CN116434159A | Traffic flow statistics method based on improved YOLO V7 and Deep-Sort |
CN106056078A | Crowd density estimation method based on multi-feature regression ensemble learning |
CN110555423B | Multi-dimensional motion camera-based traffic parameter extraction method for aerial video |
CN115482483A | Traffic video target tracking device, method and storage medium |
CN114782919A | Road grid map construction method and system with real and simulation data enhanced |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||