CN116777950B - Multi-target visual tracking method, device, equipment and medium based on camera parameters

Multi-target visual tracking method, device, equipment and medium based on camera parameters

Info

Publication number
CN116777950B
CN116777950B (application CN202310418468.1A)
Authority
CN
China
Prior art keywords
target
detection
tracking
matrix
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310418468.1A
Other languages
Chinese (zh)
Other versions
CN116777950A (en)
Inventor
易可夫
罗凯
郝威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology
Priority to CN202310418468.1A
Publication of CN116777950A
Application granted
Publication of CN116777950B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a multi-target visual tracking method, device, equipment and medium based on camera parameters. The method comprises the following steps: acquiring video stream data, and acquiring the image position of each detection target through a target detection model; selecting any frame of video image from the video stream data and performing camera parameter estimation; mapping the image position to a geographic position according to the estimated camera parameters, and analyzing the position uncertainty; generating target tracks through a joint probability data association tracking algorithm according to the geographic position and the position uncertainty; acquiring an association cost matrix according to the target detection confidence, the geographic position and the track matching information, and associating the effective detection frames with the existing tracks through the Hungarian algorithm so as to execute the visual tracking task. The invention improves the robustness and precision of visual multi-target tracking, and at the same time alleviates the problem of identity switching under occlusion and overlap in multi-target visual tracking.

Description

Multi-target visual tracking method, device, equipment and medium based on camera parameters
Technical Field
The present invention relates to the field of target detection tracking technologies, and in particular, to a method, an apparatus, a device, and a medium for multi-target visual tracking based on camera parameters.
Background
Target tracking refers to predicting the position of a target at each moment in a video, given the target's initial position. It is an important problem in computer vision and is usually the first step in video analysis, so many researchers have studied it and a number of effective target tracking methods have been proposed. In some monitoring scenarios, multiple targets need to be tracked simultaneously in a complex environment, and mutual occlusion between targets increases the difficulty of tracking; this occurs frequently in pedestrian tracking. When multiple pedestrians appear simultaneously in the image of the camera, they overlap one another, and their interleaved movement makes it impossible to accurately acquire their true positions. Current multi-target visual tracking methods fall mainly into the following two categories:
1) Camera-based one-stage target tracking method
This kind of method outputs the object detection frames and the objects' appearance features from a single target detection network, and then uses the objects' motion information and appearance features to construct an association matrix between detection frames and tracks. Because only one deep neural network computation is required, the method is efficient and fast, but it suffers from reduced object detection accuracy and insufficient appearance feature representation capability. One-stage tracking methods are therefore severely limited: tracks are often disturbed or lost when objects move rapidly or have similar appearances.
2) Two-stage target tracking method based on camera
This kind of method first obtains object detection frames through a target detection network, and then extracts features through a re-identification network. Since two deep neural network computations are required, detection accuracy and appearance feature representation capability can be improved, but more memory and computational resources are needed. The motion information and appearance features of the objects are then used to construct an association matrix between detection frames and tracks, which is fed into the Hungarian algorithm to obtain the association between detection frames and tracks. This kind of method still struggles with occlusion, similar appearances and nonlinear motion, which lead to track fragmentation, track disturbance, track loss, track errors and the like.
Therefore, existing multi-target visual tracking methods mainly consider the motion information and appearance features of targets in images, and have difficulty tracking targets effectively when targets move irregularly or have similar appearances. In addition, because camera parameters are difficult to acquire, current visual tracking methods make little use of the geographic position information of objects; they also ignore the clutter produced by the detector, failing to exploit clutter information in target tracking, which results in poor robustness of visual target tracking.
In addition, existing camera calibration methods require a calibration object, involve complex algorithms, and must be recalibrated whenever the camera is moved, so calibration cost is high and reusability is low.
Disclosure of Invention
Based on the above, the embodiment of the invention provides a multi-target visual tracking method, device, equipment and medium based on camera parameters, so as to solve the problems that the existing visual target tracking method is difficult to effectively track targets and has poor robustness.
To achieve the above object, in a first aspect, an embodiment of the present invention provides a multi-target visual tracking method based on camera parameters, including:
Acquiring video stream data, and acquiring an image position of a detection target through a target detection model;
Selecting any frame of video image from the video stream data, and carrying out camera parameter evaluation;
Mapping the image position of the detection target to a geographic position according to the estimated camera parameters, and analyzing the position uncertainty;
generating a target track of a detection target through a joint probability data association tracking algorithm according to the geographic position and the position uncertainty;
Acquiring an association cost matrix between a currently captured detection target and a tracking target according to the target detection confidence, the geographic position and the track matching information;
And according to the association cost matrix, associating the effective detection frames with the existing tracks through the Hungarian algorithm so as to execute the visual tracking task.
In a second aspect, an embodiment of the present invention provides a multi-target visual tracking apparatus based on camera parameters, including:
the detection target detection module is used for acquiring video stream data and acquiring the image position of a detection target through a target detection model;
the camera parameter evaluation module is used for selecting any frame of video image from the video stream data and performing camera parameter evaluation;
a position uncertainty obtaining module, configured to map an image position of the detection target to a geographic position according to the estimated camera parameter, and analyze a position uncertainty;
The track generation module is used for generating a target track of the detection target through a joint probability data association tracking algorithm according to the geographic position and the position uncertainty;
The association cost matrix acquisition module is used for acquiring an association cost matrix between a currently captured detection target and a tracking target according to the target detection confidence level, the geographic position and the track matching information;
And the visual tracking module is used for associating the effective detection frames with the existing tracks through the Hungarian algorithm according to the association cost matrix, so as to execute the visual tracking task.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a multi-target vision tracking program stored in the memory and executable on the processor, the processor implementing the camera parameter-based multi-target vision tracking method as described in the first aspect when executing the multi-target vision tracking program.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium storing a multi-target vision tracking program, which when executed by a processor implements the multi-target vision tracking method based on camera parameters according to the first aspect.
According to the multi-target visual tracking method, device, equipment and medium based on camera parameters, the camera parameters are first estimated from a video image selected from the video stream data, so that the scale effect can be eliminated; the image position of the detection target is then mapped to a geographic position through the estimated camera parameters and the position uncertainty is analyzed, making full use of the target's position information for tracking; target tracks of the detection targets are then generated through a joint probability data association tracking algorithm, making full use of the detector's clutter information for tracking; finally, an association cost matrix between detection targets and tracking targets is obtained from the target detection confidence, the geographic position and the track matching information, and effective detection frames are associated with existing tracks through the Hungarian algorithm, which improves the robustness and precision of visual multi-target tracking and at the same time alleviates the problem of identity switching under occlusion and overlap in multi-target visual tracking.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application environment of a multi-objective visual tracking method based on camera parameters according to an embodiment of the invention;
FIG. 2 is a flow chart of a multi-objective visual tracking method based on camera parameters according to an embodiment of the invention;
Fig. 3 is a schematic structural diagram of a multi-objective visual tracking device based on camera parameters according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
The multi-target visual tracking method based on camera parameters provided by the invention can be applied to an application environment as shown in fig. 1, in which a camera end communicates with a server end through a network. The camera end may be, but is not limited to, an intelligent terminal with a camera system (such as a smartphone, a personal computer or a notebook computer), or a camera. The server end can be realized by a stand-alone server or a server cluster formed by a plurality of servers.
In one embodiment, as shown in fig. 2, a multi-target visual tracking method based on camera parameters is provided, and the method is applied to the camera end in fig. 1 for illustration, and includes the following steps:
Step S10, obtaining video stream data, and obtaining the image position of a detection target through a target detection model.
In step S10, the network structure of the target detection model is a common deep neural network, such as a convolutional neural network, a recurrent neural network, and the like.
Specifically, video stream data is collected through a camera system or a camera, each frame of video image in the video stream data is input into a pre-trained target detection model, the video image is recognized by the target detection model, and a plurality of detection targets in the video image together with the image position (u, v) of each detection target are obtained; the coordinate system corresponding to the image position is a rectangular coordinate system constructed with the upper-left corner of the video image as the origin and pixels as units.
Step S20, selecting any frame of video image from the video stream data, and carrying out camera parameter evaluation.
Specifically, any frame of video image in the video stream data is acquired, and the camera parameter evaluation is performed by adopting a manual adjustment mode.
Preferably, the step S20 of evaluating camera parameters includes the steps of:
step S201, initializing a camera to set the camera parameters to default values; the camera parameters comprise the camera's lateral offset, height, distance, pitch, roll, heading and focal length;
step S202, constructing an n×n image grid;
step S203, mapping the image grid to the selected video image according to the initialized camera;
Step S204, adjusting the default value of the camera parameter to enable the image grid to be tiled on the ground in the video image, so as to obtain a preliminary camera parameter;
Step S205, obtaining the image position of the detection target in the video image through a target detector, and obtaining the detection target height by combining the preliminary camera parameters;
Step S206, constructing a camera parameter optimization function according to the detected target heights, and obtaining optimized camera parameters through the camera parameter optimization function; wherein the camera parameter optimization function is:

$$\mathbf{H}^{*}=\underset{\mathbf{H}}{\arg\min}\left(\sum_{i=1}^{N}\sigma(\mathbf{h}_{i})+\alpha\sum_{f=1}^{F}\left(E(\mathbf{h}^{f})-\bar{h}\right)^{2}\right)$$

where $\mathbf{H}^{*}$ is the camera parameter matrix corresponding to the optimized camera parameters; $\arg\min(\cdot)$ is the variable value at which the camera parameter optimization function attains its minimum; $N$ is the number of detection targets; $F$ is the number of frames of the video data stream; $\sigma(\cdot)$ is the standard deviation function; $\alpha$ is a hyperparameter of the optimization function; $E(\cdot)$ is the mean function; $\mathbf{h}_{i}=(h_{i}^{1},\dots,h_{i}^{F})$ is the height vector of the $i$-th detection target over the $F$ frames of video images; $\mathbf{h}^{f}=(h_{1}^{f},\dots,h_{N}^{f})$ is the height vector of the $N$ detection targets in the $f$-th frame of video image; $\bar{h}$ is the mean detected target height.
It can be appreciated that in this embodiment, the preliminary camera parameters are obtained by manual adjustment, and the camera parameters are optimized by the camera parameter optimization function, so that accurate evaluation of the camera parameters can be achieved. In addition, compared with the existing camera calibration method, the method is low in cost and high in reusability.
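As a concrete illustration, the following minimal Python sketch evaluates the optimization objective as reconstructed above (per-target height standard deviation plus a penalty on frame-wise mean heights). The callable `estimate_heights` and the prior height `h_prior` are assumptions introduced for illustration, not part of the patent:

```python
import numpy as np
from scipy.optimize import minimize

def camera_objective(params, estimate_heights, detections, alpha=1.0, h_prior=1.7):
    """Evaluate the (reconstructed) camera parameter optimization function.

    estimate_heights(params, detections) is a user-supplied callable that
    back-projects every detection with camera parameters `params` and returns
    an (N, F) array of per-target, per-frame heights (NaN where unobserved).
    h_prior stands in for the mean target height h-bar.
    """
    heights = estimate_heights(params, detections)            # (N, F)
    per_target_std = np.nansum(np.nanstd(heights, axis=1))    # sum_i sigma(h_i)
    frame_means = np.nanmean(heights, axis=0)                 # E(h^f), one per frame
    mean_penalty = alpha * np.nansum((frame_means - h_prior) ** 2)
    return per_target_std + mean_penalty

# Refining the manually adjusted preliminary parameters of step S204 could
# then look like this (Nelder-Mead avoids differentiating the camera model):
# result = minimize(camera_objective, x0=preliminary_params,
#                   args=(estimate_heights, detections), method="Nelder-Mead")
```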
And step S30, mapping the image position of the detection target to a geographic position according to the estimated camera parameters, and analyzing the position uncertainty.
Specifically, the image position (u, v) of the detection target is mapped to the geographic position (x, y, z) according to the camera parameters estimated in step S20, and the position uncertainty is analyzed, where (x, y, z) are the coordinates of the detection target along the X-, Y- and Z-axes of the world coordinate system.
Preferably, the analyzing the position uncertainty in step S30 may include the steps of:
Step S301, obtaining a camera parameter matrix $\mathbf{H}$ corresponding to the camera parameters, which can be expressed as:

$$\mathbf{H}=\begin{pmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{pmatrix}$$

where $a_{ij}$ is a value determined by the camera parameters.

Step S302, obtaining a covariance coefficient $\gamma$ according to the first-row elements of the camera parameter matrix and the image position of the detection target, which can be expressed as:

$$\gamma=u\,a_{11}+v\,a_{12}+a_{13}$$
Wherein, (u, v) is the image position of the detection target;
Step S303, obtaining a covariance matrix C according to the camera parameter matrix and the covariance coefficient γ.
Step S304, when the image position error of the detection target obeys the normal distribution based on the random variable, determining that the geographic position error of the detection target obeys the normal distribution based on covariance; the normal distribution based on the random variable is $N(0,\Sigma_{\delta})$, where $\delta$ is the random variable; the normal distribution based on covariance is $N(0,\,C\Sigma_{\delta}C^{T})$.
That is, if the image position error $\delta$ obeys the normal distribution $N(0,\Sigma_{\delta})$, then the geographic position error obeys the normal distribution $N(0,\,C\Sigma_{\delta}C^{T})$.
It can be understood that in this embodiment, the covariance matrix is calculated by the camera parameter matrix and the covariance coefficient, and the normal distribution corresponding to the geographic position is determined according to the normal distribution corresponding to the image position and the covariance matrix, thereby completing the analysis of the uncertainty of the position.
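To make the propagation concrete, here is a minimal sketch assuming the 3×3 camera parameter matrix above. Since the closed form of C is not reproduced in this text, the sketch uses a numerical Jacobian of the image-to-ground mapping as a stand-in for C, and applies the conventional perspective division (third row of H) rather than the patent's first-row coefficient γ; `sigma_delta` denotes the image-error covariance Σ_δ, and the numeric values of H are placeholders:

```python
import numpy as np

def image_to_ground(H, uv):
    """Map an image point (u, v) to ground coordinates using the 3x3
    camera parameter matrix H, with conventional perspective division."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]

def ground_covariance(H, uv, sigma_delta, eps=1e-4):
    """Propagate the image-position covariance Sigma_delta to the ground:
    Sigma_geo = C Sigma_delta C^T, with C approximated numerically as the
    Jacobian d(ground)/d(image) in lieu of the patent's closed form."""
    C = np.zeros((2, 2))
    for k in range(2):
        d = np.zeros(2)
        d[k] = eps
        C[:, k] = (image_to_ground(H, uv + d) - image_to_ground(H, uv - d)) / (2 * eps)
    return C @ sigma_delta @ C.T

# e.g. a pixel near the image center with a 2-pixel standard deviation per axis
H = np.array([[1.0, 0.1, 5.0],
              [0.0, 1.2, 3.0],
              [0.0, 0.002, 1.0]])
sigma_geo = ground_covariance(H, np.array([320.0, 240.0]), np.eye(2) * 4.0)
print(sigma_geo)
```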
And step S40, generating a target track of the detection target through a joint probability data association tracking algorithm according to the geographic position and the position uncertainty.
In step S40, the joint probability data association tracking algorithm mainly includes the following steps: step 1, establishing the correspondence between detection targets and target tracks and generating a validation matrix; step 2, generating feasible association events according to the validation matrix; and step 3, calculating the association probability of each detection target and target track according to the feasible association events.
Preferably, the step S40 may include the steps of:
In step S401, measurement data at the current moment is obtained, where the measurement data includes a first number of target observation sets and a second number of track observation sets.
In step S401, $M_1$ target observation sets $Y_k$ and $M_2$ track observation sets measured at the current time $k$ are acquired. The target observation set $Y_k$ contains the observed position $Z_j$ of detection target $j$; the track observation set contains the state estimate $\hat{x}_{k-1}^{\tau}$ of the target track $\tau$ at the previous time $k-1$, its covariance $P_{k-1}^{\tau}$, the state transition matrix $F^{\tau}$, the process noise covariance matrix $Q^{\tau}$, and the noise covariance matrix $R_{k}^{\tau}$ at the current time $k$. It will be appreciated that each target observation set $Y_k$ corresponds to a detection target $j$ and each track observation set corresponds to a target track $\tau$.

The measurement noise at the current time obeys a normal distribution with mean 0 and covariance $C\Sigma_{\delta}C^{T}$, i.e. the noise covariance matrix can be expressed as $R_{k}^{\tau}=C\Sigma_{\delta}C^{T}$.
Step S402, calculating prior estimation data of each target track according to the track observation set at the current moment.
In step S402, for each target track $\tau$, the predicted state of the target track at the current time $k$ is first calculated from its state estimate $\hat{x}_{k-1}^{\tau}$ at the previous time $k-1$ and the state transition matrix $F^{\tau}$, i.e.

$$\hat{x}_{k|k-1}^{\tau}=F^{\tau}\hat{x}_{k-1}^{\tau}$$

At the same time, the covariance of the target track $\tau$ at the current time $k$ is calculated from the state transition matrix $F^{\tau}$, the covariance $P_{k-1}^{\tau}$ and the process noise covariance matrix $Q^{\tau}$ at the previous time $k-1$, i.e.

$$P_{k|k-1}^{\tau}=F^{\tau}P_{k-1}^{\tau}(F^{\tau})^{T}+Q^{\tau}$$

The predicted position value is then calculated from the camera parameters and the predicted state of the target track $\tau$ at the current time $k$, i.e.

$$\hat{z}_{k}^{\tau}=H\hat{x}_{k|k-1}^{\tau}$$

Finally, the prior estimate data at the current time $k$ is constructed from the predicted state $\hat{x}_{k|k-1}^{\tau}$, the predicted covariance $P_{k|k-1}^{\tau}$ and the predicted position value $\hat{z}_{k}^{\tau}$ of the target track $\tau$.
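A minimal sketch of this prediction step under the standard Kalman-filter equations assumed above; the constant-velocity state model in the usage example is an illustrative choice, not specified by the patent:

```python
import numpy as np

def kf_predict(x, P, F, Q, H):
    """Kalman prediction for one target track: prior state x_hat(k|k-1),
    prior covariance P(k|k-1), and predicted position z_hat = H x_hat(k|k-1)."""
    x_pred = F @ x                       # x_hat(k|k-1) = F x_hat(k-1)
    P_pred = F @ P @ F.T + Q             # P(k|k-1) = F P F^T + Q
    z_pred = H @ x_pred                  # predicted geographic position
    return x_pred, P_pred, z_pred

# illustrative constant-velocity model: state (x, y, vx, vy), observed (x, y)
dt = 0.1
F = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])
x_pred, P_pred, z_pred = kf_predict(np.zeros(4), np.eye(4), F, 0.01 * np.eye(4), H)
```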
Step S403, obtaining the innovation vectors from the prior estimate data of the target tracks.

In step S403, the predicted position value of each target track $\tau$ is first taken from the prior estimate data, and the corresponding innovation vector is calculated from the observed position $Z_j$ of each detection target $j$ and the predicted position value $\hat{z}_{k}^{\tau}$, i.e.

$$v_{\tau j}=Z_{j}-\hat{z}_{k}^{\tau}$$

so that $M_1\times M_2$ innovation vectors can be obtained. It can be appreciated that the innovation vector characterizes the correspondence between a detection target and a target track.
And step S404, obtaining the elliptical tracking threshold of each target track according to the innovation vectors.

In step S404, the innovation vector $v_{\tau j}$ between the predicted position value $\hat{z}_{k}^{\tau}$ of the target track $\tau$ and the observed position $Z_j$ of the detection target $j$ is examined, and the detection target $j$ is determined to meet the threshold requirement when it satisfies the ellipticity condition:

$$\frac{v_{\tau j,1}^{2}}{\sigma_{z1}^{2}}+\frac{v_{\tau j,2}^{2}}{\sigma_{z2}^{2}}+\dots+\frac{v_{\tau j,n}^{2}}{\sigma_{zn}^{2}}\le k^{2}$$

where $k$ is a constant and $\sigma_{z1},\sigma_{z2},\dots,\sigma_{zn}$ are the uncertainties of the observed position of detection target $j$ in the different dimensions.
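A minimal sketch of this gate test; the per-dimension uncertainties `sigma_z` and the gate constant `k_gate` are assumed configuration values:

```python
import numpy as np

def in_elliptical_gate(z_obs, z_pred, sigma_z, k_gate=3.0):
    """Return True when the normalized innovation lies inside the elliptical
    tracking threshold: sum_n (v_n / sigma_n)^2 <= k^2."""
    v = np.asarray(z_obs, dtype=float) - np.asarray(z_pred, dtype=float)
    return float(np.sum((v / np.asarray(sigma_z, dtype=float)) ** 2)) <= k_gate ** 2

# a detection 1.0 m east and 0.5 m north of the prediction, sigma = 0.5 m
print(in_elliptical_gate([101.0, 50.5], [100.0, 50.0], sigma_z=[0.5, 0.5]))  # True
```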
Step S405, establishing the validation matrix at the current time according to the target observation set.

Specifically, according to the target observation set $Y_k$, the validation matrix $\Omega$ at the current time $k$ is established, which can be expressed as:

$$\Omega=\left[\omega_{j\tau}\right],\qquad j=1,\dots,M_{1},\ \ \tau=0,1,\dots,M_{2}$$

where the validation matrix $\Omega$ is an $M_{1}\times(M_{2}+1)$ matrix; $\omega_{j\tau}$ indicates whether detection target $j$ falls within the elliptical tracking threshold of target track $\tau$: if so, $\omega_{j\tau}$ is 1, otherwise $\omega_{j\tau}$ is 0. The first column ($\tau=0$) indicates whether detection targets $1$ to $M_{1}$ belong to track 0, i.e. to clutter; the first column is all ones, indicating that every detection target $j$ may originate from clutter.
Step S406, generating the corresponding feasible association events according to the validation matrix at the current time.

Specifically, the validation matrix $\Omega$ at the current time $k$ is split to obtain $L$ feasible association matrices $\Omega_i$, and a feasible association event $\theta_i$ is generated from each feasible association matrix $\Omega_i$.
That is, each feasible association event $\theta_i$ corresponds to a feasible association matrix $\Omega_i$, which is defined as a matrix whose elements satisfy the following two constraints (see the sketch after this list):

Constraint 1: each detection target can only originate from one source, i.e. from one target track or from clutter. The elements of each row of the feasible association matrix therefore sum to 1, i.e.

$$\sum_{\tau=0}^{M_{2}}\omega_{j\tau}(\theta_{i})=1,\qquad j=1,\dots,M_{1}$$

Constraint 2: each target track can produce at most one real observation. The elements of each column of the feasible association matrix, except the first column, therefore sum to at most 1, i.e.

$$\sum_{j=1}^{M_{1}}\omega_{j\tau}(\theta_{i})\le 1,\qquad \tau=1,\dots,M_{2}$$
Step S407, obtaining the total probability of the feasible association event, and correcting the innovation vector according to the total probability of the feasible association event.
In step S407, the sub-probability of each feasible association event is first calculated, which can be expressed as:

$$P(\theta_{i}\mid Y_{k})=\frac{1}{\alpha}\prod_{(j,\tau)\in\theta_{i}}\frac{\exp\!\left(-\frac{1}{2}v_{\tau j}^{T}S_{\tau j}^{-1}v_{\tau j}\right)}{C\,(2\pi)^{m/2}\left|S_{\tau j}\right|^{1/2}}\prod_{\tau=1}^{M_{2}}\left(P_{D}^{\tau}\right)^{\delta_{\tau}(\theta_{i})}\left(1-P_{D}^{\tau}\right)^{1-\delta_{\tau}(\theta_{i})}$$

where $\alpha$ is a normalization factor; the clutter is assumed to obey a Poisson distribution with parameter $CV$ ($C$ represents the expected number of clutter points per unit volume and $V$ the volume of the association gate); $m$ is the dimension of the innovation vector; $S_{\tau j}$ is the innovation covariance of target track $\tau$ and detection target $j$; $P_{D}^{\tau}$ is the probability that target track $\tau$ is detected; $\delta_{\tau}(\theta_{i})$ indicates whether target track $\tau$ is assigned a detection in $\theta_{i}$; and the first product runs over the detection-track pairs associated in $\theta_{i}$.

The total probability $\beta_{\tau j}$ that detection target $j$ is associated with target track $\tau$ is then calculated from the sub-probabilities of the feasible association events, which can be expressed as:

$$\beta_{\tau j}=\sum_{i=1}^{L}\omega_{j\tau}(\theta_{i})\,P(\theta_{i}\mid Y_{k})$$

where $\omega_{j\tau}(\theta_{i})$ is the weight coefficient (association indicator) of the feasible association event $\theta_{i}$.

Finally, the innovation vector is corrected according to the total probabilities, and the corrected innovation vector $V_{\tau}$ can be expressed as:

$$V_{\tau}=\sum_{j=1}^{M_{1}}\beta_{\tau j}\,v_{\tau j}$$

where $V_{\tau}$ is the corrected innovation vector of target track $\tau$.
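A minimal sketch of turning the event sub-probabilities into the total probabilities β_τj and the corrected innovation V_τ, assuming the event list produced by the enumeration sketch above and already-normalized probabilities (the numeric values are illustrative):

```python
import numpy as np

def corrected_innovation(events, event_probs, innovations, track):
    """Compute beta_tau_j = sum_i 1[event theta_i assigns detection j to
    `track`] * P(theta_i | Y_k), then V_tau = sum_j beta_tau_j v_tau_j."""
    m1 = len(innovations)
    beta = np.zeros(m1)
    for assign, p in zip(events, event_probs):
        for j, src in enumerate(assign):
            if src == track:
                beta[j] += p
    return beta @ np.asarray(innovations, dtype=float)

events = [(0, 0), (0, 1), (1, 0)]        # detection j -> source (0 = clutter)
event_probs = [0.2, 0.5, 0.3]            # assumed normalized P(theta_i | Y_k)
innov = [[1.0, 0.5], [0.2, -0.1]]        # v_tau_j of track tau = 1, j = 0, 1
print(corrected_innovation(events, event_probs, innov, track=1))  # [0.4 0.1]
```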
Step S408, calculating a Kalman gain according to the corrected innovation vector, and acquiring posterior estimation data of the target track according to the Kalman gain.
In step S408, the calculation formula of the Kalman gain is:

$$K_{k}^{\tau}=P_{k|k-1}^{\tau}H^{T}\left(HP_{k|k-1}^{\tau}H^{T}+R_{k}^{\tau}\right)^{-1}$$

where $K_{k}^{\tau}$ is the Kalman gain; $H^{T}$ is the transpose of the camera parameter matrix; $P_{k|k-1}^{\tau}$ is the predicted covariance of the target track $\tau$ at the current time $k$; $R_{k}^{\tau}$ is the noise covariance matrix of the target track $\tau$ at the current time $k$.

The posterior estimate data of the target track comprises a posterior state estimate (the filtered value) and a posterior state covariance. The calculation formula of the posterior state estimate is:

$$\hat{x}_{k}^{\tau}=\hat{x}_{k|k-1}^{\tau}+K_{k}^{\tau}V_{\tau}$$

The calculation formula of the posterior state covariance is:

$$P_{k}^{\tau}=\left(I-K_{k}^{\tau}H\right)P_{k|k-1}^{\tau}$$
step S409, continuing to iterate to the next moment according to the posterior estimation data of the target track.
It can be understood that the target track can be obtained through steps S401 to S409 in this embodiment.
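A minimal sketch of the update in steps S408-S409 under the equations above. Note that the spread-of-innovations term of full JPDA is omitted, so this is a simplified stand-in rather than the patent's exact covariance update:

```python
import numpy as np

def kf_update_jpda(x_pred, P_pred, V, H, R):
    """Kalman update with the corrected innovation V_tau.

    Gain:       K = P H^T (H P H^T + R)^(-1)
    State:      x_hat(k) = x_hat(k|k-1) + K V_tau
    Covariance: P(k) = (I - K H) P(k|k-1)
    """
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_post = x_pred + K @ V
    P_post = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_post, P_post

# usage with a 4-state track observed in 2 position dimensions
x_post, P_post = kf_update_jpda(np.zeros(4), np.eye(4),
                                V=np.array([0.4, 0.1]),
                                H=np.hstack([np.eye(2), np.zeros((2, 2))]),
                                R=0.25 * np.eye(2))
```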
Further, in step S40, a tracking cluster of the detection targets may be generated at the same time as the target tracks are generated by the joint probability data association tracking algorithm; in this case, after step S404, the method further includes the following step:
Step S4010, clustering the target tracks to generate tracking clusters.

In step S4010, a tracking cluster consists of a set of target tracks together with the detection targets that fall within the elliptical tracking thresholds of those tracks. A tracking cluster is defined as a closed set that cannot be clustered further: every target track whose threshold covers a detection target in the cluster belongs to the cluster, and every detection target covered by the threshold of a target track in the cluster must be added to the cluster.
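One way to realize this closure property, offered here as an implementation assumption rather than the patent's prescribed procedure, is to take connected components of the bipartite gating graph defined by the validation matrix:

```python
import numpy as np

def tracking_clusters(omega):
    """Group tracks and detections into tracking clusters: connected
    components of the bipartite graph where detection j is linked to track
    tau whenever omega[j, tau] = 1. Column 0 (clutter) is ignored, and
    tracks with no gated detection form no cluster here."""
    m1, cols = omega.shape
    parent = list(range(m1 + cols - 1))            # union-find over j and tau

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]          # path halving
            a = parent[a]
        return a

    for j in range(m1):
        for tau in range(1, cols):
            if omega[j, tau]:
                parent[find(j)] = find(m1 + tau - 1)   # union(det j, track tau)

    clusters = {}
    for j in range(m1):
        clusters.setdefault(find(j), {"detections": [], "tracks": set()})
        clusters[find(j)]["detections"].append(j)
    for tau in range(1, cols):
        root = find(m1 + tau - 1)
        if root in clusters:
            clusters[root]["tracks"].add(tau)
    return list(clusters.values())

omega = np.array([[1, 1, 0, 0],   # det 0 gated by track 1
                  [1, 1, 1, 0],   # det 1 gated by tracks 1 and 2
                  [1, 0, 0, 1]])  # det 2 gated by track 3
print(tracking_clusters(omega))   # two clusters: {0,1 / 1,2} and {2 / 3}
```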
And S50, acquiring an association cost matrix between the currently captured detection target and the tracking target according to the target detection confidence, the geographic position and the track matching information.
In step S50, the detection targets are the targets captured at the current time, and the tracking targets are the targets tracked at the previous time. The target detection confidence is the confidence of a detection target and can be obtained from the target detection model. The track matching information is the track matching similarity between a detection target and a tracking target, whose value is 0 or 1: when the joint probability data association tracking algorithm matches the detection target to the track of the tracking target, the track matching similarity is 1; otherwise, it is 0.
In this embodiment, the target detection model identifies the video image, and may simultaneously acquire the detection target in the video image and the confidence coefficient of the detection target, that is, acquire the target detection confidence coefficient. Secondly, based on the geographic position (x, y, z) of the detection target, the distance cost of the detection target and the tracking target can be obtained through a joint probability data association tracking algorithm. And finally, calculating an association cost matrix between the detection target and the tracking target according to the confidence coefficient of the detection target, the distance cost of the detection target and the tracking target and the track matching similarity.
Preferably, the step S50 of acquiring the association cost matrix between the currently captured detection target and the tracking target specifically includes the following steps:
step S501, determining the number n of matrix rows according to the number of currently captured detection targets, determining the number m of matrix columns according to the number of tracking targets, and generating an n×m matrix frame;
Step S502, constructing an association cost function according to the reliability of the detection target, the distance cost between the detection target and the tracking target, and the track matching similarity; wherein the association cost function is:

$$L_{pq}=\alpha\cdot R_{p}+\beta\cdot D_{pq}+\theta\cdot T_{pq}$$

In the above formula, $L_{pq}$ is the association cost function; $\alpha$, $\beta$ and $\theta$ are weight coefficients; $R_{p}$ is the reliability of detection target $p$; $D_{pq}$ is the distance cost between detection target $p$ and tracking target $q$; $T_{pq}$ is the track matching similarity between detection target $p$ and tracking target $q$.
Step S503, obtaining an association cost value of each detection target and each tracking target through the association cost function, and filling the association cost value into the n×m matrix framework to obtain an association cost matrix.
It can be understood that the method and the device can acquire the associated cost value of each detection target and each tracking target through the associated cost function, and can acquire the associated cost matrix by combining multi-dimensional information, thereby being beneficial to improving multi-target tracking robustness.
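A minimal sketch of assembling the cost matrix; the weight values are assumed for illustration, and the optional cluster term σ·C_pq introduced below can be added analogously:

```python
import numpy as np

def association_cost_matrix(R, D, T, alpha=0.3, beta=0.5, theta=0.2):
    """Build L[p, q] = alpha * R[p] + beta * D[p, q] + theta * T[p, q].

    R: (n,) detection confidences; D: (n, m) distance costs;
    T: (n, m) track matching similarities (0 or 1). The weight values
    are illustrative, not taken from the patent."""
    R = np.asarray(R, dtype=float)[:, None]   # broadcast R_p over all tracks q
    return (alpha * R + beta * np.asarray(D, dtype=float)
            + theta * np.asarray(T, dtype=float))

L = association_cost_matrix(R=[0.9, 0.8],
                            D=[[1.2, 3.0], [0.4, 2.2]],
                            T=[[1, 0], [1, 0]])
print(L)   # 2 detections x 2 tracks
```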
Further, when the target track and the tracking cluster of the detected target are generated by the joint probability data association tracking algorithm, the step S50 is: and acquiring an association cost matrix between the currently captured detection target and the tracking target according to the target detection confidence, the geographic position, the track matching information and the cluster matching information.
The cluster matching information is the cluster matching similarity between the detection target and the tracking target, the value of the cluster matching similarity is 0 or 1, when the detection target and the tracking target are in the same cluster, the value of the cluster matching similarity is 1, and otherwise, the value of the cluster matching similarity is 0.
Accordingly, the step S502 is: constructing a correlation cost function according to the reliability of the detection target, the distance cost of the detection target and the tracking target, the track matching similarity and the cluster matching similarity; wherein, the association cost function is:
$$L_{pq}=\alpha\cdot R_{p}+\beta\cdot D_{pq}+\theta\cdot T_{pq}+\sigma\cdot C_{pq}$$

In the above formula, $\sigma$ is a weight coefficient and $C_{pq}$ is the cluster matching similarity between detection target $p$ and tracking target $q$.
And step S60, according to the association cost matrix, associating the effective detection frames with the existing tracks through the Hungarian algorithm so as to execute the visual tracking task.

Preferably, in step S60, associating the effective detection frames with the existing tracks through the Hungarian algorithm may include the following steps:
step S601, performing row and column reduction on the association cost matrix (subtracting the minimum element of each row and of each column) to obtain a zero-containing matrix;

Step S602, covering the zero elements in the zero-containing matrix with the least number of horizontal and vertical lines, and recording the number of lines used;

step S603, detecting whether the number of covering lines is less than the number of matrix rows;

Step S604, if yes, obtaining the minimum element not covered by any line from the zero-containing matrix, subtracting it from every uncovered element and adding it to every element covered twice, and repeating the covering step until the association relationship between the effective detection frames and the existing tracks is obtained; if not, determining that an optimal assignment exists among the zero elements, and stopping the Hungarian algorithm.

In this embodiment, after the n×m association cost matrix is obtained, row reduction is first performed on the association cost matrix, subtracting the lowest element of each row from every element of that row; column reduction is then performed, subtracting the lowest element of each column from every element of that column, thereby obtaining an association cost matrix containing a number of zero elements, i.e. a zero-containing matrix.

After the zero-containing matrix is obtained, all zero elements in the matrix are covered with the minimum number of lines. If n lines are required, it is determined that an optimal assignment exists among the zero elements and the algorithm stops; if fewer than n lines suffice, additional zero elements are generated by subtracting the minimum uncovered element k from the uncovered elements and adding it to all matrix elements covered twice, after which the matching relationship between the effective detection frames and the existing tracks can be obtained.
It can be appreciated that the best matching of the effective detection frame with the existing track can be realized through the hungarian algorithm.
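In practice this line-covering procedure is what an off-the-shelf assignment solver performs; the following minimal sketch of the final association step uses scipy's linear_sum_assignment, with a hypothetical gate `max_cost` (not from the patent) to discard implausible pairs:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(cost, max_cost=10.0):
    """Associate detections (rows) with tracks (columns) by solving the
    assignment problem on the association cost matrix, then discard pairs
    whose cost exceeds the gate max_cost."""
    rows, cols = linear_sum_assignment(cost)          # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]

cost = np.array([[1.0, 4.2, 9.9],
                 [3.5, 0.7, 8.8]])
print(associate(cost))                                # [(0, 0), (1, 1)]
```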
In summary, according to the multi-target visual tracking method based on camera parameters provided in this embodiment, the camera parameters are first estimated from a video image selected from the video stream data, so that the scale effect can be eliminated; the image position of the detection target is then mapped to a geographic position through the estimated camera parameters and the position uncertainty is analyzed, making full use of the target's position information for tracking; target tracks of the detection targets are then generated through the joint probability data association tracking algorithm, making full use of the detector's clutter information for tracking; finally, an association cost matrix between detection targets and tracking targets is obtained from the target detection confidence, the geographic position and the track matching information, and effective detection frames are associated with existing tracks through the Hungarian algorithm, which improves the robustness and precision of visual multi-target tracking and at the same time alleviates the problem of identity switching under occlusion and overlap in multi-target visual tracking.
Based on the same inventive concept, the embodiment of the invention also provides a multi-target visual tracking device based on camera parameters, which corresponds to the multi-target visual tracking method based on camera parameters in the embodiment one by one. As shown in fig. 3, the multi-target vision tracking device based on camera parameters includes the following modules, and each functional module is described in detail as follows:
A detection target detection module 110, configured to acquire video stream data, and acquire an image position of a detection target through a target detection model;
the camera parameter evaluation module 120 is configured to select any frame of video image from the video stream data, and perform camera parameter evaluation;
A position uncertainty obtaining module 130, configured to map an image position of the detection target to a geographic position according to the estimated camera parameter, and analyze a position uncertainty;
The track generation module 140 is configured to generate a target track of the detection target through a joint probability data association tracking algorithm according to the geographic location and the location uncertainty;
The association cost matrix acquisition module 150 is configured to acquire an association cost matrix between a currently captured detection target and a tracking target according to the target detection confidence level, the geographic position and the track matching information;
the visual tracking module 160 is configured to associate the effective detection frames with the existing tracks through the Hungarian algorithm according to the association cost matrix, so as to perform the visual tracking task.
In an alternative embodiment, the camera parameter evaluation module 120 includes the following sub-modules, and each functional sub-module is described in detail below:
A camera initialization sub-module for initializing the camera to set the camera parameters as default values;
a grid construction sub-module for constructing n×n image grids;
The grid mapping sub-module is used for mapping the image grid into the selected video image according to the initialized camera;
The camera parameter manual adjustment sub-module is used for adjusting the default value of the camera parameter to enable the image grid to be tiled on the ground in the video image, so as to obtain a preliminary camera parameter;
The target height acquisition sub-module is used for acquiring the image position of the detection target in the video image through the target detector and acquiring the detection target height by combining the preliminary camera parameters;
and the camera parameter automatic optimization sub-module is used for constructing a camera parameter optimization function according to the detection target height and obtaining optimized camera parameters through the camera parameter optimization function.
In an alternative embodiment, the location uncertainty obtaining module 130 includes sub-modules, and each functional sub-module is described in detail below:
The camera parameter matrix conversion sub-module is used for obtaining a camera parameter matrix corresponding to the camera parameters;
The covariance matrix acquisition submodule is used for acquiring a covariance coefficient according to the first row element of the camera parameter matrix and the image position of the detection target; acquiring a covariance matrix according to the camera parameter matrix and the covariance coefficient;
An uncertainty processing sub-module, configured to determine that a geographic position error of the detection target is compliant with a normal distribution based on covariance when the image position error of the detection target is compliant with the normal distribution based on random variables; the normal distribution based on the random variable is N (0, sigma δ), and delta is the random variable; the normal distribution based on covariance is N (0, c Σ δCT).
In an alternative embodiment, the association cost matrix obtaining module 150 includes sub-modules, and each functional sub-module is described in detail below:
The matrix frame module is used for determining the number n of matrix rows according to the number of currently captured detection targets, determining the number m of matrix columns according to the number of tracking targets, and generating an n×m matrix frame;
The correlation cost function generation sub-module is used for determining a correlation cost function according to the credibility of the detection target, the distance cost of the detection target and the tracking target and the track matching similarity;
and the association cost matrix processing sub-module is used for acquiring the association cost value of each detection target and each tracking target through the association cost function, and filling the association cost values into the n×m matrix frame to obtain the association cost matrix.
In an alternative embodiment, the visual tracking module 160 includes sub-modules, each of which is described in detail below:
The association cost matrix processing sub-module is used for performing row and column reduction on the association cost matrix to obtain a zero-containing matrix;

a line coverage processing sub-module, configured to cover the zero elements in the zero-containing matrix with the minimum number of horizontal and vertical lines, and record the number of lines used;

and a matching relationship acquisition sub-module, configured to detect whether the number of covering lines is less than the number of matrix rows; if yes, obtain the minimum element not covered by any line from the zero-containing matrix, subtract it from every uncovered element and add it to every element covered twice, so as to obtain the association relationship between the effective detection frames and the existing tracks; if not, determine that an optimal assignment exists among the zero elements, and stop the Hungarian algorithm.
For specific limitations on the camera parameter-based multi-target visual tracking apparatus, reference may be made to the above limitations on the camera parameter-based multi-target visual tracking method, and no further description is given here. The various modules in the above-described camera parameter-based multi-target visual tracking device may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
Based on the same inventive concept, an embodiment of the present invention also proposes a computer device comprising a memory, a processor and a multi-target vision tracking program stored in the memory and executable on the processor, the processor implementing the steps of the camera parameter based multi-target vision tracking method as in the above embodiment when executing the multi-target vision tracking program.
Based on the same inventive concept, the embodiments of the present invention also provide a computer readable storage medium having a multi-target vision tracking program stored thereon, which when executed by a processor, implements the steps of the multi-target vision tracking method based on camera parameters as in the above embodiments.
The method implemented when the multi-target visual tracking program is executed by the processor may refer to the embodiments of the multi-target visual tracking method based on camera parameters, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a television, a set-top box or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (6)

1. A multi-target visual tracking method based on camera parameters, comprising:
Acquiring video stream data, and acquiring an image position of a detection target through a target detection model;
selecting any frame of video image from the video stream data, and carrying out camera parameter evaluation:
initializing a camera to set camera parameters to default values;
constructing an n×n image grid;
mapping the image grid into the selected video image according to the initialized camera;
Adjusting default values of the camera parameters to enable the image grids to be tiled on the ground in the video image, and obtaining preliminary camera parameters;
Acquiring an image position of a detection target in the video image through a target detector, and acquiring a detection target height by combining the preliminary camera parameters;
Constructing a camera parameter optimization function according to the detection target height, and obtaining optimized camera parameters through the camera parameter optimization function;
The camera parameter optimization function is:

$$\mathbf{H}^{*}=\underset{\mathbf{H}}{\arg\min}\left(\sum_{i=1}^{N}\sigma(\mathbf{h}_{i})+\alpha\sum_{f=1}^{F}\left(E(\mathbf{h}^{f})-\bar{h}\right)^{2}\right)$$

wherein $\mathbf{H}^{*}$ is the camera parameter matrix corresponding to the optimized camera parameters; $\arg\min(\cdot)$ is the variable value at which the camera parameter optimization function attains its minimum; $N$ is the number of detection targets; $F$ is the number of frames of the video data stream; $\sigma(\cdot)$ is the standard deviation function; $\alpha$ is a hyperparameter of the optimization function; $E(\cdot)$ is the mean function; $\mathbf{h}_{i}$ is the height vector of the $i$-th detection target over the $F$ frames of video images; $\mathbf{h}^{f}$ is the height vector of the $N$ detection targets in the $f$-th frame of video image; $\bar{h}$ is the mean detected target height;
Mapping the image position of the detection target to a geographic position according to the estimated camera parameters, and analyzing the position uncertainty;
generating a target track of a detection target through a joint probability data association tracking algorithm according to the geographic position and the position uncertainty;
Acquiring an association cost matrix between a currently captured detection target and a tracking target according to the target detection confidence, the geographic position and the track matching information;
The obtaining the association cost matrix between the currently captured detection target and the tracking target comprises the following steps:
determining the number $n$ of matrix rows according to the number of currently captured detection targets, determining the number $m$ of matrix columns according to the number of tracking targets, and generating an $n\times m$ matrix frame;
determining an association cost function according to the credibility of the detection target, the distance cost of the detection target and the tracking target and the matching similarity of the track;
the association cost function is:

$$L_{pq}=\alpha\cdot R_{p}+\beta\cdot D_{pq}+\theta\cdot T_{pq}$$

wherein $L_{pq}$ is the association cost function; $\alpha$, $\beta$ and $\theta$ are weight coefficients; $R_{p}$ is the reliability of detection target $p$; $D_{pq}$ is the distance cost between detection target $p$ and tracking target $q$; $T_{pq}$ is the track matching similarity between detection target $p$ and tracking target $q$;

acquiring the association cost value of each detection target and each tracking target through the association cost function, and filling the association cost values into the $n\times m$ matrix frame to obtain the association cost matrix;
And according to the association cost matrix, associating the effective detection frames with the existing tracks through the Hungarian algorithm so as to execute the visual tracking task.
2. The camera parameter-based multi-target visual tracking method of claim 1, wherein the analyzing the position uncertainty comprises:
acquiring a camera parameter matrix corresponding to the camera parameters;
acquiring a covariance coefficient according to the first row element of the camera parameter matrix and the image position of the detection target;
acquiring a covariance matrix $C$ according to the camera parameter matrix and the covariance coefficient;

determining that the geographic position error of the detection target obeys the normal distribution based on covariance when the image position error of the detection target obeys the normal distribution based on the random variable; the normal distribution based on the random variable is $N(0,\Sigma_{\delta})$, where $\delta$ is the random variable; the normal distribution based on covariance is $N(0,\,C\Sigma_{\delta}C^{T})$, where $C^{T}$ is the transpose of the covariance matrix $C$.
3. The multi-target visual tracking method based on camera parameters of claim 1, wherein said associating the effective detection frame with the existing track through the Hungarian algorithm comprises:
performing row and column reduction on the association cost matrix (subtracting the minimum element of each row and of each column) to obtain a zero-containing matrix;

covering the zero elements in the zero-containing matrix with the least number of horizontal and vertical lines, and recording the number of lines used;

detecting whether the number of covering lines is less than the number of matrix rows;

if yes, obtaining the minimum element not covered by any line from the zero-containing matrix, subtracting it from every uncovered element and adding it to every element covered twice, so as to obtain the association relationship between the effective detection frames and the existing tracks; if not, determining that an optimal assignment exists among the zero elements, and stopping the Hungarian algorithm.
4. A multi-target visual tracking device based on camera parameters, comprising:
the detection target detection module is used for acquiring video stream data and acquiring the image position of a detection target through a target detection model;
the camera parameter evaluation module is used for selecting any frame of video image from the video stream data and performing camera parameter evaluation;
The camera parameter evaluation module includes:
A camera initialization sub-module for initializing the camera to set the camera parameters as default values;
a grid construction sub-module for constructing an n×n image grid;
The grid mapping sub-module is used for mapping the image grid into the selected video image according to the initialized camera;
The camera parameter manual adjustment sub-module is used for adjusting the default value of the camera parameter to enable the image grid to be tiled on the ground in the video image, so as to obtain a preliminary camera parameter;
The target height acquisition sub-module is used for acquiring the image position of the detection target in the video image through the target detector and acquiring the detection target height by combining the preliminary camera parameters;
The camera parameter automatic optimization sub-module is used for constructing a camera parameter optimization function according to the detection target height and obtaining optimized camera parameters through the camera parameter optimization function;
The camera parameter optimization function is:

$$\mathbf{H}^{*}=\underset{\mathbf{H}}{\arg\min}\left(\sum_{i=1}^{N}\sigma(\mathbf{h}_{i})+\alpha\sum_{f=1}^{F}\left(E(\mathbf{h}^{f})-\bar{h}\right)^{2}\right)$$

wherein $\mathbf{H}^{*}$ is the camera parameter matrix corresponding to the optimized camera parameters; $\arg\min(\cdot)$ is the variable value at which the camera parameter optimization function attains its minimum; $N$ is the number of detection targets; $F$ is the number of frames of the video data stream; $\sigma(\cdot)$ is the standard deviation function; $\alpha$ is a hyperparameter of the optimization function; $E(\cdot)$ is the mean function; $\mathbf{h}_{i}$ is the height vector of the $i$-th detection target over the $F$ frames of video images; $\mathbf{h}^{f}$ is the height vector of the $N$ detection targets in the $f$-th frame of video image; $\bar{h}$ is the mean detected target height;
a position uncertainty obtaining module, configured to map an image position of the detection target to a geographic position according to the estimated camera parameter, and analyze a position uncertainty;
The track generation module is used for generating a target track of the detection target through a joint probability data association tracking algorithm according to the geographic position and the position uncertainty;
The association cost matrix acquisition module is used for acquiring an association cost matrix between a currently captured detection target and a tracking target according to the target detection confidence level, the geographic position and the track matching information;
The obtaining the association cost matrix between the currently captured detection target and the tracking target comprises the following steps:
determining the number $n$ of matrix rows according to the number of currently captured detection targets, determining the number $m$ of matrix columns according to the number of tracking targets, and generating an $n\times m$ matrix frame;
determining an association cost function according to the credibility of the detection target, the distance cost of the detection target and the tracking target and the matching similarity of the track;
the association cost function is:

$$L_{pq}=\alpha\cdot R_{p}+\beta\cdot D_{pq}+\theta\cdot T_{pq}$$

wherein $L_{pq}$ is the association cost function; $\alpha$, $\beta$ and $\theta$ are weight coefficients; $R_{p}$ is the reliability of detection target $p$; $D_{pq}$ is the distance cost between detection target $p$ and tracking target $q$; $T_{pq}$ is the track matching similarity between detection target $p$ and tracking target $q$;

acquiring the association cost value of each detection target and each tracking target through the association cost function, and filling the association cost values into the $n\times m$ matrix frame to obtain the association cost matrix;
And the visual tracking module is used for associating the effective detection frames with the existing tracks through the Hungarian algorithm according to the association cost matrix, so as to execute the visual tracking task.
5. A computer device comprising a memory, a processor and a multi-target vision tracking program stored in the memory and executable on the processor, wherein the processor implements the camera parameter-based multi-target vision tracking method of any one of claims 1 to 3 when the processor executes the multi-target vision tracking program.
6. A computer readable storage medium storing a multi-target vision tracking program, wherein the multi-target vision tracking program when executed by a processor implements the camera parameter-based multi-target vision tracking method of any one of claims 1 to 3.
CN202310418468.1A 2023-04-19 2023-04-19 Multi-target visual tracking method, device, equipment and medium based on camera parameters Active CN116777950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310418468.1A CN116777950B (en) 2023-04-19 2023-04-19 Multi-target visual tracking method, device, equipment and medium based on camera parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310418468.1A CN116777950B (en) 2023-04-19 2023-04-19 Multi-target visual tracking method, device, equipment and medium based on camera parameters

Publications (2)

Publication Number Publication Date
CN116777950A (en) 2023-09-19
CN116777950B (en) 2024-05-03

Family

ID=88012276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310418468.1A Active CN116777950B (en) 2023-04-19 2023-04-19 Multi-target visual tracking method, device, equipment and medium based on camera parameters

Country Status (1)

Country Link
CN (1) CN116777950B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117495900B (en) * 2023-11-06 2024-06-07 长沙理工大学 Multi-target visual tracking method based on camera motion trend estimation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021017291A1 (en) * 2019-07-31 2021-02-04 平安科技(深圳)有限公司 Darkflow-deepsort-based multi-target tracking detection method, device, and storage medium
CN113191427A (en) * 2021-04-29 2021-07-30 无锡物联网创新中心有限公司 Multi-target vehicle tracking method and related device
CN113269098A (en) * 2021-05-27 2021-08-17 中国人民解放军军事科学院国防科技创新研究院 Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle
CN113296089A (en) * 2021-07-27 2021-08-24 中国人民解放军空军预警学院 LMB density fusion method and device for multi-early-warning-machine target tracking system
CN114638855A (en) * 2022-01-21 2022-06-17 山东汇创信息技术有限公司 Multi-target tracking method, equipment and medium
CN114972805A (en) * 2022-05-07 2022-08-30 杭州像素元科技有限公司 Anchor-free joint detection and embedding-based multi-target tracking method
CN115620518A (en) * 2022-10-11 2023-01-17 东南大学 Intersection traffic conflict discrimination method based on deep learning

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"General linear cameras";Yu J. et.al.;《Computer Vision-ECCV 2004: 8th European Conference on Computer Vision》;20040531;第3022卷;第14-27页 *
(英)费洛斯(Fellows,R.)等.《建筑管理实务》.南开大学出版社,2006,第303-304页. *
"基于卡尔曼滤波-LSTM模型的车速估计方法";易可夫等;《公路工程》;20221231;第47卷(第06期);第172-179页 *
"基于联合学习的多目标跟踪与行人重识别算法研究及实现";甄佳玲;《万方在线出版》;20221116;全文 *
"基于贝叶斯滤波理论的多目标协同跟踪方法研究";陈玲;《中国优秀硕士学位论文全文数据库工程科技Ⅱ辑》;20230315;第2023年卷(第03期);第C034-673页 *
基于实例分割的多目标跟踪;陕硕;周越;;中国体视学与图像分析;20200625(02);第110-115页 *

Also Published As

Publication number Publication date
CN116777950A (en) 2023-09-19

Similar Documents

Publication Publication Date Title
CN111462200B (en) Cross-video pedestrian positioning and tracking method, system and equipment
US8189051B2 (en) Moving object detection apparatus and method by using optical flow analysis
US20160019683A1 (en) Object detection method and device
CN112102409B (en) Target detection method, device, equipment and storage medium
CN111445531B (en) Multi-view camera navigation method, device, equipment and storage medium
CN109975798B (en) Target detection method based on millimeter wave radar and camera
EP2901236B1 (en) Video-assisted target location
CN116777950B (en) Multi-target visual tracking method, device, equipment and medium based on camera parameters
JP7272024B2 (en) Object tracking device, monitoring system and object tracking method
CN113221682B (en) Bridge vehicle load space-time distribution fine-grained identification method based on computer vision
CN112241976A (en) Method and device for training model
CN113447923A (en) Target detection method, device, system, electronic equipment and storage medium
EP3593322B1 (en) Method of detecting moving objects from a temporal sequence of images
US20150104067A1 (en) Method and apparatus for tracking object, and method for selecting tracking feature
CN105374049B (en) Multi-corner point tracking method and device based on sparse optical flow method
CN115063454B (en) Multi-target tracking matching method, device, terminal and storage medium
CN111354022B (en) Target Tracking Method and System Based on Kernel Correlation Filtering
CN110827321B (en) Multi-camera collaborative active target tracking method based on three-dimensional information
CN111553914A (en) Vision-based goods detection method and device, terminal and readable storage medium
CN111402293A (en) Vehicle tracking method and device for intelligent traffic
CN110617802A (en) Satellite-borne moving target detection and speed estimation method
CN115546705A (en) Target identification method, terminal device and storage medium
CN115439771A (en) Improved DSST infrared laser spot tracking method
CN114004876A (en) Dimension calibration method, dimension calibration device and computer readable storage medium
CN111586299B (en) Image processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant