CN111862156A - Multi-target tracking method and system based on graph matching - Google Patents

Multi-target tracking method and system based on graph matching

Info

Publication number
CN111862156A
Authority
CN
China
Prior art keywords
vertex
graph
module
target
vertices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010689629.7A
Other languages
Chinese (zh)
Other versions
CN111862156B (en)
Inventor
项俊
王超
侯建华
麻建
徐国寒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South Central Minzu University
Original Assignee
South Central University for Nationalities
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South Central University for Nationalities filed Critical South Central University for Nationalities
Priority to CN202010689629.7A priority Critical patent/CN111862156B/en
Publication of CN111862156A publication Critical patent/CN111862156A/en
Application granted granted Critical
Publication of CN111862156B publication Critical patent/CN111862156B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an online multi-target tracking algorithm based on graph matching, which converts the data-association problem between detection responses in two consecutive frames into a graph-matching problem. First, two deep convolutional neural networks are designed to compute the affinity between pairs of graph vertices and between pairs of graph edges, respectively; the affinity matrix of the two graphs is then filled directly with these vertex and edge affinities; finally, the affinity matrix is processed to obtain the final matching matrix (i.e., the association matrix between detections). The method therefore effectively reflects the associations of real data during multi-target tracking, and its tracking results are highly accurate.

Description

Multi-target tracking method and system based on graph matching
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a multi-target tracking method and system based on graph matching.
Background
Multi-object tracking (MOT) plays an important role in computer vision. Its main task is to analyze video in order to identify and track objects belonging to one or more categories, without any prior knowledge of the objects' appearance or number, and it is important in fields such as motion analysis, human-computer interaction, video surveillance (e.g., abnormal-behavior recognition), and autonomous driving.
Existing multi-target tracking methods fall into roughly three categories. The first adopts a tracking-by-detection strategy: a detector first locates the targets of interest in each frame, and an identity is then assigned to each detection through data association; the core problem of this strategy is data association. In multi-target tracking, data association can be regarded as a bipartite-graph assignment problem, namely determining the correspondence between existing trajectories and newly generated detections, which is then solved with the Hungarian algorithm. The second category adopts a multi-target tracking strategy based on a graph model, which uses a conditional random field to model the spatio-temporal relationships between detections or between tracklets. The third category adopts a machine-learning-based multi-target tracking strategy, which uses a Kalman filter to model the motion of each detection and estimate its position in the next frame.
However, the above existing multi-target tracking methods have some non-negligible shortcomings in data association:
First, for tracking-by-detection methods, the traditional Hungarian algorithm uses only the association costs between vertices during solving and ignores the topological information of the vertices, which is very important in graph matching; the accuracy of the tracking result is therefore low.
Second, in graph models such as conditional random fields, the structure of the graph is fixed, so its nodes and edges are fixed; nodes and edges with inaccurate features cannot be corrected, resulting in low tracking accuracy.
Third, target-tracking strategies based on traditional machine learning have high model complexity and require manually designed parameters.
Disclosure of Invention
The invention provides a multi-target tracking method and system based on graph matching. By combining deep learning with a traditional graph-matching framework, it solves the data-association problem between detections in consecutive frames so as to accomplish online multi-target tracking. It thereby addresses the technical problems that traditional multi-target tracking methods do not fully exploit the topological relations between vertices, that graph-model-based multi-target tracking methods have low target-tracking accuracy due to their fixed graph structure, and that multi-target tracking methods adopting machine-learning models have high complexity and require manual parameter design.
To achieve the above object, according to one aspect of the present invention, there is provided a multi-target tracking method based on graph matching, including the steps of:
(1) Acquiring a multi-target tracking data set which comprises an input video sequence and a detection response of each frame in the input video sequence;
(2) setting counter cnt1 to 1;
(3) judging whether cnt1 is equal to the total frame number of the input video sequence, if yes, entering step (14), otherwise, entering step (4);
(4) acquiring a cnt1 th frame and a cnt1+1 th frame from the input video sequence obtained in the step (1), constructing a graph G1 according to a previous frame, constructing a graph G2 according to a next frame, wherein the graphs G1 and G2 respectively comprise two vertex sets V1 and V2;
(5) respectively acquiring all vertexes in vertex sets V1 and V2 in graphs G1 and G2 constructed in the step (4), and respectively inputting the vertexes into a trained triplet network to obtain a feature vector corresponding to each vertex;
(6) inputting the feature vectors of the vertexes obtained in the step (5) into the trained first shallow neural network and second shallow neural network respectively to obtain the matching degree between each vertex in the vertex set V1 of the graph G1 and each vertex in the vertex set V2 of the graph G2 and the matching degree between each connecting edge in the edge set E1 of the graph G1 and each connecting edge in the edge set E2 of the graph G2 respectively;
(7) constructing an affinity matrix M according to the matching degree between each vertex in the vertex set V1 of the graph G1 and each vertex in the vertex set V2 of the graph G2 obtained in the step (6) and the matching degree between each connecting edge in the edge set E1 of the graph G1 and each connecting edge in the edge set E2 of the graph G2;
(8) performing power iteration on the affinity matrix M obtained in step (7) to obtain an optimal assignment vector v*;
(9) performing bidirectional normalization on the optimal assignment vector v* obtained in step (8) to obtain an assignment matrix S;
(10) setting counter cnt2 to 1;
(11) judging whether the counter cnt2 is equal to the total number of the vertexes in the graph G1, if so, entering the step (14), otherwise, entering the step (12);
(12) obtaining, from the assignment matrix S obtained in step (9), the vertex in graph G2 that preliminarily matches the cnt2-th vertex in graph G1; judging whether the intersection-over-union between the two detection responses corresponding to the two preliminarily matched vertices in graphs G1 and G2 is greater than or equal to a preset threshold; if so, establishing an association between the target corresponding to the vertex in the previous frame (graph G1) and the target corresponding to the vertex in the next frame (graph G2) and then entering step (13); otherwise, entering step (13) directly;
(13) setting the counter cnt1 = cnt1 + 1, and returning to step (3);
(14) for each target in the first frame of the input video sequence, combining all targets associated with the target in all remaining frames of the entire input video sequence with the target to form the target's tracking trajectory.
Preferably, the vertices in vertex set V1 are all the detection responses in the previous frame, and the vertices in vertex set V2 are all the detection responses in the next frame;
if the distance between the targets corresponding to two vertices of vertex set V1 in the previous frame is less than or equal to a threshold, a connecting edge exists between the two vertices, and all connecting edges corresponding to vertex set V1 form the edge set E1;
if the distance between the targets corresponding to two vertices of vertex set V2 in the next frame is less than or equal to the threshold, a connecting edge exists between the two vertices, and all connecting edges corresponding to vertex set V2 form the edge set E2.
Preferably, the triplet network is composed of three paths, each path comprising a sequentially connected ResNet-50 convolutional neural network model and two fully-connected layers, wherein:
the first layer is a ResNet-50 convolutional neural network model, which comprises 1 convolutional layer, 16 building-block structures, and 1 fully-connected layer;
the second layer is a fully-connected layer with ReLU as its activation function, whose output is a 1024-dimensional feature vector;
the third layer is a fully-connected layer with ReLU as its activation function, whose output is a 128-dimensional feature vector.
Preferably, the first and second shallow neural networks have the same structure, namely:
the first layer is a fully-connected layer with ReLU as its activation function and a Dropout retention probability of 0.5, whose output is a 1024-dimensional feature vector;
the second layer is a fully-connected layer with Softmax as its activation function, whose output is a 2-dimensional feature vector.
Preferably, for vertex v_i ∈ V1 and vertex v_a ∈ V2, the matching degree S_ia is:
S_ia = F_N([F_i, F_a])
where F_i, F_a ∈ R^{1×d} are the feature vectors of vertices v_i and v_a, and d is the size of the feature vector; [·] denotes a stitching operation that concatenates multiple vectors; F_N denotes the first shallow neural network with a softmax layer; i, j ∈ [1, n] and a ∈ [1, m], where n and m denote the total numbers of vertices in vertex sets V1 and V2, respectively;
the connecting edge starting at the i-th vertex and ending at the j-th vertex has feature vector F_ij = [F_i, F_j], where F_i and F_j are the feature vectors of the two vertices i and j connected by the edge;
the matching degree S_ij:ab between connecting edge (v_i, v_j) ∈ E1 and connecting edge (v_a, v_b) ∈ E2 is:
S_ij:ab = F_E([F_ij, F_ab])
where b ∈ [1, m], the feature vectors F_ij, F_ab ∈ R^{1×2d}, and F_E denotes the second shallow neural network with a softmax layer.
Preferably, the diagonal elements of the affinity matrix M correspond to the matching degrees between vertices, i.e., for i = j and a = b, M_ia:jb equals the matching degree S_ia between vertex v_i and vertex v_a;
the off-diagonal elements of the affinity matrix M represent the matching degrees between connecting edges, i.e., for i ≠ j and a ≠ b, M_ia:jb equals the matching degree S_ij:ab between connecting edge (v_i, v_j) and connecting edge (v_a, v_b); for i = j and a ≠ b, or i ≠ j and a = b, M_ia:jb = 0.
Preferably, step (8) calculates the optimal assignment vector v* of the affinity matrix M using the following power iteration:
v^(k+1) = M v^(k) / ||M v^(k)||
where v^(0) is initialized as the unit vector, i.e., v^(0) = 1; ||·|| denotes the l2 norm; k denotes the number of iterations, with a value range of 2 to 4; and the final result of the iterative computation is taken as the optimal assignment vector v*.
Preferably, step (9) iterates using the following formulas, and the final matrix of the iteration is taken as the assignment matrix S:
S_ia^(t+1) = S_ia^(t) / Σ_{a′=1..m} S_ia′^(t)  (row normalization),  followed by  S_ia^(t+1) = S_ia^(t) / Σ_{i′=1..n} S_i′a^(t)  (column normalization)
where S_nm, the n×m matrix obtained by reshaping v*, is the initial value of the iteration; the superscript t+1 denotes the (t+1)-th iteration; t denotes the number of iterations, with a value range of 3 to 7;
in the assignment matrix S, the element in row i, column a denotes the matching value between vertex v_i of graph G1 and vertex v_a of graph G2.
Preferably, in step (12), all elements in row i are taken from the assignment matrix S and the element with the largest value is selected; its column index a gives the vertex v_a in graph G2 that preliminarily matches vertex v_i in graph G1.
According to another aspect of the present invention, there is provided a multi-target tracking system based on graph matching, including:
A first module for obtaining a multi-target tracking data set comprising an input video sequence and a detection response for each frame in the input video sequence;
a second module for setting the counter cnt1 to 1;
a third module, configured to determine whether cnt1 equals to a total frame number of the input video sequence, if so, enter a fourteenth module, otherwise, enter a fourth module;
a fourth module, configured to obtain a cnt1 th frame and a cnt1+1 th frame from the input video sequence obtained by the first module, construct a graph G1 according to a previous frame, construct a graph G2 according to a next frame, where the graphs G1 and G2 respectively include two vertex sets V1 and V2;
a fifth module, configured to obtain all vertices in vertex sets V1 and V2 in graphs G1 and G2 constructed by the fourth module, respectively, and input the vertices into a trained triplet network, so as to obtain feature vectors corresponding to the vertices;
a sixth module, configured to input the feature vectors of the vertices obtained by the fifth module into the trained first and second shallow neural networks, respectively, to obtain the matching degree between each vertex in vertex set V1 of graph G1 and each vertex in vertex set V2 of graph G2, and the matching degree between each connecting edge in edge set E1 of graph G1 and each connecting edge in edge set E2 of graph G2;
a seventh module, configured to construct an affinity matrix M according to the matching degrees between each vertex in vertex set V1 of graph G1 and each vertex in vertex set V2 of graph G2, and between each connecting edge in edge set E1 of graph G1 and each connecting edge in edge set E2 of graph G2, obtained by the sixth module;
an eighth module, configured to perform power iteration on the affinity matrix M obtained by the seventh module to obtain an optimal assignment vector v*;
a ninth module, configured to perform bidirectional normalization on the optimal assignment vector v* obtained by the eighth module to obtain an assignment matrix S;
a tenth module for setting the counter cnt2 to 1;
an eleventh module for determining whether the counter cnt2 is equal to the total number of vertices in graph G1, if so, entering a fourteenth module, otherwise, entering a twelfth module;
a twelfth module, configured to obtain, from the assignment matrix S obtained by the ninth module, the vertex in graph G2 that preliminarily matches the cnt2-th vertex in graph G1; to judge whether the intersection-over-union between the two detection responses corresponding to the two preliminarily matched vertices in graphs G1 and G2 is greater than or equal to a preset threshold; if so, to establish an association between the target corresponding to the vertex in the previous frame (graph G1) and the target corresponding to the vertex in the next frame (graph G2) and then enter the thirteenth module; otherwise, to enter the thirteenth module directly;
a thirteenth module, configured to set the counter cnt1 = cnt1 + 1 and return to the third module;
and a fourteenth module, configured, for each target in the first frame of the input video sequence, to combine all targets associated with the target in all remaining frames of the entire input video sequence with the target to form the target's tracking trajectory.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) The invention adopts steps (2) to (8) to convert the data-association problem between detection responses in two consecutive frames into a graph-matching problem, establishing correspondences between the vertices of two graphs represented by vertex local structures and pairwise relations, and thereby effectively exploiting the topological information between vertices in the graph structure. It can therefore solve the technical problem that traditional detection-based multi-target tracking methods have low tracking accuracy because they do not use the topological information of the vertices.
(2) Because the invention adopts steps (5) to (8) and constructs the affinity matrix from the matching-degree relations of vertices and edges, it can solve the technical problem of existing graph models such as conditional random fields, whose fixed graph structure fixes the nodes and edges, so that nodes and edges with inaccurate features cannot be corrected and the tracking accuracy is low.
(3) Because steps (5) and (6) use neural networks to compute the matching degrees of vertices and edges, effectively improving the accuracy of the matching degrees, the method can solve the problem that existing multi-target tracking methods adopting machine-learning models cannot handle tracking variations against complex backgrounds owing to the poor robustness of their models.
(4) The invention has a wide range of applications: it can be used not only for pedestrian tracking but also for tracking the trajectory of any moving target of a known type.
(5) The method is the first to solve the data association between detections of two consecutive frames during multi-target tracking with a deep-learning-based graph-matching algorithm; realizing data association with deep learning helps improve the accuracy and efficiency of target tracking in complex scenes.
Drawings
FIG. 1 is a flowchart of the multi-target tracking method based on graph matching according to the present invention;
FIG. 2 shows the frames extracted in step (4) of the method of the present invention, where FIG. 2(a) is the previous frame and FIG. 2(b) is the next frame;
FIG. 3 shows the targets in the frames extracted in step (4) of the method of the present invention, where FIG. 3(a) shows the targets in the previous frame and FIG. 3(b) shows the targets in the next frame;
FIG. 4 shows the graphs constructed in step (4) of the method of the present invention, where FIG. 4(a) is the constructed graph G1 and FIG. 4(b) is the constructed graph G2;
FIG. 5 is a schematic diagram of the structure of the triplet network used in step (5) of the method of the present invention;
FIG. 6 is a schematic diagram of the structure of the shallow neural network used in step (6) of the method of the present invention;
FIG. 7 is a schematic representation of the affinity matrix M constructed in step (7) of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in FIG. 1, the present invention provides a multi-target tracking method based on graph matching, which comprises the following steps:
(1) acquiring a multi-target tracking data set, which comprises an input video sequence and the detection responses of each frame in the input video sequence;
in this step, the multi-target tracking data set is the MOT16 data set (annotated mainly with moving pedestrians and vehicles), which contains 14 video sequences in total: 7 are training sequences with annotation information and the other 7 are test sequences. The 7 test sequences come from 7 different scenes and differ in shooting angle and camera motion; the test set is 5919 frames long and contains 182,326 detection responses and 830 trajectories.
(2) Setting counter cnt1 to 1;
(3) judging whether cnt1 is equal to the total frame number of the input video sequence, if yes, entering step (14), otherwise, entering step (4);
(4) acquiring the cnt1-th frame (shown in FIG. 2(a)) and the (cnt1+1)-th frame (shown in FIG. 2(b)) from the input video sequence obtained in step (1), and constructing a graph G1 from the previous frame (shown in FIG. 4(a)) and a graph G2 from the next frame (shown in FIG. 4(b));
specifically, the graphs G1 and G2 constructed in this step include vertex sets V1 and V2, respectively; the vertices in vertex set V1 are all the detection responses in the previous frame, and the vertices in vertex set V2 are all the detection responses in the next frame.
If the distance in the previous frame between the targets (shown in FIG. 3(a)) corresponding to two vertices (i.e., detection responses) of vertex set V1 is less than or equal to a threshold, a connecting edge exists between the two vertices, and all connecting edges corresponding to vertex set V1 form the edge set E1; similarly, if the distance in the next frame between the targets (shown in FIG. 3(b)) corresponding to two vertices of vertex set V2 is less than or equal to the threshold, a connecting edge exists between the two vertices, and all connecting edges corresponding to vertex set V2 form the edge set E2. In this embodiment, the threshold ranges from 80 to 120 pixels, preferably 100.
The advantage of this step is that it converts the data-association problem between detection responses in two consecutive frames into a graph-matching problem: graph matching is performed on the vertices of the two graphs, represented by vertex local structures and pairwise relations, which effectively exploits the topological information between vertices in the graph structure. The method can therefore effectively reflect the associations of real data during multi-target tracking, and its tracking results are highly accurate.
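For illustration, a minimal Python sketch of this graph construction follows, assuming each detection response is a bounding box (x, y, w, h) and taking the box centers as the targets' positions; the helper names build_graph and center are illustrative, not from the patent.

```python
import numpy as np

def center(box):
    # box = (x, y, w, h) for one detection response
    x, y, w, h = box
    return np.array([x + w / 2.0, y + h / 2.0])

def build_graph(detections, dist_threshold=100.0):
    """Vertices are the frame's detection responses; a connecting edge joins
    two vertices whose targets lie within dist_threshold pixels of each other
    (80 to 120 in the embodiment, preferably 100)."""
    vertices = list(range(len(detections)))
    edges = [(i, j) for i in vertices for j in vertices
             if i != j and np.linalg.norm(center(detections[i]) - center(detections[j])) <= dist_threshold]
    return vertices, edges

# V1, E1 = build_graph(dets_prev)   # graph G1 from the previous frame
# V2, E2 = build_graph(dets_next)   # graph G2 from the next frame
```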
(5) respectively acquiring all vertices in vertex sets V1 and V2 of graphs G1 and G2 constructed in step (4), and inputting them into a trained triplet network (shown in FIG. 5) to obtain the feature vector corresponding to each vertex;
as shown in FIG. 5, the architecture of the triplet network of the present invention is as follows:
the triplet network is composed of three paths, each path comprising a sequentially connected ResNet-50 convolutional neural network model and two fully-connected layers.
The first layer is a ResNet-50 convolutional neural network model, which comprises 1 convolutional layer, 16 building-block structures, and 1 fully-connected layer.
The second layer is a fully-connected layer, the ReLU is adopted by the activation function, and the output is a feature vector with 1024 dimensions.
The third layer is a fully connected layer, the activation function adopts ReLU, and the output is a 128-dimensional feature vector.
The loss function used when training the triplet network of the present invention is the triplet loss. On this basis, sample pairs constructed from the MOT16 and 2DMOT15 training sets are used to fine-tune the triplet network, yielding the final model. Each detection response in the two graphs is selected as the input of the network to obtain its feature vector, whose dimension is 128.
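A minimal PyTorch sketch of one path of this feature extractor follows; the class name is illustrative, and keeping ResNet-50's own 1000-way fully-connected layer as the input to the next layer (as well as weight sharing across the three paths, which is typical for triplet training) are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
from torchvision import models

class VertexEmbedding(nn.Module):
    """One path of the triplet network: ResNet-50, then two FC layers."""
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet50(weights=None)  # 1 conv + 16 building blocks + 1 fc
        self.fc1 = nn.Linear(1000, 1024)               # second layer: ReLU, 1024-d output
        self.fc2 = nn.Linear(1024, 128)                # third layer: ReLU, 128-d output
        self.relu = nn.ReLU()

    def forward(self, x):                              # x: (B, 3, H, W) detection crops
        f = self.backbone(x)
        f = self.relu(self.fc1(f))
        return self.relu(self.fc2(f))                  # 128-d feature vector per vertex

# Training with a triplet loss over (anchor, positive, negative) crops, e.g.:
# emb = VertexEmbedding()
# loss = nn.TripletMarginLoss()(emb(a), emb(p), emb(n))
```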
(6) Inputting the feature vectors of the vertexes obtained in the step (5) into the trained first shallow neural network and second shallow neural network respectively to obtain the matching degree between each vertex in the vertex set V1 of the graph G1 and each vertex in the vertex set V2 of the graph G2 and the matching degree between each connecting edge in the edge set E1 of the graph G1 and each connecting edge in the edge set E2 of the graph G2 respectively;
FIG. 6 shows the structure of the first and second shallow neural networks used in this step; the two networks are identical, namely:
the first layer is a fully-connected layer, the activation function adopts ReLU, the retention probability of Dropout is set to be 0.5, and the output is a feature vector with 1024 dimensions.
The second layer is a full connection layer, the activation function adopts Softmax, and the output is a feature vector with 2 dimensions.
The loss function used when training the first and second shallow neural networks in this step is the softmax cross-entropy between the predicted and true match classes of vertex pairs (or connecting-edge pairs).
Back-propagation in this step uses the Adam optimizer, which has the advantages of momentum and an adaptive learning rate; the initial learning rate is set to 0.0001.
In this step, the networks are trained for 100,000 iterations with a batch size of 32, of which 1/4 are positive samples and 3/4 negative samples. The training samples come from six videos in the MOT16 training set: MOT16-04, MOT16-05, MOT16-09, MOT16-10, MOT16-11, and MOT16-13.
Specifically, let v ∈ {0,1}^{nm×1} be an indicator vector expressing the matching relation between vertices: if v_i ∈ V1 and v_a ∈ V2, and vertex v_i in vertex set V1 of graph G1 matches vertex v_a in vertex set V2 of graph G2 (i.e., vertices v_i and v_a come from the same target), then v_ia = 1; otherwise, v_ia = 0.
In the implementation, for the matching degree between vertices, the feature vector of a vertex in one graph extracted in step (5) is concatenated with the feature vector of a vertex in the other graph and used as the input of the shallow neural network, which directly outputs the matching degree of the two vertices. For vertex v_i ∈ V1 and vertex v_a ∈ V2, the matching degree S_ia is:
S_ia = F_N([F_i, F_a])
where F_i, F_a ∈ R^{1×d} are the feature vectors of vertices v_i and v_a, and d is the size of the feature vector; [·] denotes a stitching operation that concatenates multiple vectors; F_N denotes the first shallow neural network with a softmax layer; i ∈ [1, total number of vertices in vertex set V1] and a ∈ [1, total number of vertices in vertex set V2].
For the matching degree between edges: for an edge starting at the i-th vertex and ending at the j-th vertex, the feature vector of the edge is set to the concatenation of the feature vectors of its two endpoints i and j, i.e., F_ij = [F_i, F_j]. Analogously to the vertex case, the matching degree S_ij:ab between connecting edge (v_i, v_j) ∈ E1 and connecting edge (v_a, v_b) ∈ E2 is:
S_ij:ab = F_E([F_ij, F_ab])
where j ∈ [1, total number of vertices in vertex set V1], b ∈ [1, total number of vertices in vertex set V2], the feature vectors F_ij, F_ab ∈ R^{1×2d}, and F_E denotes the second shallow neural network with a softmax layer.
(7) Constructing an affinity matrix M according to the matching degree between each vertex in the vertex set V1 of the graph G1 and each vertex in the vertex set V2 of the graph G2 obtained in the step (6) and the matching degree between each connecting edge in the edge set E1 of the graph G1 and each connecting edge in the edge set E2 of the graph G2;
Specifically, the elements of the affinity matrix M fall into two classes: the diagonal elements correspond to the matching degrees between vertices, i.e., for i = j and a = b, M_ia:jb equals the matching degree S_ia between vertex v_i and vertex v_a;
the off-diagonal elements represent the matching degrees between connecting edges, i.e., for i ≠ j and a ≠ b, M_ia:jb equals the matching degree S_ij:ab between connecting edge (v_i, v_j) and connecting edge (v_a, v_b); for i = j and a ≠ b, or i ≠ j and a = b, M_ia:jb = 0.
For example, as shown in FIG. 7, if the matching degree between vertex v_1 in G1 and vertex v_a in G2 is 1 (i.e., the two match), then M_1a:1a = 1; if the matching degree between edge (v_1, v_2) in G1 and edge (v_a, v_b) in G2 is 1 (i.e., the two match), then M_1a:2b = 1.
In the implementation, this step fills the affinity matrix M with the matching degrees obtained in step (6).
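A minimal sketch of filling the nm×nm affinity matrix follows; flattening the vertex pair (i, a) to the single row/column index i*m + a is an assumed index layout that the text does not spell out.

```python
import numpy as np

def build_affinity(S_vertex, S_edge, E1, E2):
    """S_vertex: (n, m) array of vertex matching degrees S_ia;
    S_edge: dict mapping ((i, j), (a, b)) -> S_ij:ab for (i, j) in E1, (a, b) in E2."""
    n, m = S_vertex.shape
    M = np.zeros((n * m, n * m))
    for i in range(n):
        for a in range(m):
            M[i * m + a, i * m + a] = S_vertex[i, a]   # diagonal: vertex matching degrees
    for (i, j) in E1:                                  # off-diagonal: edge matching degrees
        for (a, b) in E2:
            M[i * m + a, j * m + b] = S_edge.get(((i, j), (a, b)), 0.0)
    return M                                           # all remaining entries stay 0
```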
(8) performing power iteration on the affinity matrix M obtained in step (7) to obtain an optimal assignment vector v*;
specifically, the optimal assignment vector v* of the affinity matrix M can be calculated by power iteration with the following formula:
v^(k+1) = M v^(k) / ||M v^(k)||
where v^(0) is initialized as the unit vector, i.e., v^(0) = 1; ||·|| denotes the l2 norm; k denotes the number of iterations, with a value range of 2 to 4, preferably 2; and the final result of the iterative computation is taken as the optimal assignment vector v*.
Traditional algorithms (such as the Hungarian algorithm) use only the association costs between vertices during solving and do not use the topological information of the vertices, which is very important in graph matching.
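The power iteration itself is a few lines; a sketch under the formula above, with k = 2 as the preferred iteration count:

```python
import numpy as np

def power_iteration(M, k=2):
    v = np.ones(M.shape[0])          # v^(0) = 1, the all-ones unit vector
    for _ in range(k):
        v = M @ v
        v = v / np.linalg.norm(v)    # l2 normalization
    return v                         # optimal assignment vector v*
```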
(9) performing bidirectional normalization on the optimal assignment vector v* obtained in step (8) to obtain an assignment matrix S;
specifically, because one target can occupy only one location at any instant, the optimal assignment vector v* must satisfy the following constraints: for any v_i ∈ V1,
Σ_{a=1..m} v_ia ≤ 1
and for any v_a ∈ V2,
Σ_{i=1..n} v_ia ≤ 1.
To satisfy these constraints, bidirectional normalization (which drives the matrix toward a doubly stochastic one) is applied to the optimal assignment vector v* obtained by power iteration. The original algorithm assumes only a square matrix; in a multi-target tracking scene, however, the numbers of detections in the two frames are often different because targets frequently appear and disappear and the detector is imperfect, so a bidirectional normalization method based on the square-matrix assumption is no longer applicable.
Specifically, this step is solved with a normalization method. First, v* is reshaped into a matrix S_nm of size n×m (where n and m denote the total numbers of vertices in vertex sets V1 and V2, respectively); then the following formulas are iterated, and the final matrix of the iteration is taken as the assignment matrix S:
S_ia^(t+1) = S_ia^(t) / Σ_{a′=1..m} S_ia′^(t)  (row normalization),  followed by  S_ia^(t+1) = S_ia^(t) / Σ_{i′=1..n} S_i′a^(t)  (column normalization)
i.e., the assignment matrix is normalized by rows and by columns alternately, with i ∈ [1, n] and a ∈ [1, m]. The superscript t+1 denotes the (t+1)-th iteration, and t denotes the number of iterations, with a value range of 3 to 7, preferably 5.
In the assignment matrix S obtained in this step, the element in row i, column a denotes the matching value between vertex v_i of graph G1 and vertex v_a of graph G2.
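A minimal sketch of this bidirectional normalization follows, with t = 5 as the preferred iteration count; the small eps guard against division by zero is an addition for numerical safety, not from the text.

```python
import numpy as np

def bidirectional_normalize(v_star, n, m, t=5, eps=1e-12):
    S = v_star.reshape(n, m)                            # S_nm: reshape v* to n x m
    for _ in range(t):
        S = S / (S.sum(axis=1, keepdims=True) + eps)    # row normalization
        S = S / (S.sum(axis=0, keepdims=True) + eps)    # column normalization
    return S                                            # assignment matrix S
```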
(10) Setting counter cnt2 to 1;
(11) judging whether the counter cnt2 is equal to the total number of the vertexes in the graph G1, if so, entering the step (14), otherwise, entering the step (12);
(12) obtaining, from the assignment matrix S obtained in step (9), the vertex in graph G2 that preliminarily matches the cnt2-th vertex in graph G1; judging whether the Intersection over Union (IoU) between the two detection responses corresponding to the two preliminarily matched vertices in graphs G1 and G2 is greater than or equal to a preset threshold; if so, establishing an association between the target corresponding to the vertex in the previous frame (graph G1) and the target corresponding to the vertex in the next frame (graph G2), i.e., the targets corresponding to the two vertices in the two frames belong to one continuous trajectory, and then entering step (13); otherwise, entering step (13) directly;
Specifically, all elements in row i are taken from the assignment matrix S and the element with the largest value is selected; its column index a gives the vertex v_a in graph G2 that preliminarily matches vertex v_i in graph G1.
In the present embodiment, the value range of the preset threshold is 0.4 to 1, and preferably 0.5.
The assignment matrix S satisfies a one-to-one mapping constraint (i.e., a vertex in G1 matches at most one vertex in G2). During association, for any vertex v_i in graph G1, the row of the matching matrix corresponding to v_i contains m numbers, which represent the matching values between v_i and the m vertices of graph G2; each vertex v_i is considered to match the vertex holding the maximum value of its row. However, every row of the matching matrix necessarily has a maximum, i.e., every detection in the previous frame would be matched to some detection in the next frame, which obviously does not fit the reality of multi-target tracking: because targets frequently appear and disappear and many detections are missed, the numbers of detections in two consecutive frames often differ, and a trajectory in the previous frame does not necessarily have a matching detection in the next frame. The invention must therefore use a further criterion, the IoU test above, to judge whether a trajectory has ended.
(13) setting the counter cnt1 = cnt1 + 1, and returning to step (3);
(14) for each target in the first frame of the input video sequence, combining all targets associated with the target in all remaining frames of the entire input video sequence with the target to form the target's tracking trajectory.
For example, if the 3rd target in the first frame is associated with the 4th target in the second frame, the 4th target in the second frame is associated with the 5th target in the third frame, the 5th target in the third frame is associated with the 6th target in the fourth frame, ..., and the 3rd target in the second-to-last frame is associated with the 1st target in the last frame, then the 3rd target in the first frame, the 4th target in the second frame, the 5th target in the third frame, the 6th target in the fourth frame, ..., the 3rd target in the second-to-last frame, and the 1st target in the last frame together form the tracking trajectory of the 3rd target in the first frame.
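A minimal sketch of this trajectory assembly follows, assuming links[t] holds the (i, a) pairs produced for the transition from frame t to frame t+1; the function name build_tracks is illustrative.

```python
def build_tracks(links, num_first_frame_targets):
    """links[t]: list of (i, a) associations between frames t and t+1."""
    tracks = []
    for start in range(num_first_frame_targets):
        track, cur = [(0, start)], start       # (frame index, detection index) pairs
        for t, frame_links in enumerate(links):
            nxt = dict(frame_links).get(cur)
            if nxt is None:
                break                          # no association: the trajectory ends here
            track.append((t + 1, nxt))
            cur = nxt
        tracks.append(track)
    return tracks
```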
In summary, the invention provides an online multi-target tracking algorithm based on graph matching, which converts the data-association problem between detection responses in two consecutive frames into a graph-matching problem. First, two deep convolutional neural networks are designed to compute the affinity between pairs of graph vertices and between pairs of graph edges, respectively; the affinity matrix of the two graphs is then filled directly with these vertex and edge affinities; finally, the affinity matrix is processed to obtain the final matching matrix (i.e., the association matrix between detections). The method therefore effectively reflects the associations of real data during multi-target tracking, and its tracking results are highly accurate.
Experimental results
The practical effect of the invention is illustrated here by results on the MOT16 test set. The tracking results of the multi-target tracking algorithm on the MOT16 data set are evaluated with the following standard metrics: Multi-Object Tracking Accuracy (MOTA), Multi-Object Tracking Precision (MOTP), Mostly Tracked targets (MT), Mostly Lost targets (ML), False Positives (FP), False Negatives (FN), Fragmentation (FM), and Identity Switches (IDS). '↑' indicates that higher is better, and '↓' indicates that lower is better. Table 1 below gives a detailed comparison of the results of the present invention and of the existing Quad-CNN, OTCD_1_16, and CDA_DDALv2 algorithms on the MOT16 test set.
[Table 1: comparison of tracking results on the MOT16 test set; the table image is not reproduced here.]
As can be seen from Table 1: (1) the method of the invention ties with OTCD_1_16 for first place on MOTA, and on metrics including ML and FN it surpasses the other three algorithms; in particular, its ML is 12.2 percent lower. MOTA is the main metric for evaluating the overall performance of an algorithm, and compared with the other three tracking algorithms the proposed tracking algorithm achieves the best value, showing that its overall performance is superior to the other three;
(2) the higher MT and lower ML show that, by considering a suitable time interval and exploiting the dependencies between targets, the proposed method can correctly recover the trajectories of occluded or drifting targets.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A multi-target tracking method based on graph matching is characterized by comprising the following steps:
(1) acquiring a multi-target tracking data set which comprises an input video sequence and a detection response of each frame in the input video sequence;
(2) setting counter cnt1 to 1;
(3) judging whether cnt1 is equal to the total frame number of the input video sequence, if yes, entering step (14), otherwise, entering step (4);
(4) acquiring a cnt1 th frame and a cnt1+1 th frame from the input video sequence obtained in the step (1), constructing a graph G1 according to a previous frame, constructing a graph G2 according to a next frame, wherein the graphs G1 and G2 respectively comprise two vertex sets V1 and V2;
(5) respectively acquiring all vertexes in vertex sets V1 and V2 in graphs G1 and G2 constructed in the step (4), and respectively inputting the vertexes into a trained triplet network to obtain a feature vector corresponding to each vertex;
(6) Inputting the feature vectors of the vertexes obtained in the step (5) into the trained first shallow neural network and second shallow neural network respectively to obtain the matching degree between each vertex in the vertex set V1 of the graph G1 and each vertex in the vertex set V2 of the graph G2 and the matching degree between each connecting edge in the edge set E1 of the graph G1 and each connecting edge in the edge set E2 of the graph G2 respectively;
(7) constructing an affinity matrix M according to the matching degree between each vertex in the vertex set V1 of the graph G1 and each vertex in the vertex set V2 of the graph G2 obtained in the step (6) and the matching degree between each connecting edge in the edge set E1 of the graph G1 and each connecting edge in the edge set E2 of the graph G2;
(8) performing power iteration on the affinity matrix M obtained in step (7) to obtain an optimal assignment vector v*;
(9) performing bidirectional normalization on the optimal assignment vector v* obtained in step (8) to obtain an assignment matrix S;
(10) setting counter cnt2 to 1;
(11) judging whether the counter cnt2 is equal to the total number of the vertexes in the graph G1, if so, entering the step (14), otherwise, entering the step (12);
(12) obtaining, from the assignment matrix S obtained in step (9), the vertex in graph G2 that preliminarily matches the cnt2-th vertex in graph G1; judging whether the intersection-over-union between the two detection responses corresponding to the two preliminarily matched vertices in graphs G1 and G2 is greater than or equal to a preset threshold; if so, establishing an association between the target corresponding to the vertex in the previous frame (graph G1) and the target corresponding to the vertex in the next frame (graph G2) and then entering step (13); otherwise, entering step (13) directly;
(13) setting the counter cnt1 = cnt1 + 1, and returning to step (3);
(14) for each target in the first frame of the input video sequence, combining all targets associated with the target in all remaining frames of the entire input video sequence with the target to form the target's tracking trajectory.
2. The multi-target tracking method according to claim 1,
the vertices in vertex set V1 are all the detection responses in the previous frame, and the vertices in vertex set V2 are all the detection responses in the next frame;
if the distance between the targets corresponding to two vertices of vertex set V1 in the previous frame is less than or equal to a threshold, a connecting edge exists between the two vertices, and all connecting edges corresponding to vertex set V1 form the edge set E1;
if the distance between the targets corresponding to two vertices of vertex set V2 in the next frame is less than or equal to the threshold, a connecting edge exists between the two vertices, and all connecting edges corresponding to vertex set V2 form the edge set E2.
3. The multi-target tracking method according to claim 1 or 2, wherein the triplet network is composed of three paths, each path comprising a sequentially connected ResNet-50 convolutional neural network model and two fully-connected layers, wherein:
the first layer is a ResNet-50 convolutional neural network model, which comprises 1 convolutional layer, 16 building-block structures, and 1 fully-connected layer;
the second layer is a fully-connected layer with ReLU as its activation function, whose output is a 1024-dimensional feature vector;
the third layer is a fully-connected layer with ReLU as its activation function, whose output is a 128-dimensional feature vector.
4. The multi-target tracking method according to claim 1 or 2, wherein the first and second shallow neural networks have the same structure, namely:
the first layer is a fully-connected layer with ReLU as its activation function and a Dropout retention probability of 0.5, whose output is a 1024-dimensional feature vector;
the second layer is a fully-connected layer with Softmax as its activation function, whose output is a 2-dimensional feature vector.
5. The multi-target tracking method according to claim 1, wherein
for vertex v_i ∈ V1 and vertex v_a ∈ V2, the matching degree S_ia is:
S_ia = F_N([F_i, F_a])
where F_i, F_a ∈ R^{1×d} are the feature vectors of vertices v_i and v_a, and d is the size of the feature vector; [·] denotes a stitching operation that concatenates multiple vectors; F_N denotes the first shallow neural network with a softmax layer; i, j ∈ [1, n] and a ∈ [1, m], where n and m denote the total numbers of vertices in vertex sets V1 and V2, respectively;
the connecting edge starting at the i-th vertex and ending at the j-th vertex has feature vector F_ij = [F_i, F_j], where F_i and F_j are the feature vectors of the two vertices i and j connected by the edge;
the matching degree S_ij:ab between connecting edge (v_i, v_j) ∈ E1 and connecting edge (v_a, v_b) ∈ E2 is:
S_ij:ab = F_E([F_ij, F_ab])
where b ∈ [1, m], the feature vectors F_ij, F_ab ∈ R^{1×2d}, and F_E denotes the second shallow neural network with a softmax layer.
6. The multi-target tracking method according to claim 5, wherein
the diagonal elements of the affinity matrix M correspond to the matching degrees between vertices, i.e., for i = j and a = b, M_ia:jb equals the matching degree S_ia between vertex v_i and vertex v_a;
the off-diagonal elements of the affinity matrix M represent the matching degrees between connecting edges, i.e., for i ≠ j and a ≠ b, M_ia:jb equals the matching degree S_ij:ab between connecting edge (v_i, v_j) and connecting edge (v_a, v_b); for i = j and a ≠ b, or i ≠ j and a = b, M_ia:jb = 0.
7. The multi-target tracking method according to claim 6, wherein step (8) calculates the optimal assignment vector v* of the affinity matrix M using the following formula:
v^(k+1) = M v^(k) / ||M v^(k)||
where v^(0) is initialized as the unit vector, i.e., v^(0) = 1; ||·|| denotes the l2 norm; k denotes the number of iterations, with a value range of 2 to 4; and the final result of the iterative computation is taken as the optimal assignment vector v*.
8. The multi-target tracking method according to claim 7, wherein step (9) iterates using the following formulas, and the final matrix of the iteration is taken as the assignment matrix S:
S_ia^(t+1) = S_ia^(t) / Σ_{a′=1..m} S_ia′^(t)  (row normalization),  followed by  S_ia^(t+1) = S_ia^(t) / Σ_{i′=1..n} S_i′a^(t)  (column normalization)
where S_nm, the n×m matrix obtained by reshaping v*, is the initial value of the iteration; the superscript t+1 denotes the (t+1)-th iteration; t denotes the number of iterations, with a value range of 3 to 7;
in the assignment matrix S, the element in row i, column a denotes the matching value between vertex v_i of graph G1 and vertex v_a of graph G2.
9. The multi-target tracking method according to claim 8, wherein in step (12), all elements in row i are taken from the assignment matrix S and the element with the largest value is selected; its column index a gives the vertex v_a in graph G2 that preliminarily matches vertex v_i in graph G1.
10. A multi-target tracking system based on graph matching, comprising:
a first module for obtaining a multi-target tracking data set comprising an input video sequence and a detection response for each frame in the input video sequence;
a second module for setting the counter cnt1 to 1;
a third module, configured to determine whether cnt1 equals to a total frame number of the input video sequence, if so, enter a fourteenth module, otherwise, enter a fourth module;
A fourth module, configured to obtain a cnt1 th frame and a cnt1+1 th frame from the input video sequence obtained by the first module, construct a graph G1 according to a previous frame, construct a graph G2 according to a next frame, where the graphs G1 and G2 respectively include two vertex sets V1 and V2;
a fifth module, configured to obtain all vertices in vertex sets V1 and V2 in graphs G1 and G2 constructed by the fourth module, respectively, and input the vertices into a trained triplet network, so as to obtain feature vectors corresponding to the vertices;
a sixth module, configured to input the feature vectors of the vertices obtained by the fifth module into the trained first and second shallow neural networks, respectively, to obtain the matching degree between each vertex in vertex set V1 of graph G1 and each vertex in vertex set V2 of graph G2, and the matching degree between each connecting edge in edge set E1 of graph G1 and each connecting edge in edge set E2 of graph G2;
a seventh module, configured to construct an affinity matrix M according to the matching degrees between each vertex in vertex set V1 of graph G1 and each vertex in vertex set V2 of graph G2, and between each connecting edge in edge set E1 of graph G1 and each connecting edge in edge set E2 of graph G2, obtained by the sixth module;
an eighth module, configured to perform power iteration on the affinity matrix M obtained by the seventh module to obtain an optimal assignment vector v*;
a ninth module, configured to perform bidirectional normalization on the optimal assignment vector v* obtained by the eighth module to obtain an assignment matrix S;
a tenth module for setting the counter cnt2 to 1;
an eleventh module for determining whether the counter cnt2 is equal to the total number of vertices in graph G1, if so, entering a fourteenth module, otherwise, entering a twelfth module;
a twelfth module, configured to obtain, from the assignment matrix S obtained by the ninth module, the vertex in graph G2 that preliminarily matches the cnt2-th vertex in graph G1; to judge whether the intersection-over-union between the two detection responses corresponding to the two preliminarily matched vertices in graphs G1 and G2 is greater than or equal to a preset threshold; if so, to establish an association between the target corresponding to the vertex in the previous frame (graph G1) and the target corresponding to the vertex in the next frame (graph G2) and then enter the thirteenth module; otherwise, to enter the thirteenth module directly;
a thirteenth module, configured to set the counter cnt1 = cnt1 + 1 and return to the third module;
and a fourteenth module, configured, for each target in the first frame of the input video sequence, to combine all targets associated with the target in all remaining frames of the entire input video sequence with the target to form the target's tracking trajectory.
CN202010689629.7A 2020-07-17 2020-07-17 Multi-target tracking method and system based on graph matching Expired - Fee Related CN111862156B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010689629.7A CN111862156B (en) 2020-07-17 2020-07-17 Multi-target tracking method and system based on graph matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010689629.7A CN111862156B (en) 2020-07-17 2020-07-17 Multi-target tracking method and system based on graph matching

Publications (2)

Publication Number Publication Date
CN111862156A true CN111862156A (en) 2020-10-30
CN111862156B CN111862156B (en) 2021-02-26

Family

ID=72984639

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010689629.7A Expired - Fee Related CN111862156B (en) 2020-07-17 2020-07-17 Multi-target tracking method and system based on graph matching

Country Status (1)

Country Link
CN (1) CN111862156B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507054A (en) * 2020-12-12 2021-03-16 武汉中海庭数据技术有限公司 Method and system for automatically determining road outside line incidence relation
CN113379788A (en) * 2021-06-29 2021-09-10 西安理工大学 Target tracking stability method based on three-element network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3152279B2 (en) * 1995-07-13 2001-04-03 日本電気株式会社 Sub-optimal assignment determination method
CN104200488A (en) * 2014-08-04 2014-12-10 合肥工业大学 Multi-target tracking method based on graph representation and matching
CN107292911A (en) * 2017-05-23 2017-10-24 南京邮电大学 A kind of multi-object tracking method merged based on multi-model with data correlation
US20180130215A1 (en) * 2016-11-07 2018-05-10 Nec Laboratories America, Inc. Deep network flow for multi-object tracking
CN108447080A (en) * 2018-03-02 2018-08-24 哈尔滨工业大学深圳研究生院 Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks
CN110047096A (en) * 2019-04-28 2019-07-23 中南民族大学 A kind of multi-object tracking method and system based on depth conditions random field models
US20190384982A1 (en) * 2018-05-23 2019-12-19 Tusimple, Inc. Method and apparatus for Sampling Training Data and Computer Server
CN111161315A (en) * 2019-12-18 2020-05-15 北京大学 Multi-target tracking method and system based on graph neural network

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3152279B2 (en) * 1995-07-13 2001-04-03 日本電気株式会社 Sub-optimal assignment determination method
CN104200488A (en) * 2014-08-04 2014-12-10 合肥工业大学 Multi-target tracking method based on graph representation and matching
US20180130215A1 (en) * 2016-11-07 2018-05-10 Nec Laboratories America, Inc. Deep network flow for multi-object tracking
CN107292911A (en) * 2017-05-23 2017-10-24 南京邮电大学 A kind of multi-object tracking method merged based on multi-model with data correlation
CN108447080A (en) * 2018-03-02 2018-08-24 哈尔滨工业大学深圳研究生院 Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks
US20190384982A1 (en) * 2018-05-23 2019-12-19 Tusimple, Inc. Method and apparatus for Sampling Training Data and Computer Server
CN110047096A (en) * 2019-04-28 2019-07-23 中南民族大学 A kind of multi-object tracking method and system based on depth conditions random field models
CN111161315A (en) * 2019-12-18 2020-05-15 北京大学 Multi-target tracking method and system based on graph neural network

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ANDREI ZANFIR等: "Deep Learning of Graph Matching", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
JUN XIANG等: "Online Multi-Object Tracking Based on Feature Representation and Bayesian Filtering Within a Deep Learning Architecture", 《IEEE ACCESS》 *
SAMUEL SCHULTER等: "Deep Network Flow for Multi-object Tracking", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 *
TIMOTHEE COUR等: "Balanced Graph Matching", 《ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 19: PROCEEDINGS OF THE 2006 CONFERENCE》 *
HOU, Jianhua et al.: "Design of a Deep-Learning-Based Association Model for Multi-Object Tracking" (基于深度学习的多目标跟踪关联模型设计), Acta Automatica Sinica (《自动化学报》) *
WU, Hanbao et al.: "Target Association Algorithm Based on Optimal Complete Bipartite-Graph Matching" (基于二分图最优完备匹配的目标关联算法), Journal of Huazhong University of Science and Technology (Natural Science Edition) (《华中科技大学学报(自然科学版)》) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507054A (en) * 2020-12-12 2021-03-16 武汉中海庭数据技术有限公司 Method and system for automatically determining road outside line incidence relation
CN113379788A (en) * 2021-06-29 2021-09-10 西安理工大学 Target tracking stability method based on three-element network
CN113379788B (en) * 2021-06-29 2024-03-29 西安理工大学 Target tracking stability method based on triplet network

Also Published As

Publication number Publication date
CN111862156B (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN109961034B (en) Video target detection method based on convolution gating cyclic neural unit
CN114972418B (en) Maneuvering multi-target tracking method based on combination of kernel adaptive filtering and YOLOX detection
Xu et al. Deep learning for multiple object tracking: a survey
Xu et al. Multimodal cross-layer bilinear pooling for RGBT tracking
CN107609460B (en) Human body behavior recognition method integrating space-time dual network flow and attention mechanism
CN113628249B (en) RGBT target tracking method based on cross-modal attention mechanism and twin structure
CN110555387B (en) Behavior identification method based on space-time volume of local joint point track in skeleton sequence
CN108520203B (en) Multi-target feature extraction method based on fusion of self-adaptive multi-peripheral frame and cross pooling feature
CN111339908B (en) Group behavior identification method based on multi-mode information fusion and decision optimization
CN111862156B (en) Multi-target tracking method and system based on graph matching
CN111582091B (en) Pedestrian recognition method based on multi-branch convolutional neural network
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
Chang et al. Fast Random‐Forest‐Based Human Pose Estimation Using a Multi‐scale and Cascade Approach
CN116681728A (en) Multi-target tracking method and system based on Transformer and graph embedding
CN115761534A (en) Method for detecting and tracking small target of infrared unmanned aerial vehicle under air background
CN115761888A (en) Tower crane operator abnormal behavior detection method based on NL-C3D model
CN111291785A (en) Target detection method, device, equipment and storage medium
US20240177525A1 (en) Multi-view human action recognition method based on hypergraph learning
Chen et al. Pyramid attention object detection network with multi-scale feature fusion
CN112329662A (en) Multi-view saliency estimation method based on unsupervised learning
Han et al. Light-field depth estimation using RNN and CRF
CN117173595A (en) Unmanned aerial vehicle aerial image target detection method based on improved YOLOv7
Wang et al. Sture: Spatial–temporal mutual representation learning for robust data association in online multi-object tracking
CN116311504A (en) Small sample behavior recognition method, system and equipment
Zhao et al. Paralleled attention modules and adaptive focal loss for Siamese visual tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210226