CN111862147A - Method for tracking multiple vehicles and multiple pedestrian targets in a video - Google Patents


Info

Publication number
CN111862147A
Authority
CN
China
Prior art keywords
target
tracking
frame
targets
matching
Prior art date
Legal status
Granted
Application number
CN202010496840.7A
Other languages
Chinese (zh)
Other versions
CN111862147B (en)
Inventor
许银翠
范圣印
熊敏
单丰武
姜筱华
陈立伟
朱祖伟
弥博文
龚朋朋
Current Assignee
Jiangxi Jiangling Group New Energy Automobile Co Ltd
Beijing Yihang Yuanzhi Technology Co Ltd
Original Assignee
Jiangxi Jiangling Group New Energy Automobile Co Ltd
Beijing Yihang Yuanzhi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jiangxi Jiangling Group New Energy Automobile Co Ltd, Beijing Yihang Yuanzhi Technology Co Ltd
Priority to CN202010496840.7A
Publication of CN111862147A
Application granted
Publication of CN111862147B
Legal status: Active
Anticipated expiration


Classifications

    • G06T 7/246: Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/277: Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10016: Image acquisition modality; video; image sequence
    • G06T 2207/20024: Special algorithmic details; filtering details
    • Y02T 10/40: Road transport; engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A method and device for tracking multiple vehicle and pedestrian targets in a video are provided. A multi-feature apparent modeling method based on the target category constructs different feature extraction operators from the detected target categories to obtain an accurate apparent description of each target. A hierarchical progressive feature extraction algorithm and a high-threshold matching algorithm quickly reduce the dimensionality of the association matrix, which reduces the number of depth feature extractions and guarantees timeliness of the computation. The relative orientation constraint relation between targets assists the apparent features in completing target matching, effectively distinguishing targets with similar appearance and reducing the mismatching rate. Missed detections are recovered using the targets' apparent features and motion prediction, greatly improving the recovery rate of missed targets. The method has high accuracy, strong adaptability and high efficiency.

Description

Method for tracking multiple vehicles and multiple pedestrian targets in a video
Technical Field
The invention relates to the technical field of computer vision, in particular to a method and a device for tracking multiple vehicles and multiple pedestrian targets in a video.
Background
Video multi-target tracking is a research hotspot in the field of computer vision and an important component of many intelligent vision systems. Its main tasks are locating targets and maintaining target IDs. The video sequences used for target tracking are obtained by projecting the 3D real world onto a 2D image plane through a video acquisition device, which inevitably introduces information loss, and the quality of the video sequence is further affected by illumination changes, scene changes and noise during imaging. Besides the blurred target appearance caused by video degradation, video multi-target tracking also faces challenges such as complex and changing backgrounds, varying target poses, frequent occlusion between targets, and difficulty in distinguishing targets with similar appearance. Therefore, designing a video multi-target tracking algorithm that fully exploits target appearance information and the information between consecutive frames, adapts to complex environments and meets diverse applications has important theoretical significance and practical application value.
An online video multi-target tracking algorithm does not rely on information from subsequent video frames and can directly output a multi-target tracking result for the currently input frame. Such tracking algorithms formulate the association between consecutive frames as a bipartite-graph matching problem and solve it with association algorithms such as the Hungarian algorithm.
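By way of illustration only (this sketch is not part of the patent text), such a bipartite formulation can be solved with SciPy's implementation of the Hungarian algorithm; the cost matrix below is hypothetical:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: rows are tracks from the previous frame,
# columns are detections in the current frame; lower cost = better match.
cost = np.array([
    [0.1, 0.9, 0.8],
    [0.7, 0.2, 0.6],
])

track_idx, det_idx = linear_sum_assignment(cost)  # Hungarian / Kuhn-Munkres
for t, d in zip(track_idx, det_idx):
    print(f"track {t} -> detection {d} (cost {cost[t, d]:.2f})")
```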
To assess the state of the art, existing patents and papers were searched, compared and analyzed. The technical schemes most relevant to the present invention are as follows:
Technical scheme 1: patent CN108932730A, a video multi-target tracking method and system based on data association, mainly consists of the following four steps: 1) calculating the similarity between each target in the current frame image and each target in the previous frame image and establishing a similarity matrix; 2) establishing a cost matrix with the targets of the current frame and of the previous frame as rows or columns and initializing its elements to 0; 3) setting a similarity threshold and assigning the cost matrix according to the comparison between the elements of the similarity matrix and the threshold; 4) judging the state of each target in the two frames according to the value of each element of the cost matrix. Although this data-association-based method can simply and effectively judge the appearance, disappearance, merging and separation of targets, it cannot handle the case where frequent occlusion makes the similarity too low for targets to be matched correctly, and it cannot recover targets missed by the detection algorithm.
Technical scheme 2: patent CN109859238A, an online multi-target tracking method based on multi-feature optimal association, mainly extracts apparent features of a target through a CNN, extracts depth features through a deep network, predicts motion features with a Kalman filter tracker, computes the similarity between the detection sequence set and the tracking sequence set based on the multi-feature model, constructs an association matrix with a hierarchical strategy, and solves and updates the optimal association matrix to achieve multi-target tracking. The method fuses multiple target features and improves multi-target tracking accuracy and precision under relative motion, but because depth feature extraction is time-consuming, tracking timeliness is hard to guarantee once the number of targets in the video increases, and the method cannot recover targets missed by the detection algorithm.
Technical scheme 3: patent CN109919981A discloses a multi-target tracking method based on Kalman-filter-assisted multi-feature fusion. The method judges the occlusion of a target from the coordinates of its center point and its size, and handles targets with different degrees of occlusion separately: 1) if the occluded part is small or absent, the centroid coordinates of the detection frame and the preprocessed video frame are input into a pre-trained convolutional neural network, shallow and deep semantic information of the target is extracted and concatenated into a feature matrix, and similarity estimation between the feature matrices of the two frames yields the optimal track; 2) if the detected target is severely occluded, the centroid coordinates of the detection frame are input into a Kalman filter, the position of the target in the next frame is estimated from its previous motion state, and the estimated coordinates are compared with the actual detection result to obtain the optimal track. Handling differently occluded targets separately alleviates the occlusion problem, but using a deep neural network as the feature extractor makes the computation expensive and the tracking efficiency low. Moreover, the method matches targets between consecutive frames directly with the Hungarian algorithm, which can cause mismatches when targets look similar. In addition, like technical schemes 1 and 2, it has no strategy for dealing with missed detections and cannot recover targets missed by the detection algorithm.
Technical scheme 4: the paper Structural constraint data association for online tracking proposes realizing multi-target tracking by using structured position constraints between targets to assist data association. Its data association consists of 3 steps: 1) determining every possible target association combination from the positional relation of the targets in the two frames; 2) for each association combination, selecting each target in turn as an anchor point, recovering the positions of the other targets in the current frame from the anchor and the structured position constraints, computing the matching cost of that anchor, and finally fusing the matching costs of all anchors as the matching cost of the association combination; 3) comparing the matching costs of all association combinations and selecting the one with the minimum cost as the final target matching result. By constraining and predicting target positions with the structured positions between targets, the method can cope with overall target offset caused by camera shake; however, when the motion differences between targets are large, directly using the structured topology of the previous frame cannot accurately predict the position of a missed target in the current frame, so the recovery rate of missed detections is low. The method also extracts only color histogram information, which is insufficient for complicated target and background changes.
Technical schemes 1, 2 and 3 establish associations between targets using apparent features, motion features and the like, which solves part of the multi-target tracking problem, but they ignore the positional association between targets across frames and have no recovery strategy when the detection algorithm misses a target. Technical schemes 2 and 3 integrate depth features and further improve tracking precision, but frequently extracting depth features for every target severely affects the speed of the tracking algorithm. Technical scheme 4 uses structured position constraints between targets to assist data association, which alleviates overall target offset and missed detection to some extent, but the recovery of missed targets is not ideal when the motion differences between targets are large; moreover, it extracts only color features, which are sensitive to illumination, so the robustness of the algorithm is low.
In automatic driving, application scenes are complex and changeable and the vehicle travels fast, so the target and the background change rapidly and irregularly, which significantly increases the difficulty of detecting and tracking multiple vehicles and multiple pedestrians in video; existing methods find it hard to achieve tracking accuracy and real-time performance simultaneously. A new method is therefore needed that guarantees tracking precision, adapts to complicated target and background changes, and at the same time guarantees real-time performance without extra computational overhead.
Disclosure of Invention
In view of these technical problems, the invention aims to design a multi-target tracking method with high accuracy and strong adaptability that efficiently tracks pedestrian and vehicle targets in video. To address the difficulty that a single feature cannot cope with complex target and background changes, the invention provides a multi-feature apparent modeling method based on the target class, extracting different apparent features for different classes of targets so as to improve the descriptive power of the features. To address the low timeliness of tracking caused by time-consuming depth feature extraction, the invention provides a hierarchical progressive feature extraction algorithm that reduces the number of depth feature extractions as far as possible while preserving tracking accuracy. To address the difficulty of distinguishing targets with similar appearance, the invention proposes using the relative orientation relation between targets to further assist the apparent features in completing correct matching. To address missed detections produced by the detection algorithm, the invention proposes recovering missed targets with the target apparent features and motion prediction under a correlation-filter framework, improving the recovery rate of missed targets.
The invention provides a method for tracking pedestrian and vehicle targets in a video, in which matching of targets between consecutive frames is completed using multiple apparent features together with the orientation constraint relation between targets, a hierarchical progressive matching strategy is adopted to reduce the number of feature extractions, and missed targets are recovered based on the idea of correlation-filter tracking, finally forming a method for tracking pedestrian and vehicle targets in video with high accuracy and strong adaptability.
To solve the above technical problem, according to an aspect of the present invention, there is provided a method for tracking multiple vehicles and multiple pedestrian targets in a video, comprising the steps of:
step 1), data acquisition: acquiring video data;
step 2), target matching: aiming at the video data, establishing an incidence matrix of a detection result and a tracking result, extracting apparent characteristics based on target categories from the detection result, and performing target matching in a layered and progressive manner;
step 3), auxiliary matching: further matching unsuccessfully matched apparent similar targets in target matching by using the orientation constraint relation between the targets;
step 4), recovering the missed detection target: recovering undetected targets of the frame by utilizing motion prediction and apparent characteristics;
step 5), outputting a tracking result: and maintaining a tracking chain, updating the orientation constraint relation between the targets, and outputting the tracking result of the current frame.
Preferably, the acquiring video data comprises acquiring video data in real time.
Preferably, the acquiring video data includes reading the video data from a file.
Preferably, the acquiring video data comprises capturing video data with a camera mounted on the autonomous vehicle.
Preferably, the target matching further comprises:
Step 1.1), respectively establishing a vehicle target incidence matrix and a pedestrian target incidence matrix according to the target types.
Preferably, the step 1.1) of respectively establishing the vehicle target correlation matrix and the pedestrian target correlation matrix according to the target categories includes:
the current-frame detection target sequence set D shown in formula (1)
D = {d1, d2, …, di, …, dM-1, dM} (1)
is taken as the rows (or columns), and the previous-frame tracking target sequence set T shown in formula (2)
T = {t1, t2, …, tj, …, tN-1, tN} (2)
is taken as the columns (or rows); an M × N correlation matrix is established, M and N being natural numbers. Each element Aij of the correlation matrix represents the association result between the detection target di shown in formula (3)
di = {typei, xi, yi, wi, hi} (i = 1, 2, …, M) (3)
and the tracking target tj shown in formula (4)
tj = {idj, xj, yj, wj, hj, Δxj, Δyj, Δwj, Δhj} (j = 1, 2, …, N) (4)
(initialized as Aij = 1; Aij = 1 means that di and tj are associated, otherwise they are unassociated), wherein typei is the class of di, {xi, yi} are the coordinates of the center point of the target frame of di, {wi, hi} are the width and height of the target frame of di, i.e. the size of the target frame, idj is the ID of tj, {xj, yj} are the coordinates of the center point of the target frame of tj, {wj, hj} are the width and height of the target frame of tj, {Δxj, Δyj} is the movement speed of tj, and {Δwj, Δhj} is the width and height variation of tj.
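As an informal sketch (not part of the patent text), the detection and tracking records of formulas (3) and (4) and the per-category correlation matrix could be represented roughly as follows; all field and function names are illustrative:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Detection:           # di = {type, x, y, w, h}, formula (3)
    type: str              # "vehicle" or "pedestrian"
    x: float
    y: float               # target-frame center point
    w: float
    h: float               # target-frame size

@dataclass
class Track:               # tj = {id, x, y, w, h, dx, dy, dw, dh}, formula (4)
    id: int
    x: float
    y: float
    w: float
    h: float
    dx: float
    dy: float              # movement speed
    dw: float
    dh: float              # width/height variation

def build_association_matrix(dets, tracks):
    """One M x N matrix per target category, initialised to 1 (associated)."""
    return np.ones((len(dets), len(tracks)), dtype=np.uint8)
```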
Preferably, the target matching further comprises:
step 1.2), obtaining the predicted position in the current frame of each target from the previous frame by using Kalman filtering motion prediction.
Preferably, the obtaining of the predicted position of each target in the previous frame in the current frame by using Kalman filtering motion prediction comprises:
the predicted position (x̂j, ŷj) of each previous-frame target tj (j = 1, 2, …, N) in the current frame is obtained by Kalman filtering motion prediction; a circular correlation gate is established with (x̂j, ŷj) as its center and the radius R given by formula (5); the m (m ≤ M) detection targets d1, …, dm whose target-frame center coordinates {xi, yi} fall within the correlation gate are associated to the tracking target tj, i.e. for the j-th column the corresponding elements Aij are kept at 1 and the remaining elements of the column are set to 0, thereby sparsifying the matrix.
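A minimal sketch of this gating step follows (illustrative only; the exact radius of formula (5) is not reproduced in this text, so a size-based placeholder radius is assumed). It reuses the Detection/Track records sketched above:

```python
import numpy as np

def gate_matrix(A, dets, tracks, predicted, radius_fn=None):
    """Zero out associations whose detection center falls outside the circular
    correlation gate centered on the track's Kalman-predicted position."""
    if radius_fn is None:
        # Placeholder gate radius: half the diagonal of the predicted box.
        # Formula (5) of the patent is not reproduced here; this is an assumption.
        radius_fn = lambda t: 0.5 * np.hypot(t.w + t.dw, t.h + t.dh)
    for j, t in enumerate(tracks):
        px, py = predicted[j]                     # Kalman-predicted center of tj
        R = radius_fn(t)
        for i, d in enumerate(dets):
            if np.hypot(d.x - px, d.y - py) > R:  # detection outside the gate
                A[i, j] = 0
    return A
```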
Preferably, the target matching further comprises:
step 1.3), for every target association pair with Aij ≠ 0, calculating the target frame similarity Fs and the target frame overlap degree Fiou of the corresponding detection target di and tracking target tj, and obtaining the target frame overall similarity Fbox shown in formula (6):
Fbox = λbox·Fs + (1 – λbox)·Fiou, (6)
wherein λbox is the weight of the target frame similarity Fs in the target frame overall similarity Fbox;
each association pair in the correlation matrix is updated as shown in formula (7):
Aij = 1, if Fbox ≥ Tbox; Aij = 0, otherwise, (7)
where Tbox is the target frame overall similarity threshold. For each association pair satisfying Aij = 1, the number of elements equal to 1 in the i-th row and in the j-th column is counted; if Aij is the only element equal to 1 in both its row and its column, the detection target di and the tracking target tj have been successfully matched; the tracking target tj is updated with the target frame center point coordinates {xi, yi} and size {wi, hi} of the detection target di, the updated target tj is saved, and the i-th row and the j-th column of the correlation matrix are deleted.
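The high-threshold confirmation used at each layer might be sketched as below (assuming the binary thresholding reading of formula (7); not part of the patent text):

```python
import numpy as np

def confirm_unique_matches(A, similarity, threshold):
    """Keep Aij = 1 only where the similarity exceeds the threshold, then confirm
    pairs that are the only remaining candidate in both their row and column."""
    A = (A != 0) & (similarity >= threshold)
    confirmed = []
    for i, j in zip(*np.nonzero(A)):
        if A[i, :].sum() == 1 and A[:, j].sum() == 1:   # unique in row and column
            confirmed.append((int(i), int(j)))
    # Confirmed rows/columns are removed before the next, more expensive layer.
    for i, j in confirmed:
        A[i, :] = 0
        A[:, j] = 0
    return confirmed, A
```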
Preferably, the target frame similarity Fs in step 1.3) is calculated as shown in formula (8).
Preferably, the target frame overlap degree Fiou in step 1.3) is calculated as shown in formula (9):
Fiou = area(di ∩ tj) / (area(di) + area(tj) – area(di ∩ tj)), (9)
wherein area(di) denotes the area of the target frame of the detection target di, area(tj) denotes the area of the target frame of the tracking target tj, and area(di ∩ tj) denotes the area of their overlapping region.
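An illustrative sketch of the box-level similarity fusion of formula (6) follows; since the exact form of Fs in formula (8) is not reproduced here, a simple size-ratio similarity is assumed in its place:

```python
def iou(d, t):
    """Overlap Fiou between a detection box and a track box given as center/size."""
    dx1, dy1, dx2, dy2 = d.x - d.w / 2, d.y - d.h / 2, d.x + d.w / 2, d.y + d.h / 2
    tx1, ty1, tx2, ty2 = t.x - t.w / 2, t.y - t.h / 2, t.x + t.w / 2, t.y + t.h / 2
    iw = max(0.0, min(dx2, tx2) - max(dx1, tx1))
    ih = max(0.0, min(dy2, ty2) - max(dy1, ty1))
    inter = iw * ih
    return inter / (d.w * d.h + t.w * t.h - inter + 1e-9)

def box_similarity(d, t):
    """Assumed stand-in for Fs (formula (8)): similarity of box sizes."""
    return (min(d.w, t.w) / max(d.w, t.w)) * (min(d.h, t.h) / max(d.h, t.h))

def f_box(d, t, lam_box=0.5):
    """Fbox = lam_box * Fs + (1 - lam_box) * Fiou, formula (6)."""
    return lam_box * box_similarity(d, t) + (1 - lam_box) * iou(d, t)
```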
Preferably, the target matching further comprises:
step 1.4), for the detection targets di (i = 1, 2, …, m) remaining in the correlation matrix, computing the apparent feature a1, and fusing the target frame overall similarity Fbox and the apparent similarity Fa1 into the comprehensive similarity Fc1 shown in formula (10):
Fc1 = λ1·Fbox + (1 – λ1)·Fa1, (10)
wherein λ1 is the weight of the target frame overall similarity Fbox in the comprehensive similarity Fc1.
Preferably, each association pair in the correlation matrix is updated as shown in formula (11):
Aij = 1, if Fc1 ≥ Tc1; Aij = 0, otherwise, (11)
where Tc1 is a comprehensive similarity threshold. For each association pair satisfying Aij = 1, the number of elements equal to 1 in the i-th row and in the j-th column is counted; if Aij is the only element equal to 1 in both its row and its column, the detection target di and the tracking target tj have been successfully matched; the tracking target tj is updated with the target frame center point coordinates {xi, yi} and size {wi, hi} of the detection target di, the updated target tj is saved, the i-th row and the j-th column of the correlation matrix are deleted, and the apparent feature a1 of the target is updated.
Preferably, the apparent feature a1 and its similarity measure are calculated as follows:
the gradient feature is an important apparent feature for pedestrian and vehicle targets, so the apparent feature a1 of the present invention is selected as the Histogram of Oriented Gradients (HOG) feature, which has strong descriptive power. An image slice is cut from the detection target frame of the current frame image and input into an HOG feature extractor to obtain the HOG feature of the detection target; a point-multiplication operation with the HOG feature of the previous-frame tracking target yields a feature response map; the maximum pixel value in the feature response map represents the HOG feature similarity of the detection target and the tracking target, and is normalized to obtain the apparent similarity Fa1.
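A rough sketch of the HOG-based appearance similarity Fa1 (illustrative only): here skimage's HOG descriptor is used and the response is approximated by a normalized correlation of the two descriptors, whereas the patent computes a full response map:

```python
import cv2
import numpy as np
from skimage.feature import hog

def hog_descriptor(patch, size=(64, 128)):
    """HOG descriptor of an image slice cut from a target frame (assumed shape)."""
    patch = cv2.resize(patch, size)
    gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    return hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def hog_similarity(det_patch, track_hog):
    """Approximate Fa1: normalized correlation between the detection's HOG
    descriptor and the stored HOG descriptor of the tracked target."""
    det_hog = hog_descriptor(det_patch)
    denom = np.linalg.norm(det_hog) * np.linalg.norm(track_hog) + 1e-9
    return float(det_hog @ track_hog / denom)
```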
Preferably, the target matching further comprises:
step 1.5), calculating the apparent feature a2 of the detection targets di (i = 1, 2, …, m) remaining in the correlation matrix, where m is the total number of detection targets, and fusing the comprehensive similarity Fc1 and the apparent similarity Fa2 into the comprehensive similarity Fc2 shown in formula (12):
Fc2 = λ2·Fc1 + (1 – λ2)·Fa2, (12)
wherein λ2 is the weight of the comprehensive similarity Fc1 in the comprehensive similarity Fc2.
Preferably, each association pair in the correlation matrix is updated as shown in formula (13):
Aij = 1, if Fc2 ≥ Tc2; Aij = 0, otherwise, (13)
where Tc2 is a comprehensive similarity threshold. For each association pair satisfying Aij = 1, the number of elements equal to 1 in the i-th row and in the j-th column is counted; if Aij is the only element equal to 1 in both its row and its column, the detection target di and the tracking target tj have been successfully matched; the tracking target tj is updated with the target frame center point coordinates {xi, yi} and size {wi, hi} of the detection target di, the updated target tj is saved, the i-th row and the j-th column of the correlation matrix are deleted, and the apparent features a1 and a2 of the target are updated.
Preferably, the apparent feature a2 and its similarity measure are calculated as follows:
owing to the inter-class differences between pedestrian and vehicle targets, different apparent features a2 are extracted for vehicle targets and pedestrian targets respectively according to the target class. Preferably, the apparent feature a2 is selected as the depth feature learned by a Resnet18 deep neural network; the Resnet18 network is trained separately for vehicle targets and for pedestrian targets to capture the differences between distinct targets of the same class and the property that the same target remains consistent over time. An image slice is cut from the detection target frame of the current frame image and input into the depth feature extractor to obtain the depth feature vector of the detection target; the Euclidean distance between this vector and the depth feature vector of the previous-frame tracking target is calculated and normalized to obtain the apparent similarity Fa2.
Preferably, the base network of the depth feature extractor is Resnet18 with the last fully connected classification layer removed, and is used to extract the global feature of the target.
Preferably, the pedestrian target feature map is horizontally cut into upper-body and lower-body parts to obtain local features.
Preferably, the vehicle target feature map is horizontally cut into upper and lower halves and vertically cut into left and right halves to obtain local features; the global feature and the local features are then subjected to global maximum pooling and global mean pooling, where global maximum pooling captures the salient characteristics of the target and global mean pooling captures the background and contour information of the image slice.
Preferably, the global features obtained from global maximum pooling and global mean pooling are added to give a 2048-dimensional feature, which is reduced to 256 dimensions by a one-dimensional convolution; the local features obtained from global maximum pooling and global mean pooling are added to give a 1024-dimensional feature, which is likewise reduced to 256 dimensions by a one-dimensional convolution.
Preferably, for the pedestrian target the dimension-reduced features are concatenated into a depth feature vector of 3 × 256 = 768 dimensions.
Preferably, for the vehicle target the dimension-reduced features are concatenated into a depth feature vector of 5 × 256 = 1280 dimensions.
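A schematic PyTorch sketch of this category-specific depth feature extractor follows; the part splitting and the 256-dimensional reductions follow the text, while the backbone channel count and pooling details are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PartDepthFeature(nn.Module):
    """Global + part-level depth features, each reduced to 256-d and concatenated:
    3 x 256 = 768-d for pedestrians, 5 x 256 = 1280-d for vehicles."""

    def __init__(self, target_class="pedestrian"):
        super().__init__()
        backbone = resnet18(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool + fc
        self.target_class = target_class
        c = 512  # channel count of the last Resnet18 feature map (assumption)
        self.reduce_global = nn.Conv2d(c, 256, kernel_size=1)
        self.reduce_local = nn.Conv2d(c, 256, kernel_size=1)

    def _pool(self, fmap):
        # Sum of global maximum pooling (salient parts) and global mean pooling
        # (background / contour information), as described in the text.
        gmp = torch.amax(fmap, dim=(2, 3), keepdim=True)
        gap = torch.mean(fmap, dim=(2, 3), keepdim=True)
        return gmp + gap

    def forward(self, x):                       # x: image slice, (B, 3, H, W)
        fmap = self.backbone(x)                 # (B, 512, h, w)
        feats = [self.reduce_global(self._pool(fmap))]
        h, w = fmap.shape[2], fmap.shape[3]
        parts = [fmap[:, :, : h // 2], fmap[:, :, h // 2 :]]             # upper / lower
        if self.target_class == "vehicle":
            parts += [fmap[:, :, :, : w // 2], fmap[:, :, :, w // 2 :]]  # left / right
        feats += [self.reduce_local(self._pool(p)) for p in parts]
        return torch.cat([f.flatten(1) for f in feats], dim=1)  # 768-d or 1280-d
```

The 2048-/1024-dimensional intermediate sizes quoted above would correspond to a wider backbone; the sketch keeps Resnet18's native 512 channels and only preserves the 256-dimensional reduced parts and the final 768-/1280-dimensional concatenation.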
Preferably, the auxiliary matching further comprises:
further matching the apparently similar targets using the inter-target orientation constraint relation.
Preferably, the further matching of the apparently similar targets using the inter-target orientation constraint comprises:
for the N tracking targets tj (j = 1, 2, …, N) in the video sequence, each target encodes and maintains its relative orientation relation with the other N – 1 targets
RPj = {rp1, …, rpj-1, rpj+1, …, rpN}, (14)
wherein rpj-1 represents the relative orientation of the target tj and the target tj-1.
Preferably, the association pairs in the correlation matrix are updated according to formula (15), with Ta used as an association threshold to screen out the association pairs with lower apparent similarity:
Aij = 1, if Fc2 ≥ Ta; Aij = 0, otherwise. (15)
Preferably, the targets whose elements Aij in the correlation matrix are not equal to 0 are regarded as apparently similar targets; the relative orientations of the n apparently similar targets are encoded, the orientation matching cost Crp is calculated, and the matching result with the minimum orientation matching cost is taken as the final matching result.
Preferably, the orientation matching cost Crp is calculated as shown in formula (17):
Crp = Crp1 + Crp2 + … + Crpn, (17)
wherein Crpj is the orientation matching cost between the target tj and the other n – 1 apparently similar targets, calculated as shown in formula (18) by comparing RPj, the relative orientation relation of the previous-frame tracking target tj with the other N – 1 targets, with RP'j, the relative orientation relation of the current-frame tracking target tj with the other N – 1 targets.
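An illustrative sketch of the orientation-constraint check follows; the 8-way direction encoding and the counting form of formulas (17) and (18) are assumptions (FIG. 3 defines the actual encoding), and the target records are assumed to expose center coordinates x and y as in the earlier sketches:

```python
import math
from itertools import permutations

def relative_orientation(a, b, bins=8):
    """Assumed encoding: discretise the direction from target a to target b."""
    ang = math.atan2(b.y - a.y, b.x - a.x) % (2 * math.pi)
    return int(ang / (2 * math.pi / bins))

def orientation_cost(prev_targets, curr_targets, assignment):
    """Assumed Crp for one candidate assignment of apparently similar targets:
    number of pairwise relative orientations that change between the previous
    frame and the current frame under this assignment."""
    cost, n = 0, len(prev_targets)
    for j in range(n):
        for k in range(n):
            if j == k:
                continue
            prev_rp = relative_orientation(prev_targets[j], prev_targets[k])
            curr_rp = relative_orientation(curr_targets[assignment[j]],
                                           curr_targets[assignment[k]])
            cost += int(prev_rp != curr_rp)
    return cost

def best_orientation_match(prev_targets, curr_targets):
    """Enumerate candidate assignments and keep the one with minimum Crp."""
    n = len(prev_targets)
    return min(permutations(range(n)),
               key=lambda a: orientation_cost(prev_targets, curr_targets, a))
```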
Preferably, for each unmatched tracking target tj in the correlation matrix that satisfies formula (19), i.e. whose column contains no remaining associated detection,
A1j + A2j + … + AMj = 0, (19)
the missed detection target is recovered by combining motion prediction with the apparent features.
Preferably, the predicted position (x̂j, ŷj) of the target tj (j = 1, 2, …, n) in the current frame is calculated, and the missed detection target is further recovered at this position in combination with the apparent features.
Preferably, a candidate frame is determined with the predicted position (x̂j, ŷj) of the target tj as its center point, with a width given by formula (20) and a height given by formula (21).
Preferably, the image slice inside the candidate frame of the current frame image is cut out and input into the HOG feature extractor to obtain the HOG feature of the target in the candidate frame; a point-multiplication operation with the HOG feature of the unmatched tracking target yields a feature response map; the maximum pixel value in the feature response map represents the HOG feature similarity of the detection target and the tracking target, and is compared with an HOG feature similarity threshold to determine the target state.
Preferably, if the HOG feature similarity is greater than the HOG feature similarity threshold, the coordinate position of the maximum pixel value in the feature response map is mapped back to the coordinate position (x*j, y*j) in the original image; the tracking target tj is updated with this coordinate position and the size wj + Δwj, hj + Δhj, and the apparent features a1 and a2 of the target are updated at the same time.
Preferably, if the HOG feature similarity is smaller than the HOG feature similarity threshold, the tracking target tj is considered to be occluded or to have disappeared, and the target tj is not updated.
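A condensed sketch of this recovery step (illustrative only; it reuses the hog_similarity helper assumed earlier, stores a track's HOG descriptor in a hypothetical track.hog attribute, replaces formulas (20) and (21) with a simple enlarged search window, and simplifies the maximum-response location to the predicted center):

```python
def recover_missed_target(frame, track, predicted_center,
                          hog_sim_threshold=0.6, pad=1.5):
    """Search an enlarged candidate box around the Kalman-predicted position and
    accept it only if the HOG response is high enough; otherwise the target is
    treated as occluded or disappeared and left unchanged."""
    px, py = predicted_center
    w, h = track.w + track.dw, track.h + track.dh
    cw, ch = pad * w, pad * h                              # assumed search-window size
    x1, y1 = int(max(px - cw / 2, 0)), int(max(py - ch / 2, 0))
    x2, y2 = int(px + cw / 2), int(py + ch / 2)
    patch = frame[y1:y2, x1:x2]                            # frame: numpy image (H, W, 3)
    if patch.size == 0:
        return None
    sim = hog_similarity(patch, track.hog)                 # response of the candidate box
    if sim < hog_sim_threshold:
        return None                                        # occluded or disappeared
    track.x, track.y, track.w, track.h = px, py, w, h      # recover the missed target
    return track
```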
Preferably, processing the tracking chain of the unmatched tracking target and the unmatched detection target, updating the orientation constraint relation between the targets to prepare for the next frame of tracking, and outputting the target tracking result of the current frame.
Preferably, a tracking target tj whose HOG feature similarity is lower than the HOG feature similarity threshold is marked as a target to be confirmed and added to the correlation matrix of the next frame; if it is successfully matched in the next frame, the target is updated, otherwise the target is considered occluded and its count of consecutive unmatched frames is increased; once the count of consecutive unmatched frames reaches the consecutive-unmatched threshold, the target is considered to have disappeared and is removed from the correlation matrix.
Preferably, each unmatched detection target di that satisfies formula (22), i.e. whose row contains no remaining associated tracking target,
Ai1 + Ai2 + … + AiN = 0, (22)
is marked as a target to be initialized and added to the correlation matrix of the next frame; if it is successfully matched in the next frame, it is initialized as a new target, otherwise it is considered a false-alarm target and removed from the correlation matrix.
Preferably, the orientation constraint relation between the targets is updated, and the tracking result of the current frame is output.
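A compact sketch of the tracking-chain bookkeeping described above (state names, counters and the threshold value are illustrative):

```python
def maintain_tracking_chain(unmatched_tracks, unmatched_detections, max_misses=3):
    """Unmatched tracks become 'to be confirmed' and are dropped after repeated
    misses; unmatched detections become 'to be initialised' and are promoted to
    new targets only if they match again in the next frame."""
    survivors, candidates = [], []
    for t in unmatched_tracks:
        t.misses = getattr(t, "misses", 0) + 1
        t.state = "to_be_confirmed"
        if t.misses < max_misses:          # otherwise the target is considered gone
            survivors.append(t)
    for d in unmatched_detections:
        d.state = "to_be_initialised"      # false alarm unless matched next frame
        candidates.append(d)
    return survivors, candidates
```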
In order to solve the above technical problem, according to another aspect of the present invention, there is provided an apparatus for tracking multiple vehicles and multiple human targets in a video, comprising:
a data acquisition device: for acquiring video data;
a target matching device: aiming at the video data, establishing an incidence matrix of a detection result and a tracking result, extracting apparent characteristics based on target categories from the detection result, and performing target matching in a layered and progressive manner;
Auxiliary matching device: further matching unsuccessfully matched apparent similar targets in target matching by using the orientation constraint relation between the targets;
missed detection target recovery means: recovering undetected targets of the frame by utilizing motion prediction and apparent characteristics;
a tracking result output device: and maintaining a tracking chain, updating the orientation constraint relation between the targets, and outputting the tracking result of the current frame.
Preferably, the acquiring video data comprises acquiring video data in real time.
Preferably, the acquiring video data includes reading the video data from a file.
Preferably, the acquiring video data comprises capturing video data with a camera mounted on the autonomous vehicle.
Preferably, the target matching further comprises:
and respectively establishing a vehicle target incidence matrix and a pedestrian target incidence matrix according to the target types.
Preferably, the respectively establishing a vehicle target correlation matrix and a pedestrian target correlation matrix according to the target categories includes:
the current-frame detection target sequence set D shown in formula (1)
D = {d1, d2, …, di, …, dM-1, dM} (1)
is taken as the rows (or columns), and the previous-frame tracking target sequence set T shown in formula (2)
T = {t1, t2, …, tj, …, tN-1, tN} (2)
is taken as the columns (or rows); an M × N correlation matrix is established, M and N being natural numbers. Each element Aij of the correlation matrix represents the association result between the detection target di shown in formula (3)
di = {typei, xi, yi, wi, hi} (i = 1, 2, …, M) (3)
and the tracking target tj shown in formula (4)
tj = {idj, xj, yj, wj, hj, Δxj, Δyj, Δwj, Δhj} (j = 1, 2, …, N) (4)
(initialized as Aij = 1; Aij = 1 means that di and tj are associated, otherwise they are unassociated), wherein typei is the class of di, {xi, yi} are the coordinates of the center point of the target frame of di, {wi, hi} are the width and height of the target frame of di, i.e. the size of the target frame, idj is the ID of tj, {xj, yj} are the coordinates of the center point of the target frame of tj, {wj, hj} are the width and height of the target frame of tj, {Δxj, Δyj} is the movement speed of tj, and {Δwj, Δhj} is the width and height variation of tj.
Preferably, it is characterized in that the first and second parts,
the target matching further comprises:
and obtaining the predicted position of each target in the previous frame in the current frame by using Kalman filtering motion prediction.
Preferably, the obtaining of the predicted position of each target in the previous frame in the current frame by using Kalman filtering motion prediction comprises:
the predicted position (x̂j, ŷj) of each previous-frame target tj (j = 1, 2, …, N) in the current frame is obtained by Kalman filtering motion prediction; a circular correlation gate is established with (x̂j, ŷj) as its center and the radius R given by formula (5); the m (m ≤ M) detection targets d1, …, dm whose target-frame center coordinates {xi, yi} fall within the correlation gate are associated to the tracking target tj, i.e. for the j-th column the corresponding elements Aij are kept at 1 and the remaining elements of the column are set to 0, thereby sparsifying the matrix.
Preferably, the target matching further comprises:
for every target association pair with Aij ≠ 0, calculating the target frame similarity Fs and the target frame overlap degree Fiou of the corresponding detection target di and tracking target tj, and obtaining the target frame overall similarity Fbox shown in formula (6):
Fbox = λbox·Fs + (1 – λbox)·Fiou, (6)
wherein λbox is the weight of the target frame similarity Fs in the target frame overall similarity Fbox;
each association pair in the correlation matrix is updated as shown in formula (7):
Aij = 1, if Fbox ≥ Tbox; Aij = 0, otherwise, (7)
where Tbox is the target frame overall similarity threshold. For each association pair satisfying Aij = 1, the number of elements equal to 1 in the i-th row and in the j-th column is counted; if Aij is the only element equal to 1 in both its row and its column, the detection target di and the tracking target tj have been successfully matched; the tracking target tj is updated with the target frame center point coordinates {xi, yi} and size {wi, hi} of the detection target di, the updated target tj is saved, and the i-th row and the j-th column of the correlation matrix are deleted.
Preferably, the target frame similarity Fs is calculated as shown in formula (8).
Preferably, the target frame overlap degree Fiou is calculated as shown in formula (9):
Fiou = area(di ∩ tj) / (area(di) + area(tj) – area(di ∩ tj)), (9)
wherein area(di) denotes the area of the target frame of the detection target di, area(tj) denotes the area of the target frame of the tracking target tj, and area(di ∩ tj) denotes the area of their overlapping region.
Preferably, the target matching further comprises:
for the detection targets di (i = 1, 2, …, m) remaining in the correlation matrix, computing the apparent feature a1, and fusing the target frame overall similarity Fbox and the apparent similarity Fa1 into the comprehensive similarity Fc1 shown in formula (10):
Fc1 = λ1·Fbox + (1 – λ1)·Fa1, (10)
wherein λ1 is the weight of the target frame overall similarity Fbox in the comprehensive similarity Fc1.
Preferably, each association pair in the correlation matrix is updated as shown in formula (11):
Aij = 1, if Fc1 ≥ Tc1; Aij = 0, otherwise, (11)
where Tc1 is a comprehensive similarity threshold. For each association pair satisfying Aij = 1, the number of elements equal to 1 in the i-th row and in the j-th column is counted; if Aij is the only element equal to 1 in both its row and its column, the detection target di and the tracking target tj have been successfully matched; the tracking target tj is updated with the target frame center point coordinates {xi, yi} and size {wi, hi} of the detection target di, the updated target tj is saved, the i-th row and the j-th column of the correlation matrix are deleted, and the apparent feature a1 of the target is updated.
Preferably, the apparent feature a1 and its similarity measure are calculated as follows:
the gradient feature is an important apparent feature for pedestrian and vehicle targets, so the apparent feature a1 of the present invention is selected as the Histogram of Oriented Gradients (HOG) feature, which has strong descriptive power. An image slice is cut from the detection target frame of the current frame image and input into an HOG feature extractor to obtain the HOG feature of the detection target; a point-multiplication operation with the HOG feature of the previous-frame tracking target yields a feature response map; the maximum pixel value in the feature response map represents the HOG feature similarity of the detection target and the tracking target, and is normalized to obtain the apparent similarity Fa1.
Preferably, the target matching further comprises:
calculating the apparent feature a2 of the detection targets di (i = 1, 2, …, m) remaining in the correlation matrix, where m is the total number of detection targets, and fusing the comprehensive similarity Fc1 and the apparent similarity Fa2 into the comprehensive similarity Fc2 shown in formula (12):
Fc2 = λ2·Fc1 + (1 – λ2)·Fa2, (12)
wherein λ2 is the weight of the comprehensive similarity Fc1 in the comprehensive similarity Fc2.
Preferably, each association pair in the correlation matrix is updated as shown in formula (13):
Aij = 1, if Fc2 ≥ Tc2; Aij = 0, otherwise, (13)
where Tc2 is a comprehensive similarity threshold. For each association pair satisfying Aij = 1, the number of elements equal to 1 in the i-th row and in the j-th column is counted; if Aij is the only element equal to 1 in both its row and its column, the detection target di and the tracking target tj have been successfully matched; the tracking target tj is updated with the target frame center point coordinates {xi, yi} and size {wi, hi} of the detection target di, the updated target tj is saved, the i-th row and the j-th column of the correlation matrix are deleted, and the apparent features a1 and a2 of the target are updated.
Preferably, the apparent feature a2 and its similarity measure are calculated as follows:
owing to the inter-class differences between the two classes of targets, different apparent features a2 are extracted for vehicle targets and pedestrian targets respectively according to the target class. Preferably, the apparent feature a2 is selected as the depth feature learned by a Resnet18 deep neural network; the Resnet18 network is trained separately for vehicle targets and for pedestrian targets to capture the differences between distinct targets of the same class and the property that the same target remains consistent over time. An image slice is cut from the detection target frame of the current frame image and input into the depth feature extractor to obtain the depth feature vector of the detection target; the Euclidean distance between this vector and the depth feature vector of the previous-frame tracking target is calculated and normalized to obtain the apparent similarity Fa2.
Preferably, the base network of the depth feature extractor is Resnet18 with the last fully connected classification layer removed, and is used to extract the global feature of the target.
Preferably, the pedestrian target feature map is horizontally cut into upper-body and lower-body parts to obtain local features.
Preferably, the vehicle target feature map is horizontally cut into upper and lower halves and vertically cut into left and right halves to obtain local features; the global feature and the local features are then subjected to global maximum pooling and global mean pooling, where global maximum pooling captures the salient characteristics of the target and global mean pooling captures the background and contour information of the image slice.
Preferably, the global features obtained from global maximum pooling and global mean pooling are added to give a 2048-dimensional feature, which is reduced to 256 dimensions by a one-dimensional convolution; the local features obtained from global maximum pooling and global mean pooling are added to give a 1024-dimensional feature, which is likewise reduced to 256 dimensions by a one-dimensional convolution.
Preferably, for the pedestrian target the dimension-reduced features are concatenated into a depth feature vector of 3 × 256 = 768 dimensions.
Preferably, for the vehicle target the dimension-reduced features are concatenated into a depth feature vector of 5 × 256 = 1280 dimensions.
Preferably, the auxiliary matching further comprises:
further matching the apparently similar targets using the inter-target orientation constraint relation.
Preferably, the further matching of the apparently similar targets using the inter-target orientation constraint comprises:
for the N tracking targets tj (j = 1, 2, …, N) in the video sequence, each target encodes and maintains its relative orientation relation with the other N – 1 targets
RPj = {rp1, …, rpj-1, rpj+1, …, rpN}, (14)
wherein rpj-1 represents the relative orientation of the target tj and the target tj-1.
Preferably, the association pairs in the correlation matrix are updated according to formula (15), with Ta used as an association threshold to screen out the association pairs with lower apparent similarity:
Aij = 1, if Fc2 ≥ Ta; Aij = 0, otherwise. (15)
Preferably, the targets whose elements Aij in the correlation matrix are not equal to 0 are regarded as apparently similar targets; the relative orientations of the n apparently similar targets are encoded, the orientation matching cost Crp is calculated, and the matching result with the minimum orientation matching cost is taken as the final matching result.
Preferably, the orientation matching cost Crp is calculated as shown in formula (17):
Crp = Crp1 + Crp2 + … + Crpn, (17)
wherein Crpj is the orientation matching cost between the target tj and the other n – 1 apparently similar targets, calculated as shown in formula (18) by comparing RPj, the relative orientation relation of the previous-frame tracking target tj with the other N – 1 targets, with RP'j, the relative orientation relation of the current-frame tracking target tj with the other N – 1 targets.
Preferably, for each unmatched tracking target tj in the correlation matrix that satisfies formula (19), i.e. whose column contains no remaining associated detection,
A1j + A2j + … + AMj = 0, (19)
the missed detection target is recovered by combining motion prediction with the apparent features.
Preferably, the predicted position (x̂j, ŷj) of the target tj (j = 1, 2, …, n) in the current frame is calculated, and the missed detection target is further recovered at this position in combination with the apparent features.
Preferably, a candidate frame is determined with the predicted position (x̂j, ŷj) of the target tj as its center point, with a width given by formula (20) and a height given by formula (21).
Preferably, the image slice inside the candidate frame of the current frame image is cut out and input into the HOG feature extractor to obtain the HOG feature of the target in the candidate frame; a point-multiplication operation with the HOG feature of the unmatched tracking target yields a feature response map; the maximum pixel value in the feature response map represents the HOG feature similarity of the detection target and the tracking target, and is compared with an HOG feature similarity threshold to determine the target state.
Preferably, if the HOG feature similarity is greater than the HOG feature similarity threshold, the coordinate position of the maximum pixel value in the feature response map is mapped back to the coordinate position (x*j, y*j) in the original image; the tracking target tj is updated with this coordinate position and the size wj + Δwj, hj + Δhj, and the apparent features a1 and a2 of the target are updated at the same time.
Preferably, if the HOG feature similarity is smaller than the HOG feature similarity threshold, the tracking target tj is considered to be occluded or to have disappeared, and the target tj is not updated.
Preferably, processing the tracking chain of the unmatched tracking target and the unmatched detection target, updating the orientation constraint relation between the targets to prepare for the next frame of tracking, and outputting the target tracking result of the current frame.
Preferably, a tracking target tj whose HOG feature similarity is lower than the HOG feature similarity threshold is marked as a target to be confirmed and added to the correlation matrix of the next frame; if it is successfully matched in the next frame, the target is updated, otherwise the target is considered occluded and its count of consecutive unmatched frames is increased; once the count of consecutive unmatched frames reaches the consecutive-unmatched threshold, the target is considered to have disappeared and is removed from the correlation matrix.
Preferably, each unmatched detection target di that satisfies formula (22), i.e. whose row contains no remaining associated tracking target,
Ai1 + Ai2 + … + AiN = 0, (22)
is marked as a target to be initialized and added to the correlation matrix of the next frame; if it is successfully matched in the next frame, it is initialized as a new target, otherwise it is considered a false-alarm target and removed from the correlation matrix.
Preferably, the orientation constraint relation between the targets is updated, and the tracking result of the current frame is output.
The invention has the beneficial effects that:
1. The invention provides a multi-feature apparent modeling method based on the target category: different apparent description methods are used for different target categories, the changes of the target and the background are fully considered, and different feature extraction operators are constructed from the detected target category to obtain an accurate apparent description of the target, so that complicated target and background changes can be handled;
2. the invention provides a hierarchical progressive data association method: a correlation matrix is established per target category and a high-threshold matching algorithm quickly reduces its dimensionality, which reduces the number of depth feature extractions and improves the time efficiency of the algorithm, increasing tracking speed while guaranteeing tracking precision;
3. the invention provides a method that uses the relative orientation constraint relation between targets to assist the apparent features in completing target matching: by encoding the relative orientations of the targets in the previous frame and combining them with the apparent features of the current frame, apparently similar targets can be matched correctly, effectively reducing the mismatching rate;
4. the invention provides a method for recovering missed detection targets under a correlation filter tracking framework by using the target apparent features and motion prediction, which greatly improves the recovery rate of missed detection targets.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and together with the description serve to explain the principles of the invention. The above and other objects, features and advantages of the present invention will become more apparent from the detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.
FIG. 1 is an overall framework diagram of the present invention;
FIG. 2 is a flow chart of a hierarchical progressive data association algorithm;
FIG. 3 is a relative orientation encoding diagram;
FIG. 4 is a graph of a matched set of two apparently similar targets.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not to be construed as limitations of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
In addition, the embodiments of the present invention and the features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The invention aims to provide a method for tracking pedestrian and vehicle targets in a video. Fig. 1 depicts the overall framework of the invention, comprising the following steps:
firstly, establishing an incidence matrix of a detection result and a tracking result, extracting apparent characteristics based on target categories from the detection result, and performing target matching in a layered and progressive manner;
secondly, further matching the unsuccessfully matched apparent similar targets in the first step by using the orientation constraint relation between the targets;
thirdly, recovering undetected targets of the frame by utilizing motion prediction and apparent characteristics;
and fourthly, maintaining a tracking chain, updating the orientation constraint relation between the targets and outputting the tracking result of the frame.
(1) Completing target matching with hierarchical progressive data association
Data association is the process of associating uncertain detection results with existing target tracks so as to match the targets of consecutive frames. Most existing schemes directly extract multiple features for every target, fuse several feature similarities into a matching cost for every association pair, exhaustively compute the matching cost of every possible global association, and take the global association with the lowest matching cost as the target matching result. Such schemes make a "hard" decision on data association, do not fully consider the possibility that a pairing is a false match, and require the matching cost to be designed very accurately. To address this, the invention establishes correlation matrices separately by target category to reduce redundant similarity computations, and uses a hierarchical progressive strategy to reduce the number of feature extractions, guaranteeing accuracy while improving the time efficiency of the algorithm. FIG. 2 is a flow chart of the hierarchical progressive data association algorithm.
(1.1) A vehicle target correlation matrix and a pedestrian target correlation matrix are established separately according to target type. The detection target sequence set of the current frame shown in formula (1)

D = {d_1, d_2, …, d_i, …, d_{M-1}, d_M} (1)

is taken as the rows (or columns), the tracking target sequence set of the previous frame shown in formula (2)

T = {t_1, t_2, …, t_j, …, t_{N-1}, t_N} (2)

is taken as the columns (or rows), and an M × N correlation matrix is established, M and N being natural numbers. Each element A_ij of the correlation matrix represents the association result of the detection target d_i shown in formula (3)

d_i = {type_i, x_i, y_i, w_i, h_i} (i = 1, 2, …, M) (3)

and the tracking target t_j shown in formula (4)

t_j = {id_j, x_j, y_j, w_j, h_j, Δx_j, Δy_j, Δw_j, Δh_j} (j = 1, 2, …, N) (4)

(initialized to A_ij = 1; A_ij = 1 indicates that d_i and t_j are associated, otherwise they are unassociated), wherein type_i is the class of d_i, {x_i, y_i} are the coordinates of the center point of the target frame of d_i, {w_i, h_i} are the width and height of the target frame of d_i, i.e. the size of the target frame, id_j is the ID of t_j, {x_j, y_j} are the coordinates of the center point of the target frame of t_j, {w_j, h_j} are the width and height of the target frame of t_j, {Δx_j, Δy_j} is the motion speed of t_j, and {Δw_j, Δh_j} is the width and height variation of t_j.
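By way of illustration only, the following Python sketch shows one possible realization of step (1.1) under assumed detection and track record structures (dictionaries with the fields listed above); it is not the implementation of the invention.

```python
# A minimal sketch of step (1.1): detections and tracks are split by class
# ("vehicle" / "pedestrian") and, for each class, an M x N correlation matrix
# initialized to 1 is created (rows = detections of the current frame,
# columns = tracking targets of the previous frame).
import numpy as np


def build_association_matrices(detections, tracks):
    """detections: list of dicts with keys type, x, y, w, h;
    tracks: list of dicts with keys id, type, x, y, w, h, dx, dy, dw, dh."""
    matrices = {}
    for cls in ("vehicle", "pedestrian"):
        dets = [d for d in detections if d["type"] == cls]
        trks = [t for t in tracks if t["type"] == cls]
        # A_ij = 1 means detection i and track j are still considered associated
        matrices[cls] = (dets, trks, np.ones((len(dets), len(trks)), dtype=np.uint8))
    return matrices
```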
(1.2) The predicted position (x̂_j, ŷ_j) in the current frame of each target t_j (j = 1, 2, …, N) of the previous frame is obtained by Kalman filtering motion prediction. A circular association gate centered at (x̂_j, ŷ_j) with radius R as given by formula (5) is established, and the m (m ≤ M) detection targets d_1, …, d_m whose target-frame center point coordinates {x_i, y_i} fall within the association gate are associated to the tracking target t_j; that is, for the j-th column, A_ij is kept equal to 1 for these m detection targets and the remaining elements are set to 0, so that the matrix is sparsified.
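A minimal sketch of this gating step is given below; the constant-velocity center prediction stands in for the Kalman filter prediction, and the gate radius used (half the predicted box diagonal) is an illustrative assumption, since the actual radius R is defined by formula (5).

```python
# A minimal sketch of step (1.2): predict each track's center and zero out
# correlation-matrix entries for detections outside the circular gate.
import numpy as np


def predict_center(trk):
    # x_j + Δx_j, y_j + Δy_j : constant-velocity prediction of the center
    return trk["x"] + trk["dx"], trk["y"] + trk["dy"]


def apply_association_gate(dets, trks, A):
    for j, trk in enumerate(trks):
        px, py = predict_center(trk)
        R = 0.5 * np.hypot(trk["w"] + trk["dw"], trk["h"] + trk["dh"])  # assumed radius
        for i, det in enumerate(dets):
            if np.hypot(det["x"] - px, det["y"] - py) > R:
                A[i, j] = 0          # detection outside the gate: not associated
    return A
```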
(1.3) For each target association pair A_ij from step (1.2) satisfying A_ij ≠ 0, the target frame similarity F_s and the target frame overlap F_iou of the corresponding detection target d_i and tracking target t_j are calculated, and the overall target frame similarity F_box shown in formula (6) is obtained:

F_box = λ_box·F_s + (1 − λ_box)·F_iou, (6)

wherein λ_box is the weight of the target frame similarity F_s in the overall target frame similarity F_box.

Each association pair in the correlation matrix is updated as shown in formula (7):

A_ij = 1, if F_box > T_box; A_ij = 0, otherwise, (7)

wherein T_box is the overall target frame similarity threshold. For each association pair satisfying A_ij = 1, the number of elements equal to 1 in row i and in column j is counted; if both counts equal 1, the detection target d_i and the tracking target t_j have been successfully matched: the tracking target t_j is updated with the target frame center point coordinates {x_i, y_i} and size {w_i, h_i} of the detection target d_i, the state of target t_j is calculated and saved, and row i and column j of the correlation matrix are deleted.
In the method as described above, the target frame similarity F_s in step (1.3) is calculated as shown in formula (8).

In the method as described above, the target frame overlap F_iou in step (1.3) is calculated as shown in formula (9):

F_iou = area(d_i ∩ t_j) / (area(d_i) + area(t_j) − area(d_i ∩ t_j)), (9)

wherein area(d_i) denotes the area of the target frame of the detection target d_i, area(t_j) denotes the area of the target frame of the tracking target t_j, and area(d_i ∩ t_j) denotes the area of the overlap of the two target frames.
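The following Python sketch illustrates, under assumed box conventions, the box-level association layer of step (1.3): the IoU of formula (9), a fused box similarity following formula (6) (with an illustrative size-similarity stand-in for F_s, since formula (8) is not reproduced here), thresholding of A_ij, and the row/column uniqueness test that accepts a match only when a detection and a track are each other's sole remaining candidate. The same threshold-and-uniqueness routine can be reused in steps (1.4) and (1.5) with F_c1 and F_c2 as the score function.

```python
# A minimal sketch of the box-level matching layer; boxes are dicts with
# center coordinates x, y and size w, h.
import numpy as np


def iou(d, t):
    x1 = max(d["x"] - d["w"] / 2, t["x"] - t["w"] / 2)
    y1 = max(d["y"] - d["h"] / 2, t["y"] - t["h"] / 2)
    x2 = min(d["x"] + d["w"] / 2, t["x"] + t["w"] / 2)
    y2 = min(d["y"] + d["h"] / 2, t["y"] + t["h"] / 2)
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / (d["w"] * d["h"] + t["w"] * t["h"] - inter + 1e-12)


def box_similarity(d, t, lam_box=0.5):
    # assumed F_s: size similarity from width/height ratios (stand-in for formula (8))
    f_s = min(d["w"], t["w"]) / max(d["w"], t["w"]) * \
          min(d["h"], t["h"]) / max(d["h"], t["h"])
    return lam_box * f_s + (1.0 - lam_box) * iou(d, t)


def threshold_and_match(dets, trks, A, score_fn, thr):
    # threshold every surviving association pair (formula (7) style update)
    for i, j in zip(*np.nonzero(A)):
        A[i, j] = 1 if score_fn(dets[i], trks[j]) > thr else 0
    matches = []
    for i, j in zip(*np.nonzero(A)):
        if A[i, :].sum() == 1 and A[:, j].sum() == 1:   # unique in row and column
            matches.append((i, j))                      # d_i <-> t_j matched
    # matched rows/columns would then be removed before the next (deeper) layer
    return matches
```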
(1.4) For the detection targets d_i (i = 1, 2, …, m) remaining in the correlation matrix, the apparent feature a_1 is computed, and the overall target frame similarity F_box is fused with the apparent similarity F_a1 to give the integrated similarity F_c1 shown in formula (10):

F_c1 = λ_1·F_box + (1 − λ_1)·F_a1, (10)

wherein λ_1 is the weight of the overall target frame similarity F_box in the integrated similarity F_c1.

Each association pair in the correlation matrix is updated as shown in formula (11):

A_ij = 1, if F_c1 > T_c1; A_ij = 0, otherwise, (11)

wherein T_c1 is the integrated similarity threshold. For each association pair satisfying A_ij = 1, the number of elements equal to 1 in row i and in column j is counted; if both counts equal 1, the detection target d_i and the tracking target t_j have been successfully matched: the tracking target t_j is updated with the target frame center point coordinates {x_i, y_i} and size {w_i, h_i} of the detection target d_i, the state of target t_j is calculated and saved, row i and column j of the correlation matrix are deleted, and the apparent feature a_1 of the target is updated.
In the method as described above, the apparent feature a_1 of step (1.4) and its similarity measure are calculated as follows:

Gradient features are important apparent features for both pedestrian and vehicle targets, so the apparent feature a_1 of the invention is chosen as the Histogram of Oriented Gradients (HOG) feature, which has strong descriptive power. An image slice is cut from the detection target frame of the current frame image and input to a HOG feature extractor to obtain the HOG feature of the detection target; a dot-product (correlation) operation with the HOG feature of the tracking target of the previous frame yields a feature response map; the maximum pixel value of the response map represents the HOG feature similarity between the detection target and the tracking target, and is normalized to give the apparent similarity F_a1.
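As an illustration only, the following Python sketch shows one way to realize this HOG response-map similarity, assuming grayscale image patches, a scikit-image HOG extractor, and frequency-domain cross-correlation; the patch size, cell size, and normalization are illustrative assumptions rather than the parameters of the invention.

```python
# A minimal sketch of the HOG-based apparent similarity of step (1.4): HOG maps
# of the detection slice and the tracked template are cross-correlated and the
# normalized peak of the response map is taken as F_a1.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize


def hog_map(patch, cell=8):
    """Return an (H, W, C) HOG feature map for a grayscale float image patch."""
    feat = hog(patch, orientations=9, pixels_per_cell=(cell, cell),
               cells_per_block=(1, 1), feature_vector=False)
    # skimage returns (cells_y, cells_x, 1, 1, orientations); squeeze block dims
    return feat.reshape(feat.shape[0], feat.shape[1], -1)


def hog_response_peak(det_patch, tmpl_patch, size=(64, 64)):
    """Cross-correlate HOG maps of a detection slice and a tracked template;
    the normalized peak of the response map approximates F_a1."""
    f_det = hog_map(resize(det_patch, size))
    f_tmp = hog_map(resize(tmpl_patch, size))
    resp = np.zeros(f_det.shape[:2])
    for c in range(f_det.shape[2]):
        # frequency-domain cross-correlation, summed over HOG channels
        resp += np.real(np.fft.ifft2(np.fft.fft2(f_det[..., c]) *
                                     np.conj(np.fft.fft2(f_tmp[..., c]))))
    denom = np.linalg.norm(f_det) * np.linalg.norm(f_tmp) + 1e-12
    return float(resp.max() / denom)  # normalized similarity, roughly in [0, 1]
```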
(1.5) For the detection targets d_i (i = 1, 2, …, m) remaining in the correlation matrix, m being the number of remaining detection targets, the apparent feature a_2 is computed, and the integrated similarity F_c1 is fused with the apparent similarity F_a2 to give the integrated similarity F_c2 shown in formula (12):

F_c2 = λ_2·F_c1 + (1 − λ_2)·F_a2, (12)

wherein λ_2 is the weight of the integrated similarity F_c1 in the integrated similarity F_c2.

Each association pair in the correlation matrix is updated as shown in formula (13):

A_ij = 1, if F_c2 > T_c2; A_ij = 0, otherwise, (13)

wherein T_c2 is the integrated similarity threshold. For each association pair satisfying A_ij = 1, the number of elements equal to 1 in row i and in column j is counted; if both counts equal 1, the detection target d_i and the tracking target t_j have been successfully matched: the tracking target t_j is updated with the target frame center point coordinates {x_i, y_i} and size {w_i, h_i} of the detection target d_i, the state of target t_j is calculated and saved, row i and column j of the correlation matrix are deleted, and the apparent features a_1 and a_2 of the target are updated.
In the method as described above, the apparent feature a_2 of step (1.5) and its similarity measure are calculated as follows:

Because of the inter-class differences between pedestrian and vehicle targets, the invention extracts different apparent features a_2 for vehicle targets and pedestrian targets based on the target class. Preferably, the apparent feature a_2 is chosen as a depth feature learned by a ResNet18 deep neural network. The ResNet18 network is trained separately for vehicle targets and for pedestrian targets, so as to capture the differences between distinct targets of the same class and the invariance of the same target over time. An image slice is cut from the detection target frame of the current frame image and input to the depth feature extractor to obtain the depth feature vector of the detection target; the Euclidean distance to the depth feature vector of the tracking target of the previous frame is computed and normalized to give the apparent similarity F_a2.

Specifically, the backbone network of the depth feature extractor is ResNet18 with the final fully connected classification layer removed, and it is used to extract the global feature of the target. For a pedestrian target, the upper and lower parts have distinctive characteristics, so the pedestrian feature map is cut horizontally into upper and lower halves to form local features. For a vehicle target, the upper and lower parts have distinctive characteristics while the left and right parts are frequently occluded, so the vehicle feature map is cut horizontally into upper and lower halves and vertically into left and right halves to obtain local features. The global and local features are then subjected to global max pooling and global mean pooling: global max pooling captures the salient characteristics of the target, while global mean pooling captures the background and contour information of the image slice. The max-pooled and mean-pooled global features are added to obtain a 2048-dimensional feature, which is reduced to 256 dimensions by a one-dimensional convolution; the max-pooled and mean-pooled local features are added to obtain 1024-dimensional features, each reduced to 256 dimensions by a one-dimensional convolution. The reduced features are concatenated, forming a 3 × 256 = 768-dimensional depth feature vector for a pedestrian target and a 5 × 256 = 1280-dimensional depth feature vector for a vehicle target.
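By way of illustration, the following PyTorch sketch outlines such a class-specific part-pooled feature extractor under stated assumptions: a standard torchvision ResNet18 backbone (512 output channels, so the 2048/1024-dimensional intermediate features quoted above, which imply a wider backbone, are not reproduced), global plus part pooling with both max and mean pooling, 1 × 1 convolutions down to 256 dimensions, and concatenation into 768 dimensions for pedestrians or 1280 dimensions for vehicles. The mapping from Euclidean distance to F_a2 at the end is one possible normalization, not the one claimed by the patent.

```python
# A minimal sketch, not the implementation of the invention.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class PartFeatureExtractor(nn.Module):
    def __init__(self, target_class="pedestrian", embed_dim=256):
        super().__init__()
        backbone = resnet18(weights=None)
        # keep everything up to (and including) the last residual stage
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.target_class = target_class
        c = 512  # ResNet18 output channels
        n_parts = 3 if target_class == "pedestrian" else 5  # global + local parts
        self.reduce = nn.ModuleList(nn.Conv2d(c, embed_dim, kernel_size=1)
                                    for _ in range(n_parts))

    @staticmethod
    def _pool(fmap):
        # sum of global max pooling (saliency) and global mean pooling (context)
        return (torch.amax(fmap, dim=(2, 3), keepdim=True) +
                torch.mean(fmap, dim=(2, 3), keepdim=True))

    def forward(self, x):                      # x: (B, 3, H, W) image slices
        fmap = self.backbone(x)                # (B, 512, h, w)
        h, w = fmap.shape[2], fmap.shape[3]
        parts = [fmap,                         # global
                 fmap[:, :, : h // 2, :],      # upper half
                 fmap[:, :, h // 2:, :]]       # lower half
        if self.target_class == "vehicle":
            parts += [fmap[:, :, :, : w // 2],  # left half
                      fmap[:, :, :, w // 2:]]   # right half
        feats = [conv(self._pool(p)).flatten(1)            # (B, 256) each
                 for conv, p in zip(self.reduce, parts)]
        f = torch.cat(feats, dim=1)            # (B, 768) or (B, 1280)
        return nn.functional.normalize(f, dim=1)


# One possible mapping from Euclidean distance to an apparent similarity F_a2
def appearance_similarity(f_det, f_trk):
    d = torch.norm(f_det - f_trk, dim=1)       # in [0, 2] for unit-norm features
    return 1.0 - d / 2.0
```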
(2) Further matching of apparently similar targets using inter-target orientation constraints

For the N tracking targets t_j (j = 1, 2, …, N) in the video sequence, each target encodes and maintains its relative orientation relation with the other N − 1 targets, as shown in formula (14):

RP_j = {rp_1, …, rp_{j−1}, rp_{j+1}, …, rp_N}, (14)

wherein rp_{j−1} represents the relative orientation of target t_j with respect to target t_{j−1}. Each relative orientation is a two-bit code rc, rp = {rc}, whose value is given by formula (15); the specific coding mode is shown in fig. 3.
(2.1) The association pairs in the correlation matrix are updated according to formula (16), with T_a as the association threshold, so that association pairs with lower apparent similarity are screened out.
(2.2) The targets with A_ij ≠ 0 remaining in the correlation matrix are regarded as apparently similar targets; the relative orientations of the n apparently similar targets are encoded, the orientation matching cost C_rp is calculated, and the association result with the minimum orientation matching cost is taken as the final matching result.
In the method as described above, the encoding of the relative orientation in step (2.2) is shown in fig. 3.

FIG. 3(a) is the relative orientation relation code table. Taking the 9 target positions in FIG. 3(b) as an example, the relative orientation code of target 1 with respect to target 2 is {00}, with respect to target 3 is {00}, with respect to target 4 is {01}, with respect to target 5 is {01}, with respect to target 6 is {11}, with respect to target 7 is {11}, with respect to target 8 is {10}, and with respect to target 9 is {10}; therefore,

RP_1 = {00, 00, 01, 01, 11, 11, 10, 10}.
In the method as described above, the orientation matching cost C_rp in step (2.2) is calculated as shown in formula (17):

C_rp = Σ_{j=1…n} C_rp_j, (17)

wherein C_rp_j is the orientation matching cost of target t_j with respect to the other n − 1 apparently similar targets, calculated as shown in formula (18): C_rp_j is the number of code bits in which RP_j and RP′_j differ, wherein RP_j represents the relative orientation relation of the previous-frame tracking target t_j with the other N − 1 targets, and RP′_j represents the relative orientation relation of the current-frame tracking target t_j, under the candidate association, with the other N − 1 targets.
For example, when target t_1 and target t_2 are apparently similar, the tabular form of the correlation matrix is shown in table 1.

TABLE 1 Correlation matrix of two apparently similar targets

The positional relationship of the previous-frame targets t_1 and t_2 is shown in FIG. 4(a); FIGS. 4(b) and 4(c) show the two possible global association results between the apparently similar detection targets d_1 and d_2 of the current frame and the tracking targets t_1 and t_2.

For FIG. 4(a), RP_1 = {rp_2} = {11} and RP_2 = {rp_1} = {00}. For the association of FIG. 4(b), RP′_1 = {11} and RP′_2 = {00}, so C_rp_b = 0; for the association of FIG. 4(c), RP′_1 = {00} and RP′_2 = {11}, so C_rp_c = 4. The global association with the minimum orientation matching cost is taken as the final target matching, and the matching result is d_1 → t_1, d_2 → t_2.
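A minimal Python sketch of this orientation constraint is given below, under the assumption that the two-bit code is a quadrant code (one bit for the horizontal relation, one for the vertical relation); the exact code table of FIG. 3(a) is not reproduced, so the bit convention, helper names, and example coordinates are illustrative.

```python
# A minimal sketch of section (2): encode relative orientations as two-bit codes
# and pick, among the ambiguous targets, the assignment with the minimum
# orientation matching cost (number of differing code bits).
from itertools import permutations


def rel_code(p, q):
    """Two-bit relative orientation code of the target at p with respect to q."""
    bx = 1 if q[0] >= p[0] else 0   # assumed horizontal bit
    by = 1 if q[1] >= p[1] else 0   # assumed vertical bit
    return (bx, by)


def orientation_cost(prev_pos, cur_pos, assignment):
    """Sum over targets of the number of differing code bits between the
    previous-frame codes RP_j and the codes RP'_j induced by `assignment`
    (assignment[j] = index of the detection matched to track j)."""
    n = len(prev_pos)
    cost = 0
    for j in range(n):
        for k in range(n):
            if k == j:
                continue
            rp = rel_code(prev_pos[j], prev_pos[k])
            rp_new = rel_code(cur_pos[assignment[j]], cur_pos[assignment[k]])
            cost += (rp[0] != rp_new[0]) + (rp[1] != rp_new[1])
    return cost


# Usage: choose the association of the apparently similar targets with minimum cost.
tracks = [(10.0, 10.0), (30.0, 20.0)]   # previous-frame centers of t1, t2
dets = [(12.0, 11.0), (31.0, 21.0)]     # current-frame centers of d1, d2
best = min(permutations(range(len(dets))),
           key=lambda a: orientation_cost(tracks, dets, a))
print(best)  # (0, 1): d1 -> t1, d2 -> t2
```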
(3) Recovery of missed-detection targets by combining motion prediction and apparent features

For a tracking target t_j that remains unmatched in the correlation matrix, i.e. whose column contains no association as expressed by formula (19), the target may have been missed by the detector or occluded; the invention combines motion prediction and apparent features to recover such missed targets. Step (1.2) has already calculated the predicted position (x̂_j, ŷ_j) of target t_j (j = 1, 2, …, n) in the current frame, and the missed target is further recovered by combining the apparent features.
(3.1) A candidate box is determined, centered at the predicted position (x̂_j, ŷ_j) of target t_j, with the width shown in formula (20) and the height shown in formula (21).
(3.2) An image slice is cut from the candidate box of the current frame image and input to the HOG feature extractor to obtain the HOG feature of the target within the candidate box. A dot-product (correlation) operation with the HOG feature of the unmatched tracking target yields a feature response map; the maximum pixel value of the response map represents the HOG feature similarity between the target in the candidate box and the tracking target, and is compared with the HOG feature similarity threshold to determine the target state.
In the method, if the HOG feature similarity in step (3.2) is greater than the HOG feature similarity threshold, the coordinate position of the maximum pixel value in the feature response map is mapped back to the coordinate position (x*_j, y*_j) in the original image, and the tracking target t_j is updated with this coordinate position and the size {w_j + Δw_j, h_j + Δh_j}; the apparent features a_1 and a_2 of the target are updated at the same time.
In the method, if the HOG feature similarity in step (3.2) is smaller than the threshold, the tracking target t_j is considered to be occluded or to have disappeared, and target t_j is not updated.
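The following Python sketch, assuming a grayscale frame array and a track record structure holding the predicted center, size, and stored template, illustrates this recovery decision; the candidate-box scale factor and the similarity threshold are illustrative assumptions (formulas (20) and (21) define the actual candidate size), and the similarity function can be, for example, the hog_response_peak() sketched for step (1.4).

```python
# A minimal sketch of the missed-detection recovery of section (3).
def recover_missed_target(frame, track, similarity_fn, hog_threshold=0.4, scale=2.0):
    """frame: grayscale image array; track: dict with predicted center, size and a
    stored template patch; similarity_fn: e.g. a HOG response-peak similarity.
    Returns an updated box dict, or None if the target is occluded/disappeared."""
    cx, cy = track["pred_x"], track["pred_y"]        # Kalman-predicted center
    w = track["w"] + track["dw"]                     # w_j + Δw_j
    h = track["h"] + track["dh"]                     # h_j + Δh_j
    cw, ch = scale * w, scale * h                    # assumed candidate-box size
    x0, y0 = int(cx - cw / 2), int(cy - ch / 2)
    candidate = frame[max(y0, 0): y0 + int(ch), max(x0, 0): x0 + int(cw)]
    if candidate.size == 0:
        return None
    score = similarity_fn(candidate, track["template"])
    if score > hog_threshold:
        # the peak location of the response map would be mapped back to image
        # coordinates here; for brevity the predicted center is used instead
        return {"x": cx, "y": cy, "w": w, "h": h}
    return None                                      # keep the track as occluded
```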
(4) Maintaining a tracking chain, updating the orientation constraint relation between targets, and outputting a tracking result
For tracking targets that have been successfully matched or recovered, the target update has been completed in the preceding steps. The tracking chains of the unmatched tracking targets and the unmatched detection targets are further processed, the orientation constraint relation between targets is updated in preparation for tracking the next frame, and the target tracking result of the current frame is output.
(4.1) Maintaining the tracking chain: a tracking target t_j that falls below the threshold in step (3.2) is marked as a target to be confirmed and added to the correlation matrix of the next frame. If it is successfully matched in the next frame, the target is updated; otherwise the target is considered occluded and its consecutive unmatched count is increased. When the consecutive unmatched count reaches the consecutive unmatched threshold, the target is considered to have disappeared and is removed from the correlation matrix.
(4.2) Maintaining the tracking chain: a detection target d_i that remains unmatched, i.e. whose row of the correlation matrix contains no association as expressed by formula (22), is marked as a target to be initialized and added to the correlation matrix of the next frame. If it is successfully matched in the next frame, it is initialized as a new target; otherwise it is considered a false-alarm target and removed from the correlation matrix.
And (4.3) updating the orientation constraint relation between the targets and outputting the tracking result of the current frame.
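A minimal sketch of this tracking-chain maintenance, under assumed track and detection record structures and an assumed consecutive-miss threshold, is given below; it is illustrative only.

```python
# A minimal sketch of section (4): unmatched tracks become "to confirm" with a
# consecutive-miss counter, unmatched detections become "to initialize", and
# tracks that miss too many consecutive frames are dropped.
MAX_MISSES = 5  # assumed consecutive unmatched threshold


def maintain_tracking_chain(tracks, detections, matches):
    """tracks/detections: lists of dicts; matches: set of (det_idx, trk_idx) pairs."""
    matched_trks = {j for _, j in matches}
    matched_dets = {i for i, _ in matches}
    next_tracks = []
    for j, trk in enumerate(tracks):
        if j in matched_trks:
            trk["misses"] = 0
            next_tracks.append(trk)
        else:
            trk["misses"] = trk.get("misses", 0) + 1
            trk["state"] = "to_confirm"                  # occluded: keep for now
            if trk["misses"] < MAX_MISSES:               # else: disappeared, drop
                next_tracks.append(trk)
    for i, det in enumerate(detections):
        if i not in matched_dets:
            det["state"] = "to_initialize"               # becomes a new track only
            next_tracks.append(det)                      # if matched in next frame
    return next_tracks
```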
Therefore, the method for tracking multi-vehicle and multi-pedestrian targets in a video adopts a class-based multi-feature apparent modelling method that fully considers target and background changes, constructs different feature extraction operators according to the detected target class to obtain an accurate apparent description of the target, improves the descriptive power of the features, and overcomes the technical problem that a single feature can hardly cope with complex target and background changes; the proposed hierarchical progressive feature extraction algorithm establishes the correlation matrix according to target class, completes fast dimensionality reduction of the correlation matrix with a high-threshold matching algorithm, reduces the number of depth feature extractions, and improves the time efficiency of the algorithm, effectively solving the technical problem of low timeliness of tracking algorithms caused by time-consuming depth feature extraction and improving the tracking speed while maintaining tracking accuracy; the matching of apparently similar targets is completed by encoding the relative orientations of the targets of the previous frame and using the orientation constraint relations between targets, assisted by the apparent features of the current frame, which realizes effective discrimination of targets with similar appearance and effectively reduces the mismatching rate; the recovery of missed-detection targets is completed under a correlation-filtering tracking framework by using the target apparent features and motion prediction, which greatly improves the recovery rate of missed targets. The method has high accuracy, strong adaptability and high detection efficiency, reduces computational complexity, reduces the occupation of system computing resources, and provides strong system reliability, as well as strong robustness to illumination changes, scene changes, and noise in the imaging process. It can be effectively applied to the tracking of multiple vehicle and pedestrian targets in automatic driving, where background changes are complex, target poses vary, targets frequently occlude one another, and similar appearances are difficult to distinguish.
So far, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the drawings, but it should be understood by those skilled in the art that the above embodiments are only for clearly illustrating the present invention, and not for limiting the scope of the present invention, and it is apparent that the scope of the present invention is not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A method for tracking multiple vehicles and multiple pedestrian targets in a video is characterized by comprising the following steps:
step 1), data acquisition: acquiring video data;
step 2), target matching: for the video data, establishing a correlation matrix of a detection result and a tracking result, extracting apparent characteristics based on target categories from the detection result, and performing target matching in a layered and progressive manner;
step 3), auxiliary matching: further matching unsuccessfully matched apparent similar targets in target matching by using the orientation constraint relation between the targets;
step 4), recovering the missed detection target: recovering undetected targets of the frame by utilizing motion prediction and apparent characteristics;
Step 5), outputting a tracking result: and maintaining a tracking chain, updating the orientation constraint relation between the targets, and outputting the tracking result of the current frame.
2. The method of claim 1, wherein the capturing video data comprises capturing video data in real time.
3. The method of claim 1, wherein the obtaining video data comprises reading the video data from a file.
4. The method of claim 1 or 2, wherein the acquiring video data comprises capturing video data with a camera mounted on an autonomous vehicle.
5. The method for tracking multiple vehicles and multiple pedestrian targets in a video according to claim 1, wherein
the target matching further comprises:
step 1.1), respectively establishing a vehicle target correlation matrix and a pedestrian target correlation matrix according to the target types.
6. The method for tracking multiple vehicles and multiple pedestrian targets in a video according to claim 5,
wherein the step 1.1) of respectively establishing a vehicle target correlation matrix and a pedestrian target correlation matrix according to the target types comprises:
taking the detection target sequence set of the current frame shown in formula (1)

D = {d_1, d_2, …, d_i, …, d_{M-1}, d_M} (1)

as the rows (or columns), and the tracking target sequence set of the previous frame shown in formula (2)

T = {t_1, t_2, …, t_j, …, t_{N-1}, t_N} (2)

as the columns (or rows), and establishing an M × N correlation matrix, M and N being natural numbers, each element A_ij of the correlation matrix representing the association result of the detection target d_i shown in formula (3)

d_i = {type_i, x_i, y_i, w_i, h_i} (i = 1, 2, …, M) (3)

and the tracking target t_j shown in formula (4)

t_j = {id_j, x_j, y_j, w_j, h_j, Δx_j, Δy_j, Δw_j, Δh_j} (j = 1, 2, …, N) (4)

(initialized to A_ij = 1; A_ij = 1 indicates that d_i and t_j are associated, otherwise they are unassociated), wherein type_i is the class of d_i, {x_i, y_i} are the coordinates of the center point of the target frame of d_i, {w_i, h_i} are the width and height of the target frame of d_i, i.e. the size of the target frame, id_j is the ID of t_j, {x_j, y_j} are the coordinates of the center point of the target frame of t_j, {w_j, h_j} are the width and height of the target frame of t_j, {Δx_j, Δy_j} is the motion speed of t_j, and {Δw_j, Δh_j} is the width and height variation of t_j.
7. The method for tracking multiple vehicles and multiple pedestrian targets in a video according to claim 1 or 6,
characterized in that
the target matching further comprises:
step 1.2), obtaining the predicted position in the current frame of each target of the previous frame by using Kalman filtering motion prediction.
8. The method for tracking multiple vehicles and multiple pedestrian targets in a video according to claim 6 or 7,
wherein the obtaining of the predicted position in the current frame of each target of the previous frame by using Kalman filtering motion prediction comprises:
obtaining the predicted position (x̂_j, ŷ_j) in the current frame of each tracking target t_j (j = 1, 2, …, N) of the previous frame by Kalman filtering motion prediction; establishing a circular association gate centered at (x̂_j, ŷ_j) with radius R as given by formula (5); and associating the m (m ≤ M) detection targets d_1, …, d_m whose target-frame center point coordinates {x_i, y_i} fall within the association gate to the tracking target t_j, i.e. for the j-th column, keeping A_ij = 1 for these m detection targets and setting the remaining elements to 0, thereby sparsifying the matrix.
9. The method for tracking multiple vehicles and multiple pedestrian targets in a video according to claim 8, wherein
the target matching further comprises:
step 1.3), for each target association pair A_ij satisfying A_ij ≠ 0, calculating the target frame similarity F_s and the target frame overlap F_iou of the corresponding detection target d_i and tracking target t_j, and obtaining the overall target frame similarity F_box shown in formula (6)

F_box = λ_box·F_s + (1 − λ_box)·F_iou, (6)

wherein λ_box is the weight of the target frame similarity F_s in the overall target frame similarity F_box;
updating each association pair in the correlation matrix as shown in formula (7)

A_ij = 1, if F_box > T_box; A_ij = 0, otherwise, (7)

wherein T_box is the overall target frame similarity threshold; and, for each association pair satisfying A_ij = 1, counting the number of elements equal to 1 in row i and in column j; if both counts equal 1, the detection target d_i and the tracking target t_j have been successfully matched, the tracking target t_j is updated with the target frame center point coordinates {x_i, y_i} and size {w_i, h_i} of the detection target d_i, the state of target t_j is calculated and saved, and row i and column j of the correlation matrix are deleted.
10. An apparatus for tracking multiple vehicles and multiple human targets in a video, comprising:
a data acquisition device: for acquiring video data;
a target matching device: for the video data, establishing a correlation matrix of a detection result and a tracking result, extracting apparent characteristics based on target categories from the detection result, and performing target matching in a layered and progressive manner;
auxiliary matching device: further matching unsuccessfully matched apparent similar targets in target matching by using the orientation constraint relation between the targets;
missed detection target recovery means: recovering undetected targets of the frame by utilizing motion prediction and apparent characteristics;
a tracking result output device: and maintaining a tracking chain, updating the orientation constraint relation between the targets, and outputting the tracking result of the current frame.
CN202010496840.7A 2020-06-03 2020-06-03 Tracking method for multiple vehicles and multiple pedestrian targets in video Active CN111862147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010496840.7A CN111862147B (en) 2020-06-03 2020-06-03 Tracking method for multiple vehicles and multiple pedestrian targets in video


Publications (2)

Publication Number Publication Date
CN111862147A true CN111862147A (en) 2020-10-30
CN111862147B CN111862147B (en) 2024-01-23

Family

ID=72984949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010496840.7A Active CN111862147B (en) 2020-06-03 Tracking method for multiple vehicles and multiple pedestrian targets in video

Country Status (1)

Country Link
CN (1) CN111862147B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160132728A1 (en) * 2014-11-12 2016-05-12 Nec Laboratories America, Inc. Near Online Multi-Target Tracking with Aggregated Local Flow Descriptor (ALFD)
CN105894022A (en) * 2016-03-30 2016-08-24 南京邮电大学 Adaptive hierarchical association multi-target tracking method
CN106296742A (en) * 2016-08-19 2017-01-04 华侨大学 A kind of online method for tracking target of combination Feature Points Matching
CN108447080A (en) * 2018-03-02 2018-08-24 哈尔滨工业大学深圳研究生院 Method for tracking target, system and storage medium based on individual-layer data association and convolutional neural networks
CN109859238A (en) * 2019-03-14 2019-06-07 郑州大学 One kind being based on the optimal associated online multi-object tracking method of multiple features
CN110728702A (en) * 2019-08-30 2020-01-24 深圳大学 High-speed cross-camera single-target tracking method and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fang Lan; Yu Fengqin: "Hierarchical association multi-target tracking with adaptive online discriminative appearance learning" *
Mei Lixue; Wang Zhaodong; Zhang Puzhe: "A multi-target tracking algorithm combining adjacent-frame matching and Kalman filtering" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509011A (en) * 2021-02-08 2021-03-16 广州市玄武无线科技股份有限公司 Static commodity statistical method, terminal equipment and storage medium thereof
CN112509011B (en) * 2021-02-08 2021-05-25 广州市玄武无线科技股份有限公司 Static commodity statistical method, terminal equipment and storage medium thereof
CN113792634A (en) * 2021-09-07 2021-12-14 北京易航远智科技有限公司 Target similarity score calculation method and system based on vehicle-mounted camera

Also Published As

Publication number Publication date
CN111862147B (en) 2024-01-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant