CN116452631A - Multi-target tracking method, terminal equipment and storage medium - Google Patents

Multi-target tracking method, terminal equipment and storage medium

Info

Publication number
CN116452631A
CN116452631A (application number CN202310306107.8A)
Authority
CN
China
Prior art keywords
target
track
matching
tracks
previous frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310306107.8A
Other languages
Chinese (zh)
Inventor
陈龙涛
廖国兴
曾焕强
朱建清
黄德天
傅玉青
施一帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqiao University
Original Assignee
Huaqiao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202310306107.8A priority Critical patent/CN116452631A/en
Publication of CN116452631A publication Critical patent/CN116452631A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-target tracking method, terminal equipment and a storage medium. The method comprises the following steps: reading video information; performing target segmentation on each frame image in the video information to obtain pixel-level information and apparent characteristic information of each target; predicting the track in each frame image based on a Kalman filtering algorithm; based on the tracks in the previous frame image and the apparent characteristic information of the targets in the current frame image, calculating the appearance similarity between tracks and targets, extracting track-target pairs whose appearance similarity exceeds a similarity threshold as pre-matching tracks and pre-matching targets, and storing the pre-matching targets in a matching target set; calculating a cost matrix between the pre-matching tracks and pre-matching targets, calculating Mask-IoU scores between the unmatched tracks and unmatched targets, and fusing the two to obtain a final cost matrix; and obtaining a track matching result through the Hungarian algorithm. Compared with existing methods, the proposed method balances efficiency and performance.

Description

Multi-target tracking method, terminal equipment and storage medium
Technical Field
The present invention relates to the field of video signal processing, and in particular, to a multi-target tracking method, a terminal device, and a storage medium.
Background
Existing mainstream real-time tracking methods generally build the tracker on deep learning to improve association accuracy. However, deep-learning-based trackers seldom combine high tracking accuracy and fast operation with low cost; their high complexity often leaves performance and efficiency unbalanced.
Disclosure of Invention
In order to solve the problems, the invention provides a multi-target tracking method, a terminal device and a storage medium.
The specific scheme is as follows:
a multi-target tracking method comprising the steps of:
s1: reading video information to be subjected to target tracking;
s2: performing target segmentation on a frame image in the video information to obtain pixel level information of a target; obtaining apparent characteristic information of the target through a re-identification model based on the pixel level information of the target;
s3: predicting the track in each frame of image based on a Kalman filtering algorithm;
s4: based on the track in the previous frame image and the apparent characteristic information of the target in the current frame image, calculating the appearance similarity between the track and the target, extracting the track and the target with the appearance similarity greater than a similarity threshold as a pre-matching track and a pre-matching target, storing the pre-matching target into a matching target set, taking other targets except the pre-matching target in the current frame image as unmatched targets, and taking other tracks except the pre-matching track in the previous frame image as unmatched tracks;
s5: calculating Mask-IoU scores between the pre-matching track and the pre-matching target through pixel-level information of the pre-matching track and the pre-matching target; calculating the appearance similarity between the pre-matching track and the pre-matching target through the apparent characteristic information of the pre-matching track and the pre-matching target; taking a weighted summation result of the Mask-IoU score and the appearance similarity as a cost matrix of the pre-matching track and the pre-matching target; calculating Mask-IoU scores between the unmatched tracks and the unmatched objects through pixel-level information of the unmatched tracks and the unmatched objects; fusing the cost matrix of the pre-matching track and the pre-matching target with the Mask-IoU score between the non-matching track and the non-matching target to obtain a final cost matrix;
s6: based on the final cost matrix and the track set of the previous frame image, obtaining a matching result between each target and the tracks in the previous frame image through the Hungarian algorithm.
Further, the specific process of step S4 includes:
extracting all tracks with track survival time smaller than the maximum track survival time from tracks of a previous frame of image to form a survival track set;
adding the target in the current frame image to an unmatched target set;
pairing tracks in the survival track set with the targets in the unmatched target set pairwise to form candidate pairs;
and calculating the appearance similarity between the tracks and the targets in each candidate pair, and extracting the tracks and the targets in the candidate pair with the appearance similarity larger than a similarity threshold as a pre-matching track and a pre-matching target.
Further, the track survival time is initialized to 0; in each frame, the survival time of any track judged to be a pre-matching track is reset to 0, and the survival time of every other track is incremented by 1.
Further, the calculation formula of the cost matrix N is:
N=λC+(1-λ)I
wherein λ represents the weight, C represents the appearance similarity matrix, and I represents the Mask-IoU score matrix.
Further, tracks that were matched to a target in the previous frame and in each of n consecutive frames before the previous frame are taken as confirmed tracks of the previous frame, and other tracks as unconfirmed tracks of the previous frame; the tracks in the previous frame image in step S4 are the confirmed tracks of the previous frame, and the track set of the previous frame image in step S6 is the set formed by all tracks in the previous frame image.
A multi-target tracking terminal device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
Compared with existing methods, the method provided by the invention balances efficiency and performance, and can meet current practical application demands for low cost and real-time operation.
Drawings
Fig. 1 is a flowchart of a first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention is described with reference to the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments and, together with the description, serve to explain their principles. With reference to these materials, one of ordinary skill in the art will understand other possible embodiments and advantages of the present invention.
The invention will now be further described with reference to the drawings and detailed description.
Embodiment one:
the embodiment of the invention provides a multi-target tracking method, as shown in fig. 1, comprising the following steps:
s1: and reading video information to be subjected to target tracking.
In this embodiment, the video information is read via the OpenCV interface.
S2: performing target segmentation on a frame image in the video information to obtain pixel level information of a target, and obtaining apparent characteristic information of the target through the re-identification model based on the pixel level information of the target.
Both the target segmentation model and the re-identification model are existing models and are not limited herein.
S3: and predicting the track in each frame of image based on a Kalman filtering algorithm.
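The Kalman prediction step can be sketched as follows. The patent does not specify the state representation, so a minimal constant-velocity model over the target centre [cx, cy, vx, vy] is assumed here; practical trackers typically also track box scale and aspect ratio:

```python
import numpy as np

def kalman_predict(x, P, dt=1.0, q=1e-2):
    """One constant-velocity Kalman predict step; returns the predicted
    state and covariance."""
    # State-transition matrix for x = [cx, cy, vx, vy]
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    Q = q * np.eye(4)           # assumed isotropic process noise
    x_pred = F @ x              # project the state forward one frame
    P_pred = F @ P @ F.T + Q    # project the uncertainty forward
    return x_pred, P_pred
```

For example, a track at (10, 20) moving with velocity (1, -1) is predicted at (11, 19) in the next frame.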
S4: based on the track in the previous frame image and the apparent characteristic information of the target in the current frame image, calculating the appearance similarity between the track and the target, extracting the track and the target with the appearance similarity greater than a similarity threshold as a pre-matching track and a pre-matching target, storing the pre-matching target into a matching target set, taking other targets except the pre-matching target in the current frame image as unmatched targets, and taking other tracks except the pre-matching track in the previous frame image as unmatched tracks.
Step S4 belongs to a pre-matching phase, the implementation of which comprises the following steps:
S401: extracting all tracks whose track survival time is smaller than the maximum track survival time from the tracks of the previous frame image, and sorting them in ascending order of survival time to form a survival track set;
s402: adding the target in the current frame image to an unmatched target set;
s403: pairing tracks in the survival track set with the targets in the unmatched target set pairwise to form candidate pairs;
s404: and calculating the appearance similarity between the tracks and the targets in each candidate pair, and extracting the tracks and the targets in the candidate pair with the appearance similarity larger than a similarity threshold as a pre-matching track and a pre-matching target.
For the appearance similarity calculation, this embodiment adopts the cosine distance; the similarity threshold and the maximum track survival time are set by a person skilled in the art according to requirements.
Each pre-matching target has only one pre-matching track; if a pre-matching target appears in two or more candidate pairs whose appearance similarity exceeds the similarity threshold, the track in the candidate pair with the largest appearance similarity is selected as its matching track.
The track survival time is initialized to 0; in each frame, the survival time of any track judged to be a pre-matching track is reset to 0, and the survival time of every other track is incremented by 1.
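Steps S401-S404 can be sketched as below, assuming each track and detection carries a ReID feature vector. The patent specifies a cosine distance; an equivalent cosine similarity is used here, and the function names and threshold value are illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def pre_match(track_feats, det_feats, sim_thresh=0.7):
    """Return {detection_index: track_index} pairs whose appearance
    similarity exceeds sim_thresh; each detection keeps only its
    best-scoring track, mirroring the one-track-per-target rule above."""
    matches = {}
    for j, d in enumerate(det_feats):
        best_i, best_s = -1, sim_thresh
        for i, t in enumerate(track_feats):
            s = cosine_similarity(t, d)
            if s > best_s:
                best_i, best_s = i, s
        if best_i >= 0:
            matches[j] = best_i
    return matches
```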
S5: calculating Mask-IoU scores between the pre-matching track and the pre-matching target through pixel-level information of the pre-matching track and the pre-matching target; calculating the appearance similarity between the pre-matching track and the pre-matching target through the apparent characteristic information of the pre-matching track and the pre-matching target; taking a weighted summation result of the Mask-IoU score and the appearance similarity as a cost matrix of the pre-matching track and the pre-matching target; calculating Mask-IoU scores between the unmatched tracks and the unmatched objects through pixel-level information of the unmatched tracks and the unmatched objects; and fusing the cost matrix of the pre-matching track and the pre-matching target with the Mask-IoU score between the non-matching track and the non-matching target to obtain a final cost matrix.
The calculation formula of the cost matrix N in this embodiment is:
N=λC+(1-λ)I
wherein λ represents the weight, C represents the appearance similarity matrix, and I represents the Mask-IoU score matrix.
The final cost matrix F is obtained by fusing N with the Mask-IoU scores between the unmatched tracks and unmatched targets; its formula (rendered only as an image in the original publication) uses H to represent a similarity threshold.
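The Mask-IoU score and the weighted fusion N = λC + (1 - λ)I can be sketched as below; the masks are boolean arrays from the segmentation step, and the value of λ is illustrative:

```python
import numpy as np

def mask_iou(m1, m2):
    """Intersection-over-union of two equally shaped boolean masks."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union > 0 else 0.0

def fuse_cost(C, I, lam=0.6):
    """Weighted fusion of the appearance-similarity matrix C and the
    Mask-IoU score matrix I, i.e. N = lam*C + (1 - lam)*I."""
    return lam * C + (1.0 - lam) * I
```

The construction of the final matrix F from N and the unmatched-pair Mask-IoU scores follows the image-only formula of the publication and is not sketched here.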
S6: based on the final cost matrix and the track set of the previous frame image, obtaining the matching result between each target and the tracks in the previous frame image through the Hungarian algorithm.
To further improve matching efficiency, in this embodiment tracks that were matched in the previous frame and in each of n consecutive frames before it are taken as confirmed tracks of the previous frame, and the other tracks as unconfirmed tracks; the tracks of the previous frame image used in step S4 are the confirmed tracks, while the track set of the previous frame image used in step S6 contains all tracks of the previous frame image. In this way, matching efficiency is preserved, and tracks are not missed when a target corresponding to an unconfirmed track reappears in the video.
n is an integer greater than 2; in this embodiment, n is set to 3.
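With the final cost matrix in hand, step S6 reduces to a linear assignment problem; SciPy's implementation of the Hungarian algorithm can serve as a sketch. Note the negation: the patent's matrices score similarity (higher is better), while `linear_sum_assignment` minimises cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(score_matrix):
    """Return (track_index, target_index) pairs maximising the total score."""
    rows, cols = linear_sum_assignment(-np.asarray(score_matrix, dtype=float))
    return list(zip(rows.tolist(), cols.tolist()))
```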
The embodiment of the invention provides a multi-target tracking method that combines efficiency and performance: it uses a frame-by-frame detect/segment-then-associate strategy, matches with traditional appearance and motion information, and integrates pre-matching information with a fast, selectively weighted association algorithm. The embodiment thus provides a simple, flexible tracking method that achieves good tracking results and meets current low-cost, real-time practical application requirements.
Embodiment two:
the invention also provides a multi-target tracking terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the steps in the method embodiment of the first embodiment of the invention are realized when the processor executes the computer program.
Further, as an executable scheme, the multi-target tracking terminal device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The multi-target tracking terminal device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the above constituent structure is merely an example of the multi-target tracking terminal device and does not constitute a limitation thereof; the device may include more or fewer components than those described above, combine certain components, or use different components. For example, the multi-target tracking terminal device may further include an input/output device, a network access device, a bus, and the like, which is not limited by the embodiment of the present invention.
Further, as an implementation, the processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the multi-target tracking terminal device, and connects various parts of the entire multi-target tracking terminal device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the multi-target tracking terminal device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the terminal device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The present invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of the above-described method of an embodiment of the present invention.
The modules/units integrated in the multi-target tracking terminal device may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment by instructing related hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), a software distribution medium, and so forth.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A multi-target tracking method, comprising the steps of:
s1: reading video information to be subjected to target tracking;
s2: performing target segmentation on a frame image in the video information to obtain pixel level information of a target; obtaining apparent characteristic information of the target through a re-recognition model based on pixel level information of the target;
s3: predicting the track in each frame of image based on a Kalman filtering algorithm;
s4: based on the track in the previous frame image and the apparent characteristic information of the target in the current frame image, calculating the appearance similarity between the track and the target, extracting the track and the target with the appearance similarity greater than a similarity threshold as a pre-matching track and a pre-matching target, storing the pre-matching target into a matching target set, taking other targets except the pre-matching target in the current frame image as unmatched targets, and taking other tracks except the pre-matching track in the previous frame image as unmatched tracks;
s5: calculating Mask-IoU scores between the pre-matching track and the pre-matching target through pixel-level information of the pre-matching track and the pre-matching target; calculating the appearance similarity between the pre-matching track and the pre-matching target through the apparent characteristic information of the pre-matching track and the pre-matching target; taking a weighted summation result of the Mask-IoU score and the appearance similarity as a cost matrix of the pre-matching track and the pre-matching target; calculating Mask-IoU scores between the unmatched tracks and the unmatched objects through pixel-level information of the unmatched tracks and the unmatched objects; fusing the cost matrix of the pre-matching track and the pre-matching target with the Mask-IoU score between the non-matching track and the non-matching target to obtain a final cost matrix;
s6: based on the final cost matrix and the track set of the previous frame image, obtaining a matching result between each target and the tracks in the previous frame image through the Hungarian algorithm.
2. The multi-target tracking method of claim 1, wherein: the specific process of step S4 comprises the following steps:
extracting all tracks with track survival time smaller than the maximum track survival time from tracks of a previous frame of image to form a survival track set;
adding the target in the current frame image to an unmatched target set;
pairing tracks in the survival track set with the targets in the unmatched target set pairwise to form candidate pairs;
and calculating the appearance similarity between the tracks and the targets in each candidate pair, and extracting the tracks and the targets in the candidate pair with the appearance similarity larger than a similarity threshold as a pre-matching track and a pre-matching target.
3. The multi-target tracking method of claim 2, wherein: the track survival time is initialized to 0; in each frame, the survival time of any track judged to be a pre-matching track is reset to 0, and the survival time of every other track is incremented by 1.
4. The multi-target tracking method of claim 1, wherein: the calculation formula of the cost matrix N is as follows:
N=λC+(1-λ)I
wherein λ represents the weight, C represents the appearance similarity matrix, and I represents the Mask-IoU score matrix.
5. The multi-target tracking method of claim 1, wherein: tracks matched in the previous frame and in each of n consecutive frames before the previous frame are taken as confirmed tracks of the previous frame, and other tracks as unconfirmed tracks of the previous frame; the tracks in the previous frame image in step S4 are the confirmed tracks of the previous frame, and the track set of the previous frame image in step S6 is the set formed by all tracks in the previous frame image.
6. A multi-target tracking terminal device, characterized by: comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, carries out the steps of the method according to any one of claims 1 to 5.
7. A computer-readable storage medium storing a computer program, characterized in that: the computer program implementing the steps of the method according to any one of claims 1 to 5 when executed by a processor.
CN202310306107.8A 2023-03-27 2023-03-27 Multi-target tracking method, terminal equipment and storage medium Pending CN116452631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310306107.8A CN116452631A (en) 2023-03-27 2023-03-27 Multi-target tracking method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310306107.8A CN116452631A (en) 2023-03-27 2023-03-27 Multi-target tracking method, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116452631A true CN116452631A (en) 2023-07-18

Family

ID=87132976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310306107.8A Pending CN116452631A (en) 2023-03-27 2023-03-27 Multi-target tracking method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116452631A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435934A (en) * 2023-12-22 2024-01-23 中国科学院自动化研究所 Matching method, device and storage medium of moving target track based on bipartite graph
CN117576167A (en) * 2024-01-16 2024-02-20 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium
CN117576167B (en) * 2024-01-16 2024-04-12 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination